KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking
Currently, if a machine check interrupt happens while we are in the
guest, we exit the guest and call the host's machine check handler,
which tends to cause the host to panic. Some machine checks can be
triggered by the guest; for example, if the guest creates two entries
in the SLB that map the same effective address, and then accesses that
effective address, the CPU will take a machine check interrupt.
To handle this better, when a machine check happens inside the guest,
we call a new function, kvmppc_realmode_machine_check(), while still in
real mode before exiting the guest. On POWER7, it handles the cases
that the guest can trigger, either by flushing and reloading the SLB,
or by flushing the TLB, and then it delivers the machine check interrupt
directly to the guest without going back to the host. On POWER7, the
OPAL firmware patches the machine check interrupt vector so that it
gets control first, and it leaves behind its analysis of the situation
in a structure pointed to by the opal_mc_evt field of the paca. The
kvmppc_realmode_machine_check() function looks at this, and if OPAL
reports that there was no error, or that it has handled the error, we
also go straight back to the guest with a machine check. We have to
deliver a machine check to the guest since the machine check interrupt
might have trashed valid values in SRR0/1.
If the machine check is one we can't handle in real mode, and one that
OPAL hasn't already handled, or on PPC970, we exit the guest and call
the host's machine check handler. We do this by jumping to the
machine_check_fwnmi label, rather than absolute address 0x200, because
we don't want to re-execute OPAL's handler on POWER7. On PPC970, the
two are equivalent because address 0x200 just contains a branch.
Then, if the host machine check handler decides that the system can
continue executing, kvmppc_handle_exit() delivers a machine check
interrupt to the guest -- once again to let the guest know that SRR0/1
have been modified.
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix checkpatch warnings]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-11-24 06:37:50 +08:00
|
|
|
/*
|
|
|
|
* This program is free software; you can redistribute it and/or modify
|
|
|
|
* it under the terms of the GNU General Public License, version 2, as
|
|
|
|
* published by the Free Software Foundation.
|
|
|
|
*
|
|
|
|
* Copyright 2012 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
|
|
|
|
*/
|
|
|
|
|
|
|
|
#include <linux/types.h>
|
|
|
|
#include <linux/string.h>
|
|
|
|
#include <linux/kvm.h>
|
|
|
|
#include <linux/kvm_host.h>
|
|
|
|
#include <linux/kernel.h>
|
|
|
|
#include <asm/opal.h>
|
2013-10-30 22:35:40 +08:00
|
|
|
#include <asm/mce.h>
|
KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking
Currently, if a machine check interrupt happens while we are in the
guest, we exit the guest and call the host's machine check handler,
which tends to cause the host to panic. Some machine checks can be
triggered by the guest; for example, if the guest creates two entries
in the SLB that map the same effective address, and then accesses that
effective address, the CPU will take a machine check interrupt.
To handle this better, when a machine check happens inside the guest,
we call a new function, kvmppc_realmode_machine_check(), while still in
real mode before exiting the guest. On POWER7, it handles the cases
that the guest can trigger, either by flushing and reloading the SLB,
or by flushing the TLB, and then it delivers the machine check interrupt
directly to the guest without going back to the host. On POWER7, the
OPAL firmware patches the machine check interrupt vector so that it
gets control first, and it leaves behind its analysis of the situation
in a structure pointed to by the opal_mc_evt field of the paca. The
kvmppc_realmode_machine_check() function looks at this, and if OPAL
reports that there was no error, or that it has handled the error, we
also go straight back to the guest with a machine check. We have to
deliver a machine check to the guest since the machine check interrupt
might have trashed valid values in SRR0/1.
If the machine check is one we can't handle in real mode, and one that
OPAL hasn't already handled, or on PPC970, we exit the guest and call
the host's machine check handler. We do this by jumping to the
machine_check_fwnmi label, rather than absolute address 0x200, because
we don't want to re-execute OPAL's handler on POWER7. On PPC970, the
two are equivalent because address 0x200 just contains a branch.
Then, if the host machine check handler decides that the system can
continue executing, kvmppc_handle_exit() delivers a machine check
interrupt to the guest -- once again to let the guest know that SRR0/1
have been modified.
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix checkpatch warnings]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-11-24 06:37:50 +08:00
|
|
|
|
|
|
|
/* SRR1 bits for machine check on POWER7 */
|
|
|
|
#define SRR1_MC_LDSTERR (1ul << (63-42))
|
|
|
|
#define SRR1_MC_IFETCH_SH (63-45)
|
|
|
|
#define SRR1_MC_IFETCH_MASK 0x7
|
|
|
|
#define SRR1_MC_IFETCH_SLBPAR 2 /* SLB parity error */
|
|
|
|
#define SRR1_MC_IFETCH_SLBMULTI 3 /* SLB multi-hit */
|
|
|
|
#define SRR1_MC_IFETCH_SLBPARMULTI 4 /* SLB parity + multi-hit */
|
|
|
|
#define SRR1_MC_IFETCH_TLBMULTI 5 /* I-TLB multi-hit */
|
|
|
|
|
|
|
|
/* DSISR bits for machine check on POWER7 */
|
|
|
|
#define DSISR_MC_DERAT_MULTI 0x800 /* D-ERAT multi-hit */
|
|
|
|
#define DSISR_MC_TLB_MULTI 0x400 /* D-TLB multi-hit */
|
|
|
|
#define DSISR_MC_SLB_PARITY 0x100 /* SLB parity error */
|
|
|
|
#define DSISR_MC_SLB_MULTI 0x080 /* SLB multi-hit */
|
|
|
|
#define DSISR_MC_SLB_PARMULTI 0x040 /* SLB parity + multi-hit */
|
|
|
|
|
|
|
|
/* POWER7 SLB flush and reload */
|
|
|
|
static void reload_slb(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
struct slb_shadow *slb;
|
|
|
|
unsigned long i, n;
|
|
|
|
|
|
|
|
/* First clear out SLB */
|
|
|
|
asm volatile("slbmte %0,%0; slbia" : : "r" (0));
|
|
|
|
|
|
|
|
/* Do they have an SLB shadow buffer registered? */
|
|
|
|
slb = vcpu->arch.slb_shadow.pinned_addr;
|
|
|
|
if (!slb)
|
|
|
|
return;
|
|
|
|
|
|
|
|
/* Sanity check */
|
2014-06-11 16:34:19 +08:00
|
|
|
n = min_t(u32, be32_to_cpu(slb->persistent), SLB_MIN_SIZE);
|
KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking
Currently, if a machine check interrupt happens while we are in the
guest, we exit the guest and call the host's machine check handler,
which tends to cause the host to panic. Some machine checks can be
triggered by the guest; for example, if the guest creates two entries
in the SLB that map the same effective address, and then accesses that
effective address, the CPU will take a machine check interrupt.
To handle this better, when a machine check happens inside the guest,
we call a new function, kvmppc_realmode_machine_check(), while still in
real mode before exiting the guest. On POWER7, it handles the cases
that the guest can trigger, either by flushing and reloading the SLB,
or by flushing the TLB, and then it delivers the machine check interrupt
directly to the guest without going back to the host. On POWER7, the
OPAL firmware patches the machine check interrupt vector so that it
gets control first, and it leaves behind its analysis of the situation
in a structure pointed to by the opal_mc_evt field of the paca. The
kvmppc_realmode_machine_check() function looks at this, and if OPAL
reports that there was no error, or that it has handled the error, we
also go straight back to the guest with a machine check. We have to
deliver a machine check to the guest since the machine check interrupt
might have trashed valid values in SRR0/1.
If the machine check is one we can't handle in real mode, and one that
OPAL hasn't already handled, or on PPC970, we exit the guest and call
the host's machine check handler. We do this by jumping to the
machine_check_fwnmi label, rather than absolute address 0x200, because
we don't want to re-execute OPAL's handler on POWER7. On PPC970, the
two are equivalent because address 0x200 just contains a branch.
Then, if the host machine check handler decides that the system can
continue executing, kvmppc_handle_exit() delivers a machine check
interrupt to the guest -- once again to let the guest know that SRR0/1
have been modified.
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix checkpatch warnings]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-11-24 06:37:50 +08:00
|
|
|
if ((void *) &slb->save_area[n] > vcpu->arch.slb_shadow.pinned_end)
|
|
|
|
return;
|
|
|
|
|
|
|
|
/* Load up the SLB from that */
|
|
|
|
for (i = 0; i < n; ++i) {
|
2014-06-11 16:34:19 +08:00
|
|
|
unsigned long rb = be64_to_cpu(slb->save_area[i].esid);
|
|
|
|
unsigned long rs = be64_to_cpu(slb->save_area[i].vsid);
|
KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking
Currently, if a machine check interrupt happens while we are in the
guest, we exit the guest and call the host's machine check handler,
which tends to cause the host to panic. Some machine checks can be
triggered by the guest; for example, if the guest creates two entries
in the SLB that map the same effective address, and then accesses that
effective address, the CPU will take a machine check interrupt.
To handle this better, when a machine check happens inside the guest,
we call a new function, kvmppc_realmode_machine_check(), while still in
real mode before exiting the guest. On POWER7, it handles the cases
that the guest can trigger, either by flushing and reloading the SLB,
or by flushing the TLB, and then it delivers the machine check interrupt
directly to the guest without going back to the host. On POWER7, the
OPAL firmware patches the machine check interrupt vector so that it
gets control first, and it leaves behind its analysis of the situation
in a structure pointed to by the opal_mc_evt field of the paca. The
kvmppc_realmode_machine_check() function looks at this, and if OPAL
reports that there was no error, or that it has handled the error, we
also go straight back to the guest with a machine check. We have to
deliver a machine check to the guest since the machine check interrupt
might have trashed valid values in SRR0/1.
If the machine check is one we can't handle in real mode, and one that
OPAL hasn't already handled, or on PPC970, we exit the guest and call
the host's machine check handler. We do this by jumping to the
machine_check_fwnmi label, rather than absolute address 0x200, because
we don't want to re-execute OPAL's handler on POWER7. On PPC970, the
two are equivalent because address 0x200 just contains a branch.
Then, if the host machine check handler decides that the system can
continue executing, kvmppc_handle_exit() delivers a machine check
interrupt to the guest -- once again to let the guest know that SRR0/1
have been modified.
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix checkpatch warnings]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-11-24 06:37:50 +08:00
|
|
|
|
|
|
|
rb = (rb & ~0xFFFul) | i; /* insert entry number */
|
|
|
|
asm volatile("slbmte %0,%1" : : "r" (rs), "r" (rb));
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* On POWER7, see if we can handle a machine check that occurred inside
|
|
|
|
* the guest in real mode, without switching to the host partition.
|
|
|
|
*
|
|
|
|
* Returns: 0 => exit guest, 1 => deliver machine check to guest
|
|
|
|
*/
|
|
|
|
static long kvmppc_realmode_mc_power7(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
unsigned long srr1 = vcpu->arch.shregs.msr;
|
2013-10-30 22:35:40 +08:00
|
|
|
struct machine_check_event mce_evt;
|
KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking
Currently, if a machine check interrupt happens while we are in the
guest, we exit the guest and call the host's machine check handler,
which tends to cause the host to panic. Some machine checks can be
triggered by the guest; for example, if the guest creates two entries
in the SLB that map the same effective address, and then accesses that
effective address, the CPU will take a machine check interrupt.
To handle this better, when a machine check happens inside the guest,
we call a new function, kvmppc_realmode_machine_check(), while still in
real mode before exiting the guest. On POWER7, it handles the cases
that the guest can trigger, either by flushing and reloading the SLB,
or by flushing the TLB, and then it delivers the machine check interrupt
directly to the guest without going back to the host. On POWER7, the
OPAL firmware patches the machine check interrupt vector so that it
gets control first, and it leaves behind its analysis of the situation
in a structure pointed to by the opal_mc_evt field of the paca. The
kvmppc_realmode_machine_check() function looks at this, and if OPAL
reports that there was no error, or that it has handled the error, we
also go straight back to the guest with a machine check. We have to
deliver a machine check to the guest since the machine check interrupt
might have trashed valid values in SRR0/1.
If the machine check is one we can't handle in real mode, and one that
OPAL hasn't already handled, or on PPC970, we exit the guest and call
the host's machine check handler. We do this by jumping to the
machine_check_fwnmi label, rather than absolute address 0x200, because
we don't want to re-execute OPAL's handler on POWER7. On PPC970, the
two are equivalent because address 0x200 just contains a branch.
Then, if the host machine check handler decides that the system can
continue executing, kvmppc_handle_exit() delivers a machine check
interrupt to the guest -- once again to let the guest know that SRR0/1
have been modified.
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix checkpatch warnings]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-11-24 06:37:50 +08:00
|
|
|
long handled = 1;
|
|
|
|
|
|
|
|
if (srr1 & SRR1_MC_LDSTERR) {
|
|
|
|
/* error on load/store */
|
|
|
|
unsigned long dsisr = vcpu->arch.shregs.dsisr;
|
|
|
|
|
|
|
|
if (dsisr & (DSISR_MC_SLB_PARMULTI | DSISR_MC_SLB_MULTI |
|
|
|
|
DSISR_MC_SLB_PARITY | DSISR_MC_DERAT_MULTI)) {
|
|
|
|
/* flush and reload SLB; flushes D-ERAT too */
|
|
|
|
reload_slb(vcpu);
|
|
|
|
dsisr &= ~(DSISR_MC_SLB_PARMULTI | DSISR_MC_SLB_MULTI |
|
|
|
|
DSISR_MC_SLB_PARITY | DSISR_MC_DERAT_MULTI);
|
|
|
|
}
|
|
|
|
if (dsisr & DSISR_MC_TLB_MULTI) {
|
2013-10-30 22:34:56 +08:00
|
|
|
if (cur_cpu_spec && cur_cpu_spec->flush_tlb)
|
|
|
|
cur_cpu_spec->flush_tlb(TLBIEL_INVAL_SET_LPID);
|
KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking
Currently, if a machine check interrupt happens while we are in the
guest, we exit the guest and call the host's machine check handler,
which tends to cause the host to panic. Some machine checks can be
triggered by the guest; for example, if the guest creates two entries
in the SLB that map the same effective address, and then accesses that
effective address, the CPU will take a machine check interrupt.
To handle this better, when a machine check happens inside the guest,
we call a new function, kvmppc_realmode_machine_check(), while still in
real mode before exiting the guest. On POWER7, it handles the cases
that the guest can trigger, either by flushing and reloading the SLB,
or by flushing the TLB, and then it delivers the machine check interrupt
directly to the guest without going back to the host. On POWER7, the
OPAL firmware patches the machine check interrupt vector so that it
gets control first, and it leaves behind its analysis of the situation
in a structure pointed to by the opal_mc_evt field of the paca. The
kvmppc_realmode_machine_check() function looks at this, and if OPAL
reports that there was no error, or that it has handled the error, we
also go straight back to the guest with a machine check. We have to
deliver a machine check to the guest since the machine check interrupt
might have trashed valid values in SRR0/1.
If the machine check is one we can't handle in real mode, and one that
OPAL hasn't already handled, or on PPC970, we exit the guest and call
the host's machine check handler. We do this by jumping to the
machine_check_fwnmi label, rather than absolute address 0x200, because
we don't want to re-execute OPAL's handler on POWER7. On PPC970, the
two are equivalent because address 0x200 just contains a branch.
Then, if the host machine check handler decides that the system can
continue executing, kvmppc_handle_exit() delivers a machine check
interrupt to the guest -- once again to let the guest know that SRR0/1
have been modified.
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix checkpatch warnings]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-11-24 06:37:50 +08:00
|
|
|
dsisr &= ~DSISR_MC_TLB_MULTI;
|
|
|
|
}
|
|
|
|
/* Any other errors we don't understand? */
|
|
|
|
if (dsisr & 0xffffffffUL)
|
|
|
|
handled = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
switch ((srr1 >> SRR1_MC_IFETCH_SH) & SRR1_MC_IFETCH_MASK) {
|
|
|
|
case 0:
|
|
|
|
break;
|
|
|
|
case SRR1_MC_IFETCH_SLBPAR:
|
|
|
|
case SRR1_MC_IFETCH_SLBMULTI:
|
|
|
|
case SRR1_MC_IFETCH_SLBPARMULTI:
|
|
|
|
reload_slb(vcpu);
|
|
|
|
break;
|
|
|
|
case SRR1_MC_IFETCH_TLBMULTI:
|
2013-10-30 22:34:56 +08:00
|
|
|
if (cur_cpu_spec && cur_cpu_spec->flush_tlb)
|
|
|
|
cur_cpu_spec->flush_tlb(TLBIEL_INVAL_SET_LPID);
|
KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking
Currently, if a machine check interrupt happens while we are in the
guest, we exit the guest and call the host's machine check handler,
which tends to cause the host to panic. Some machine checks can be
triggered by the guest; for example, if the guest creates two entries
in the SLB that map the same effective address, and then accesses that
effective address, the CPU will take a machine check interrupt.
To handle this better, when a machine check happens inside the guest,
we call a new function, kvmppc_realmode_machine_check(), while still in
real mode before exiting the guest. On POWER7, it handles the cases
that the guest can trigger, either by flushing and reloading the SLB,
or by flushing the TLB, and then it delivers the machine check interrupt
directly to the guest without going back to the host. On POWER7, the
OPAL firmware patches the machine check interrupt vector so that it
gets control first, and it leaves behind its analysis of the situation
in a structure pointed to by the opal_mc_evt field of the paca. The
kvmppc_realmode_machine_check() function looks at this, and if OPAL
reports that there was no error, or that it has handled the error, we
also go straight back to the guest with a machine check. We have to
deliver a machine check to the guest since the machine check interrupt
might have trashed valid values in SRR0/1.
If the machine check is one we can't handle in real mode, and one that
OPAL hasn't already handled, or on PPC970, we exit the guest and call
the host's machine check handler. We do this by jumping to the
machine_check_fwnmi label, rather than absolute address 0x200, because
we don't want to re-execute OPAL's handler on POWER7. On PPC970, the
two are equivalent because address 0x200 just contains a branch.
Then, if the host machine check handler decides that the system can
continue executing, kvmppc_handle_exit() delivers a machine check
interrupt to the guest -- once again to let the guest know that SRR0/1
have been modified.
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix checkpatch warnings]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-11-24 06:37:50 +08:00
|
|
|
break;
|
|
|
|
default:
|
|
|
|
handled = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2013-10-30 22:35:40 +08:00
|
|
|
* See if we have already handled the condition in the linux host.
|
|
|
|
* We assume that if the condition is recovered then linux host
|
KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking
Currently, if a machine check interrupt happens while we are in the
guest, we exit the guest and call the host's machine check handler,
which tends to cause the host to panic. Some machine checks can be
triggered by the guest; for example, if the guest creates two entries
in the SLB that map the same effective address, and then accesses that
effective address, the CPU will take a machine check interrupt.
To handle this better, when a machine check happens inside the guest,
we call a new function, kvmppc_realmode_machine_check(), while still in
real mode before exiting the guest. On POWER7, it handles the cases
that the guest can trigger, either by flushing and reloading the SLB,
or by flushing the TLB, and then it delivers the machine check interrupt
directly to the guest without going back to the host. On POWER7, the
OPAL firmware patches the machine check interrupt vector so that it
gets control first, and it leaves behind its analysis of the situation
in a structure pointed to by the opal_mc_evt field of the paca. The
kvmppc_realmode_machine_check() function looks at this, and if OPAL
reports that there was no error, or that it has handled the error, we
also go straight back to the guest with a machine check. We have to
deliver a machine check to the guest since the machine check interrupt
might have trashed valid values in SRR0/1.
If the machine check is one we can't handle in real mode, and one that
OPAL hasn't already handled, or on PPC970, we exit the guest and call
the host's machine check handler. We do this by jumping to the
machine_check_fwnmi label, rather than absolute address 0x200, because
we don't want to re-execute OPAL's handler on POWER7. On PPC970, the
two are equivalent because address 0x200 just contains a branch.
Then, if the host machine check handler decides that the system can
continue executing, kvmppc_handle_exit() delivers a machine check
interrupt to the guest -- once again to let the guest know that SRR0/1
have been modified.
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix checkpatch warnings]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-11-24 06:37:50 +08:00
|
|
|
* will have generated an error log event that we will pick
|
|
|
|
* up and log later.
|
2014-06-11 16:48:21 +08:00
|
|
|
* Don't release mce event now. We will queue up the event so that
|
|
|
|
* we can log the MCE event info on host console.
|
KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking
Currently, if a machine check interrupt happens while we are in the
guest, we exit the guest and call the host's machine check handler,
which tends to cause the host to panic. Some machine checks can be
triggered by the guest; for example, if the guest creates two entries
in the SLB that map the same effective address, and then accesses that
effective address, the CPU will take a machine check interrupt.
To handle this better, when a machine check happens inside the guest,
we call a new function, kvmppc_realmode_machine_check(), while still in
real mode before exiting the guest. On POWER7, it handles the cases
that the guest can trigger, either by flushing and reloading the SLB,
or by flushing the TLB, and then it delivers the machine check interrupt
directly to the guest without going back to the host. On POWER7, the
OPAL firmware patches the machine check interrupt vector so that it
gets control first, and it leaves behind its analysis of the situation
in a structure pointed to by the opal_mc_evt field of the paca. The
kvmppc_realmode_machine_check() function looks at this, and if OPAL
reports that there was no error, or that it has handled the error, we
also go straight back to the guest with a machine check. We have to
deliver a machine check to the guest since the machine check interrupt
might have trashed valid values in SRR0/1.
If the machine check is one we can't handle in real mode, and one that
OPAL hasn't already handled, or on PPC970, we exit the guest and call
the host's machine check handler. We do this by jumping to the
machine_check_fwnmi label, rather than absolute address 0x200, because
we don't want to re-execute OPAL's handler on POWER7. On PPC970, the
two are equivalent because address 0x200 just contains a branch.
Then, if the host machine check handler decides that the system can
continue executing, kvmppc_handle_exit() delivers a machine check
interrupt to the guest -- once again to let the guest know that SRR0/1
have been modified.
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix checkpatch warnings]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-11-24 06:37:50 +08:00
|
|
|
*/
|
2013-10-30 22:35:40 +08:00
|
|
|
if (!get_mce_event(&mce_evt, MCE_EVENT_DONTRELEASE))
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
if (mce_evt.version == MCE_V1 &&
|
|
|
|
(mce_evt.severity == MCE_SEV_NO_ERROR ||
|
|
|
|
mce_evt.disposition == MCE_DISPOSITION_RECOVERED))
|
KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking
Currently, if a machine check interrupt happens while we are in the
guest, we exit the guest and call the host's machine check handler,
which tends to cause the host to panic. Some machine checks can be
triggered by the guest; for example, if the guest creates two entries
in the SLB that map the same effective address, and then accesses that
effective address, the CPU will take a machine check interrupt.
To handle this better, when a machine check happens inside the guest,
we call a new function, kvmppc_realmode_machine_check(), while still in
real mode before exiting the guest. On POWER7, it handles the cases
that the guest can trigger, either by flushing and reloading the SLB,
or by flushing the TLB, and then it delivers the machine check interrupt
directly to the guest without going back to the host. On POWER7, the
OPAL firmware patches the machine check interrupt vector so that it
gets control first, and it leaves behind its analysis of the situation
in a structure pointed to by the opal_mc_evt field of the paca. The
kvmppc_realmode_machine_check() function looks at this, and if OPAL
reports that there was no error, or that it has handled the error, we
also go straight back to the guest with a machine check. We have to
deliver a machine check to the guest since the machine check interrupt
might have trashed valid values in SRR0/1.
If the machine check is one we can't handle in real mode, and one that
OPAL hasn't already handled, or on PPC970, we exit the guest and call
the host's machine check handler. We do this by jumping to the
machine_check_fwnmi label, rather than absolute address 0x200, because
we don't want to re-execute OPAL's handler on POWER7. On PPC970, the
two are equivalent because address 0x200 just contains a branch.
Then, if the host machine check handler decides that the system can
continue executing, kvmppc_handle_exit() delivers a machine check
interrupt to the guest -- once again to let the guest know that SRR0/1
have been modified.
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix checkpatch warnings]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-11-24 06:37:50 +08:00
|
|
|
handled = 1;
|
|
|
|
|
2013-10-30 22:35:40 +08:00
|
|
|
out:
|
|
|
|
/*
|
2014-06-11 16:48:21 +08:00
|
|
|
* We are now going enter guest either through machine check
|
|
|
|
* interrupt (for unhandled errors) or will continue from
|
|
|
|
* current HSRR0 (for handled errors) in guest. Hence
|
|
|
|
* queue up the event so that we can log it from host console later.
|
2013-10-30 22:35:40 +08:00
|
|
|
*/
|
2014-06-11 16:48:21 +08:00
|
|
|
machine_check_queue_event();
|
KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking
Currently, if a machine check interrupt happens while we are in the
guest, we exit the guest and call the host's machine check handler,
which tends to cause the host to panic. Some machine checks can be
triggered by the guest; for example, if the guest creates two entries
in the SLB that map the same effective address, and then accesses that
effective address, the CPU will take a machine check interrupt.
To handle this better, when a machine check happens inside the guest,
we call a new function, kvmppc_realmode_machine_check(), while still in
real mode before exiting the guest. On POWER7, it handles the cases
that the guest can trigger, either by flushing and reloading the SLB,
or by flushing the TLB, and then it delivers the machine check interrupt
directly to the guest without going back to the host. On POWER7, the
OPAL firmware patches the machine check interrupt vector so that it
gets control first, and it leaves behind its analysis of the situation
in a structure pointed to by the opal_mc_evt field of the paca. The
kvmppc_realmode_machine_check() function looks at this, and if OPAL
reports that there was no error, or that it has handled the error, we
also go straight back to the guest with a machine check. We have to
deliver a machine check to the guest since the machine check interrupt
might have trashed valid values in SRR0/1.
If the machine check is one we can't handle in real mode, and one that
OPAL hasn't already handled, or on PPC970, we exit the guest and call
the host's machine check handler. We do this by jumping to the
machine_check_fwnmi label, rather than absolute address 0x200, because
we don't want to re-execute OPAL's handler on POWER7. On PPC970, the
two are equivalent because address 0x200 just contains a branch.
Then, if the host machine check handler decides that the system can
continue executing, kvmppc_handle_exit() delivers a machine check
interrupt to the guest -- once again to let the guest know that SRR0/1
have been modified.
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix checkpatch warnings]
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-11-24 06:37:50 +08:00
|
|
|
|
|
|
|
return handled;
|
|
|
|
}
|
|
|
|
|
|
|
|
long kvmppc_realmode_machine_check(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (cpu_has_feature(CPU_FTR_ARCH_206))
|
|
|
|
return kvmppc_realmode_mc_power7(vcpu);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|