linux/arch/x86/include/asm/nmi.h

#ifndef _ASM_X86_NMI_H
#define _ASM_X86_NMI_H
#include <linux/irq_work.h>
#include <linux/pm.h>
#include <asm/irq.h>
#include <asm/io.h>
x86, nmi_watchdog: Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR

The x86 arch has shifted its use of the nmi_watchdog from a local implementation to the global one provided by kernel/watchdog.c. This shift has caused a whole bunch of compile problems under different config options. I attempt to simplify things with the patch below.

In order to simplify things, I had to come to terms with the meaning of two terms, ARCH_HAS_NMI_WATCHDOG and CONFIG_HARDLOCKUP_DETECTOR. Basically they mean the same thing, the former on a local level and the latter on a global level. With the old x86 nmi watchdog gone, there is no need to rely on defining the ARCH_HAS_NMI_WATCHDOG variable because it doesn't make sense any more. x86 will now use the global implementation.

The changes below do a few things. First, it changes the few places that relied on ARCH_HAS_NMI_WATCHDOG to use CONFIG_X86_LOCAL_APIC (the former was an alias for the latter anyway, so nothing unusual here). Those pieces of code were relying more on local apic functionality than on the nmi watchdog functionality, so the change should make sense.

Second, I removed the x86 implementation of touch_nmi_watchdog(). It isn't needed now; instead x86 will rely on kernel/watchdog.c's implementation.

Third, I removed the #define ARCH_HAS_NMI_WATCHDOG itself from x86 and tweaked the include/linux/nmi.h file to tell users to look for an externally defined touch_nmi_watchdog in the case of ARCH_HAS_NMI_WATCHDOG _or_ CONFIG_HARDLOCKUP_DETECTOR. This change removes some of the ugliness in that file.

Finally, I added a Kconfig dependency for CONFIG_HARDLOCKUP_DETECTOR that says you can't have ARCH_HAS_NMI_WATCHDOG _and_ CONFIG_HARDLOCKUP_DETECTOR. You can only have one nmi_watchdog.

Tested with
ARCH=i386: allnoconfig, defconfig, allyesconfig, (various broken configs)
ARCH=x86_64: allnoconfig, defconfig, allyesconfig, (various broken configs)

Hopefully, after this patch I won't get any more compile broken emails. :-)

v3: changed a couple of 'linux/nmi.h' -> 'asm/nmi.h' to pick up correct function prototypes when CONFIG_HARDLOCKUP_DETECTOR is not set.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: fweisbec@gmail.com
LKML-Reference: <1293044403-14117-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-23 03:00:03 +08:00
#ifdef CONFIG_X86_LOCAL_APIC
extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
extern int reserve_perfctr_nmi(unsigned int);
extern void release_perfctr_nmi(unsigned int);
extern int reserve_evntsel_nmi(unsigned int);
extern void release_evntsel_nmi(unsigned int);
struct ctl_table;
extern int proc_nmi_enabled(struct ctl_table *, int,
			void __user *, size_t *, loff_t *);
extern int unknown_nmi_panic;
#endif /* CONFIG_X86_LOCAL_APIC */
x86, nmi: Create new NMI handler routines

The NMI handlers used to rely on the notifier infrastructure. This worked great until we wanted to support handling multiple events better.

One of the key ideas of the nmi handling is to process _all_ the handlers for each NMI. The reason behind this switch is that NMIs are edge triggered. If enough NMIs are triggered, then they could be lost because the cpu can only latch at most one NMI (besides the one currently being processed).

In order to deal with this we have decided to process all the NMI handlers for each NMI. This allows the handlers to determine if they received an event or not (the ones that cannot determine this will be left to fend for themselves on the unknown NMI list).

As a result of this change it is now possible to have an extra NMI that was destined to be received for an already processed event. Because the event was processed in the previous NMI, this NMI gets dropped and becomes an 'unknown' NMI. This of course will cause printks that scare people.

However, we prefer to have extra NMIs as opposed to losing NMIs and as such have developed a basic mechanism to catch most of them. That will be a later patch.

To accomplish this idea, I unhooked the nmi handlers from the notifier routines and created a new mechanism loosely based on doIRQ. The reason for this is that the notifier routines have a few shortcomings. First, we couldn't guarantee all future NMI handlers used NOTIFY_OK instead of NOTIFY_STOP. Second, we couldn't keep track of the number of events being handled in each routine (most only handle one, perf can handle more than one). Third, I wanted to eventually display which nmi handlers are registered in the system in /proc/interrupts to help see who is generating NMIs.

The patch below just implements the new infrastructure but doesn't wire it up yet (that is the next patch). Its design is based on doIRQ structs and the atomic notifier routines. So the rcu stuff in the patch isn't entirely untested (as the notifier routines have soaked it) but it should be double checked in case I copied the code wrong.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317409584-23662-3-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-01 03:06:20 +08:00
#define NMI_FLAG_FIRST 1
enum {
	NMI_LOCAL=0,
	NMI_UNKNOWN,
	NMI_SERR,
	NMI_IO_CHECK,
	NMI_MAX
};
#define NMI_DONE 0
#define NMI_HANDLED 1
typedef int (*nmi_handler_t)(unsigned int, struct pt_regs *);
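The "process _all_ the handlers for each NMI" idea can be sketched in plain userspace C. This is only an illustration, not the kernel's implementation: `struct pt_regs` is a stand-in, and `run_nmi_handlers()` and the three `demo_*` handlers are hypothetical names. Each handler returns the number of events it handled (`NMI_DONE` for none), and the walker sums the results instead of stopping at the first handler that claims the NMI.

```c
#include <assert.h>
#include <stddef.h>

#define NMI_DONE	0
#define NMI_HANDLED	1

/* Stand-in for the kernel's struct pt_regs (assumption: only an ip field). */
struct pt_regs {
	unsigned long ip;
};

typedef int (*nmi_handler_t)(unsigned int, struct pt_regs *);

/* Hypothetical perf-like source that can have several events pending. */
static int demo_pending_events = 2;

int demo_perf_handler(unsigned int type, struct pt_regs *regs)
{
	int handled = demo_pending_events;

	(void)type;
	(void)regs;
	demo_pending_events = 0;
	return handled;		/* may be more than one event */
}

/* Hypothetical handler that always finds exactly one event. */
int demo_watchdog_handler(unsigned int type, struct pt_regs *regs)
{
	(void)type;
	(void)regs;
	return NMI_HANDLED;
}

/* Hypothetical handler that never has anything to do. */
int demo_idle_handler(unsigned int type, struct pt_regs *regs)
{
	(void)type;
	(void)regs;
	return NMI_DONE;
}

/*
 * Call every handler in the chain and add up the events they report.
 * Unlike a notifier chain, no handler can stop the walk early.
 */
int run_nmi_handlers(const nmi_handler_t *handlers, size_t count,
		     unsigned int type, struct pt_regs *regs)
{
	int handled = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		assert(handlers[i] != NULL);
		handled += handlers[i](type, regs);
	}
	return handled;
}
```

A caller would build an array of handlers and treat a total of zero as an unknown NMI; a total greater than one signals that a later NMI may arrive for an event that has already been processed.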
x86/nmi: Fix page faults by nmiaction if kmemcheck is enabled

This patch tries to fix the problem of a page fault exception caused by accessing the nmiaction structure in nmi if kmemcheck is enabled.

If kmemcheck is enabled, memory allocated through slab is in pages that are marked non-present, so that some checks can be done in the page fault handling code (e.g. whether the memory is read before it is written to).

As nmiaction is allocated in this way, it resides in a non-present page. A page fault then occurs when the nmi code accesses the nmiaction structure, which in turn causes a warning from WARN_ON_ONCE(in_nmi()) in kmemcheck_fault(), called by do_page_fault().

This significantly simplifies the code as well, as the whole dynamic allocation dance goes away.

v2: as Peter suggested, changed the nmiaction to use static storage.

v3: as Peter suggested, use a macro to shorten the code. Also keep the original usage of register_nmi_handler, so users of this call don't need to change.

Tested-by: Seiji Aguchi <seiji.aguchi@hds.com>
Fixes: https://lkml.org/lkml/2012/3/2/356
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
[ simplified the wrappers ]
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: thomas.mingarelli@hp.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1333051877-15755-4-git-send-email-dzickus@redhat.com
[ tidied the patch a bit ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-03-30 04:11:17 +08:00
struct nmiaction {
	struct list_head	list;
	nmi_handler_t		handler;
	u64			max_duration;
	struct irq_work		irq_work;
	unsigned long		flags;
	const char		*name;
};
#define register_nmi_handler(t, fn, fg, n, init...) \
({ \
	static struct nmiaction init fn##_na = { \
		.handler = (fn), \
		.name = (n), \
		.flags = (fg), \
	}; \
	__register_nmi_handler((t), &fn##_na); \
x86/nmi: Fix section mismatch warnings on 32-bit

It was reported that compiling for 32-bit caused a bunch of section mismatch warnings:

VDSOSYM arch/x86/vdso/vdso32-syms.lds
LD arch/x86/vdso/built-in.o
LD arch/x86/built-in.o
WARNING: arch/x86/built-in.o(.data+0x5af0): Section mismatch in reference from the variable test_nmi_ipi_callback_na.10451 to the function .init.text:test_nmi_ipi_callback() [...]
WARNING: arch/x86/built-in.o(.data+0x5b04): Section mismatch in reference from the variable nmi_unk_cb_na.10399 to the function .init.text:nmi_unk_cb()
The variable nmi_unk_cb_na.10399 references the function __init nmi_unk_cb() [...]

Both of these are attributed to the internal representation of the nmiaction struct created during register_nmi_handler. The reason for this is that those structs are not defined in the init section whereas the rest of the code in nmi_selftest.c is.

To resolve this, I created a new #define, register_nmi_handler_initonly, that tags the struct as __initdata to resolve the mismatch. This #define should only be used in rare situations where the register/unregister is called during init of the kernel.

Big thanks to Jan Beulich for decoding this for me as I didn't have a clue what was going on.

Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
Tested-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
Cc: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/1338991542-23000-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 22:05:42 +08:00
})
int __register_nmi_handler(unsigned int, struct nmiaction *);
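The static-storage trick behind `register_nmi_handler()` can be tried outside the kernel. The sketch below is a simplified userspace analogue, not the kernel code: `demo_register()`, `demo_register_nmi_handler()`, `demo_nmi_list` and the two handlers are hypothetical names, the list is a plain singly linked list with no locking or RCU, and it assumes that `NMI_FLAG_FIRST` means "insert at the head of the list". It relies on GNU C statement expressions, as the original macro does. The macro pastes the handler name into a `static struct nmiaction`, so no allocator is involved and the structure never lives in slab memory.

```c
#include <assert.h>
#include <stddef.h>

struct pt_regs;		/* opaque stand-in; never dereferenced in this sketch */

typedef int (*nmi_handler_t)(unsigned int, struct pt_regs *);

#define NMI_FLAG_FIRST	1

/* Simplified nmiaction: a bare next pointer replaces struct list_head. */
struct nmiaction {
	struct nmiaction	*next;
	nmi_handler_t		handler;
	unsigned long		flags;
	const char		*name;
};

/* Hypothetical single handler list (the kernel keeps one per NMI type). */
static struct nmiaction *demo_nmi_list;

/* Stand-in for __register_nmi_handler(): no locking, no RCU. */
int demo_register(struct nmiaction *na)
{
	struct nmiaction **pos = &demo_nmi_list;

	assert(na != NULL && na->handler != NULL);

	/* Assumption: NMI_FLAG_FIRST entries go to the head, others to the tail. */
	if (!(na->flags & NMI_FLAG_FIRST))
		while (*pos)
			pos = &(*pos)->next;

	na->next = *pos;
	*pos = na;
	return 0;
}

/*
 * Each expansion defines its own static nmiaction named after the handler,
 * then registers it. The statement expression yields demo_register()'s result.
 */
#define demo_register_nmi_handler(fn, fg, n)		\
({							\
	static struct nmiaction fn##_na = {		\
		.handler = (fn),			\
		.name = (n),				\
		.flags = (fg),				\
	};						\
	demo_register(&fn##_na);			\
})

int demo_handler_a(unsigned int type, struct pt_regs *regs)
{
	(void)type;
	(void)regs;
	return 0;
}

int demo_handler_b(unsigned int type, struct pt_regs *regs)
{
	(void)type;
	(void)regs;
	return 0;
}
```

Because the structure has static storage duration, the macro must be expanded inside a function and at most once per handler name in a given scope; the trailing `init...` argument of the real macro exists to let callers add a section attribute such as `__initdata` to that static object.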
void unregister_nmi_handler(unsigned int, const char *);
void stop_nmi(void);
void restart_nmi(void);
x86, nmi: Add in logic to handle multiple events and unknown NMIs

Previous patches allow the NMI subsystem to process multiple NMI events in one NMI. As previously discussed this can cause issues when an event triggered another NMI but is processed in the current NMI. This causes the next NMI to go unprocessed and become an 'unknown' NMI.

To handle this, we first have to flag whether or not the NMI handler handled more than one event. If it did, then there exists a chance that the next NMI might be already processed. Once the NMI is flagged as a candidate to be swallowed, we next look for a back-to-back NMI condition.

This is determined by looking at the %rip from pt_regs. If it is the same as the previous NMI, it is assumed the cpu did not have a chance to jump back into a non-NMI context and execute code and instead handled another NMI.

If both of those conditions are true then we will swallow any unknown NMI.

There still exists a chance that we accidentally swallow a real unknown NMI, but for now things seem better.

An optimization has also been added to the nmi notifier routine. Because x86 can latch up to one NMI while currently processing an NMI, we don't have to worry about executing _all_ the handlers in a standalone NMI. The idea is if multiple NMIs come in, the second NMI will represent them. For those back-to-back NMI cases, we have the potential to drop NMIs. Therefore only execute all the handlers in the second half of a detected back-to-back NMI.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317409584-23662-5-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-01 03:06:22 +08:00
void local_touch_nmi(void);
#endif /* _ASM_X86_NMI_H */
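The swallow rule described in the "multiple events and unknown NMIs" commit message can be modelled as a small state machine. This is a hedged userspace sketch, not the kernel's code: `struct demo_nmi_state`, `enum demo_nmi_outcome` and `demo_nmi()` are hypothetical names, and the state is a plain struct rather than per-cpu data. An NMI that no handler claims is swallowed only when both conditions from the commit message hold: the previous NMI handled more than one event, and the interrupted instruction pointer is the same as last time (back-to-back).

```c
#include <assert.h>
#include <stdbool.h>

/* What the sketch decides to do with one NMI. */
enum demo_nmi_outcome {
	DEMO_NMI_HANDLED,	/* at least one handler claimed an event */
	DEMO_NMI_SWALLOWED,	/* unknown, but explained by the previous NMI */
	DEMO_NMI_UNKNOWN	/* unknown and unexplained */
};

/* Hypothetical bookkeeping; the kernel would keep this per cpu. */
struct demo_nmi_state {
	unsigned long	last_ip;	/* ip interrupted by the previous NMI */
	bool		have_last;	/* false until the first NMI is seen */
	bool		swallow;	/* previous NMI handled more than one event */
};

/*
 * ip:      instruction pointer interrupted by this NMI
 * handled: total number of events the handlers reported for this NMI
 */
enum demo_nmi_outcome demo_nmi(struct demo_nmi_state *st,
			       unsigned long ip, int handled)
{
	/* Same ip as last time: no non-NMI code ran in between. */
	bool back_to_back = st->have_last && st->last_ip == ip;
	bool candidate = st->swallow;

	/* Record this NMI for the next one to compare against. */
	st->last_ip = ip;
	st->have_last = true;
	st->swallow = handled > 1;

	if (handled > 0)
		return DEMO_NMI_HANDLED;
	if (back_to_back && candidate)
		return DEMO_NMI_SWALLOWED;
	return DEMO_NMI_UNKNOWN;
}
```

As the commit message notes, a rule like this can still swallow a genuine unknown NMI that happens to arrive back-to-back after a multi-event NMI; the sketch inherits that limitation.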