linux

Commit Graph

Author	SHA1	Message	Date
Borislav Petkov	3490c0e45f	x86/mce/amd: Zap changelog It is useless and git history has it all detailed anyway. Update copyright while at it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>	2015-05-07 12:06:43 +02:00
Aravind Gopalakrishnan	868c00bb59	x86/mce/amd: Rename setup_APIC_mce 'setup_APIC_mce' doesn't give us an indication of why we are going to program LVT. Make that explicit by renaming it to setup_APIC_mce_threshold so we know. No functional change is introduced. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1430913538-1415-7-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-05-07 10:33:40 +02:00
Aravind Gopalakrishnan	24fd78a81f	x86/mce/amd: Introduce deferred error interrupt handler Deferred errors indicate error conditions that were not corrected, but require no action from S/W (or action is optional).These errors provide info about a latent UC MCE that can occur when a poisoned data is consumed by the processor. Processors that report these errors can be configured to generate APIC interrupts to notify OS about the error. Provide an interrupt handler in this patch so that OS can catch these errors as and when they happen. Currently, we simply log the errors and exit the handler as S/W action is not mandated. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1430913538-1415-5-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-05-07 10:23:32 +02:00
Aravind Gopalakrishnan	7559e13fb4	x86/mce: Add support for deferred errors on AMD Deferred errors indicate error conditions that were not corrected, but those errors have not been consumed yet. They require no action from S/W (or action is optional). These errors provide info about a latent uncorrectable MCE that can occur when a poisoned data is consumed by the processor. Newer AMD processors can generate deferred errors and can be configured to generate APIC interrupts on such events. SUCCOR stands for S/W UnCorrectable error COntainment and Recovery. It indicates support for data poisoning in HW and deferred error interrupts. Add new bitfield to mce_vendor_flags for this. We use this to verify presence of deferred error interrupts before we enable them in mce_amd.c While at it, clarify comments in mce_vendor_flags to provide an indication of usages of the bitfields. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1430913538-1415-4-git-send-email-Aravind.Gopalakrishnan@amd.com [ beef up commit message, do CPUID(8000_0007) only once. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2015-05-06 20:34:31 +02:00
Aravind Gopalakrishnan	6e6e746e33	x86/mce/amd: Collect valid address before logging an error amd_decode_mce() needs value in m->addr so it can report the error address correctly. This should be setup in __log_error() before we call mce_log(). We do this because the error address is an important bit of information which should be conveyed to userspace. The correct output then reports proper address, like this: [Hardware Error]: Corrected error, no action required. [Hardware Error]: CPU:0 (15:60:0) MC0_STATUS [-\|CE\|-\|-\|AddrV\|-\|-\|CECC]: 0x840041000028017b [Hardware Error]: MC0 Error Address: 0x00001f808f0ff040 Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1430913538-1415-3-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-05-06 19:49:31 +02:00
Aravind Gopalakrishnan	afdf344e08	x86/mce/amd: Factor out logging mechanism Refactor the code here to setup struct mce and call mce_log() to log the error. We're going to reuse this in a later patch as part of the deferred error interrupt enablement. No functional change is introduced. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1430913538-1415-2-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-05-06 19:49:20 +02:00
Linus Torvalds	07f2d8c63f	Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: "The main changes in this cycle were: - Simplify the CMCI storm logic on Intel CPUs after yet another report about a race in the code (Borislav Petkov) - Enable the MCE threshold irq on AMD CPUs by default (Aravind Gopalakrishnan) - Add AMD-specific MCE-severity grading function. Further error recovery actions will be based on its output (Aravind Gopalakrishnan) - Documentation updates (Borislav Petkov) - ... assorted fixes and cleanups" * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce/severity: Fix warning about indented braces x86/mce: Define mce_severity function pointer x86/mce: Add an AMD severities-grading function x86/mce: Reindent __mcheck_cpu_apply_quirks() properly x86/mce: Use safe MSR accesses for AMD quirk x86/MCE/AMD: Enable thresholding interrupts by default if supported x86/MCE: Make mce_panic() fatal machine check msg in the same pattern x86/MCE/intel: Cleanup CMCI storm logic Documentation/acpi/einj: Correct and streamline text x86/MCE/AMD: Drop bogus const modifier from AMD's bank4_names()	2015-04-13 13:33:20 -07:00
Aravind Gopalakrishnan	cee8f5a6c8	x86/mce/severity: Fix warning about indented braces Dan reported compiler warnings about missing curly braces in mce_severity_amd(). Reindent the catch-all "return MCE_AR_SEVERITY" correctly to single tab. While at it, chain ctx == IN_KERNEL check with mcgstatus check to make it cleaner, as suggested by Boris. No functional changes are introduced by this patch. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1427814281-18192-1-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-03 15:20:38 +02:00
Aravind Gopalakrishnan	43eaa2a1ad	x86/mce: Define mce_severity function pointer Rename mce_severity() to mce_severity_intel() and assign the mce_severity function pointer to mce_severity_amd() during init on AMD. This way, we can avoid a test to call mce_severity_amd every time we get into mce_severity(). And it's cleaner to do it this way. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Suggested-by: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1427125373-2918-3-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-24 12:14:15 +01:00
Aravind Gopalakrishnan	bf80bbd7dc	x86/mce: Add an AMD severities-grading function Add a severities function that caters to AMD processors. This allows us to do some vendor-specific work within the function if necessary. Also, introduce a vendor flag bitfield for vendor-specific settings. The severities code uses this to define error scope based on the prescence of the flags field. This is based off of work by Boris Petkov. Testing details: Fam10h, Model 9h (Greyhound) Fam15h: Models 0h-0fh (Orochi), 30h-3fh (Kaveri) and 60h-6fh (Carrizo), Fam16h Model 00h-0fh (Kabini) Boris: Intel SNB AMD K8 (JH-E0) Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/1427125373-2918-2-git-send-email-Aravind.Gopalakrishnan@amd.com [ Fixup build, clean up comments. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2015-03-24 12:13:34 +01:00
Borislav Petkov	c9ce871283	x86/mce: Reindent __mcheck_cpu_apply_quirks() properly Had some strange 3 tabs + 2 chars indentation, probably from me. Fix it. No code changed: # arch/x86/kernel/cpu/mcheck/mce.o: text data bss dec hex filename 21371 5923 264 27558 6ba6 mce.o.before 21371 5923 264 27558 6ba6 mce.o.after md5: eb3996c84d15e08ed836f043df2cbb01 mce.o.before.asm eb3996c84d15e08ed836f043df2cbb01 mce.o.after.asm Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-23 10:16:44 +01:00
Jesse Larrew	f77ac507f8	x86/mce: Use safe MSR accesses for AMD quirk Certain MSRs are only relevant to a kernel in host mode, and kvm had chosen not to implement these MSRs at all for guests. If a guest kernel ever tried to access these MSRs, the result was a general protection fault. KVM will be separately patched to return 0 when these MSRs are read, and this patch ensures that MSR accesses are tolerant of exceptions. Signed-off-by: Jesse Larrew <jesse.larrew@amd.com> [ Drop {} braces around loop ] Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Joel Schopp <joel.schopp@amd.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/1426262619-5016-1-git-send-email-jesse.larrew@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-23 10:16:43 +01:00
Ingo Molnar	fa45a45ca3	Merge tag 'ras_for_3.21' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull RAS updates from Borislav Petkov: "- Enable AMD thresholding IRQ by default if supported. (Aravind Gopalakrishnan) - Unify mce_panic() message pattern. (Derek Che) - A bit more involved simplification of the CMCI logic after yet another report about race condition with the adaptive logic. (Borislav Petkov) - ACPI APEI EINJ fleshing out of the user documentation. (Borislav Petkov) - Minor cleanup. (Jan Beulich.)" Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-19 13:31:33 +01:00
Aravind Gopalakrishnan	d79f931f1c	x86/MCE/AMD: Enable thresholding interrupts by default if supported We setup APIC vectors for threshold errors if interrupt_capable. However, we don't set interrupt_enable by default. Rework threshold_restart_bank() so that when we set up lvt_offset, we also set IntType to APIC and also enable thresholding interrupts for banks which support it by default. User is still allowed to disable interrupts through sysfs. While at it, check if status is valid before we proceed to log error using mce_log. This is because, in multi-node platforms, only the NBC (Node Base Core, i.e. the first core in the node) has valid status info in its MCA registers. So, the decoding of status values on the non-NBC leads to noise on kernel logs like so: EDAC DEBUG: amd64_inject_write_store: section=0x80000000 word_bits=0x10020001 [Hardware Error]: Corrected error, no action required. [Hardware Error]: CPU:25 (15:2:0) MC4_STATUS[-\|CE\|-\|-\|- [Hardware Error]: Corrected error, no action required. [Hardware Error]: CPU:26 (15:2:0) MC4_STATUS[-\|CE\|-\|-\|- <...> WARNING: CPU: 25 PID: 0 at drivers/edac/amd64_edac.c:2147 decode_bus_error+0x1ba/0x2a0() WARNING: CPU: 26 PID: 0 at drivers/edac/amd64_edac.c:2147 decode_bus_error+0x1ba/0x2a0() Something is rotten in the state of Denmark. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1422896561-7695-1-git-send-email-aravind.gopalakrishnan@amd.com [ Massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 13:24:47 +01:00
Derek Che	8af7043a3c	x86/MCE: Make mce_panic() fatal machine check msg in the same pattern There is another mce_panic call with "Fatal machine check on current CPU" in the same mce.c file, why not keep them all in same pattern mce_panic("Fatal machine check on current CPU", &m, msg); Signed-off-by: Derek Che <drc@yahoo-inc.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 13:24:47 +01:00
Borislav Petkov	3f2f0680d1	x86/MCE/intel: Cleanup CMCI storm logic Initially, this started with the yet another report about a race condition in the CMCI storm adaptive period length thing. Yes, we have to admit, it is fragile and error prone. So let's simplify it. The simpler logic is: now, after we enter storm mode, we go straight to polling with CMCI_STORM_INTERVAL, i.e. once a second. We remain in storm mode as long as we see errors being logged while polling. Theoretically, if we see an uninterrupted error stream, we will remain in storm mode indefinitely and keep polling the MSRs. However, when the storm is actually a burst of errors, once we have logged them all, we back out of it after ~5 mins of polling and no more errors logged. If we encounter an error during those 5 minutes, we reset the polling interval to 5 mins. Making machine_check_poll() return a bool and denoting whether it has seen an error or not lets us simplify a bunch of code and move the storm handling private to mce_intel.c. Some minor cleanups while at it. Reported-by: Calvin Owens <calvinowens@fb.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1417746575-23299-1-git-send-email-calvinowens@fb.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 13:24:25 +01:00
Jan Beulich	2cd4c303a7	x86/MCE/AMD: Drop bogus const modifier from AMD's bank4_names() The compiler validly warns about it being ignored. Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/54C21511020000780005890E@mail.emea.novell.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-19 12:30:47 +01:00
Linus Torvalds	d96c757efa	Fix regression - functions on the mce notifier chain should not be able to decide that an event should not be logged -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU47aGAAoJEKurIx+X31iBiX4P+wfq7uUKwQ4riD2jFppvhrcm W2Qx/iIv9QN77ZIw5I45VqGbDKXmlThl41ISem9BlKd8jKaldY3lUlQMfrPC5V11 9bl/7LsZoQLlbuwYR6uiLdKqW9wd6d5Y1mczdSDM5wCtfMw/s+C/ETzuRVHsQZjF 1LTB0rb0NPouX+y3D8aDrvk9Os5ozZsz3N/y6e3TsI/wV8d3rqwH8C8x3RjB2Evx 3WRSwoSOq9kHEbeg1r7PMKYKWAoJs97Kwo4EgJELqn8fxYMWnSsoDZGr9P2PX8oT TKgSFPnhgCLw+qlWy81MM8hutnHKnN6oXcJKzE0nHtD8JlJ/M/HdDAPIg8G9aLIn ABxPg6OORs/4YJQYGFA8ixx3TfIMspMU2m9KGoCcerpGaHCHhrlylJyheUvhRkPP u8pjGz+31d3bVVRzCLJt1eqo3H/y0wcURWaemk23lcUIsdDqisjZDzZrZxyZuWaH eDTKmHsZB/I4wnOs4Ke+U7oo/u+NtBzPmBSJcshgKSONLPd7bSJtjckLoa3wSf5I q5DkZgxrUYkO6tIoAAi/N2tc/2qkjTOug79BP9YKN3elmv3nxW4gieSaUb5p16bd ORpZLt3SDKEPv+79a/1e7ZyR7ik9Evhzc+/72M1IZvmdr3uOjj+xkuE1uRU7vuvX 44Jmx6mEtPK1mVBfwkdG =jAUs -----END PGP SIGNATURE----- Merge tag 'please-pull-fixmcelog' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull mcelog regression fix from Tony Luck: "Fix regression - functions on the mce notifier chain should not be able to decide that an event should not be logged" * tag 'please-pull-fixmcelog' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: x86/mce: Fix regression. All error records should report via /dev/mcelog	2015-02-17 17:03:07 -08:00
Linus Torvalds	37507717de	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf updates from Ingo Molnar: "This series tightens up RDPMC permissions: currently even highly sandboxed x86 execution environments (such as seccomp) have permission to execute RDPMC, which may leak various perf events / PMU state such as timing information and other CPU execution details. This 'all is allowed' RDPMC mode is still preserved as the (non-default) /sys/devices/cpu/rdpmc=2 setting. The new default is that RDPMC access is only allowed if a perf event is mmap-ed (which is needed to correctly interpret RDPMC counter values in any case). As a side effect of these changes CR4 handling is cleaned up in the x86 code and a shadow copy of the CR4 value is added. The extra CR4 manipulation adds ~ <50ns to the context switch cost between rdpmc-capable and rdpmc-non-capable mms" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks perf/x86: Only allow rdpmc if a perf_event is mapped perf: Pass the event to arch_perf_update_userpage() perf: Add pmu callbacks to track event mapping and unmapping x86: Add a comment clarifying LDT context switching x86: Store a per-cpu shadow copy of CR4 x86: Clean up cr4 manipulation	2015-02-16 14:58:12 -08:00
Linus Torvalds	e07e0d4cb0	Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS update from Ingo Molnar: "The changes in this cycle were: - allow mmcfg access to APEI error injection handlers - improve MCE error messages - smaller cleanups" * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, mce: Fix sparse errors x86, mce: Improve timeout error messages ACPI, EINJ: Enhance error injection tolerance level	2015-02-09 18:22:04 -08:00
Tony Luck	a2413d8b29	x86/mce: Fix regression. All error records should report via /dev/mcelog I'm getting complaints from validation teams that have updated their Linux kernels from ancient versions to current. They don't see the error logs they expect. I tell the to unload any EDAC drivers[1], and things start working again. The problem is that we short-circuit the logging process if any function on the decoder chain claims to have dealt with the problem: ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m); if (ret == NOTIFY_STOP) return; The logic we used when we added this code was that we did not want to confuse users with double reports of the same error. But it turns out users are not confused - they are upset that they don't see a log where their tools used to find a log. I could also get into a long description of how the consumer of this log does more than just decode model specific details of the error. It keeps counts, tracks thresholds, takes actions and runs scripts that can alert administrators to problems. [1] We've recently compounded the problem because the acpi_extlog driver also registers for this notifier and also returns NOTIFY_STOP. Signed-off-by: Tony Luck <tony.luck@intel.com>	2015-02-09 09:36:53 -08:00
Andy Lutomirski	375074cc73	x86: Clean up cr4 manipulation CR4 manipulation was split, seemingly at random, between direct (write_cr4) and using a helper (set/clear_in_cr4). Unfortunately, the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code, which only a small subset of users actually wanted. This patch replaces all cr4 access in functions that don't leave cr4 exactly the way they found it with new helpers cr4_set_bits, cr4_clear_bits, and cr4_set_bits_and_update_boot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-02-04 12:10:41 +01:00
Luck, Tony	d4812e169d	x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks We now switch to the kernel stack when a machine check interrupts during user mode. This means that we can perform recovery actions in the tail of do_machine_check() Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2015-01-07 07:47:42 -08:00
Andy Lutomirski	9592747538	x86, traps: Track entry into and exit from IST context We currently pretend that IST context is like standard exception context, but this is incorrect. IST entries from userspace are like standard exceptions except that they use per-cpu stacks, so they are atomic. IST entries from kernel space are like NMIs from RCU's perspective -- they are not quiescent states even if they interrupted the kernel during a quiescent state. Add and use ist_enter and ist_exit to track IST context. Even though x86_32 has no IST stacks, we track these interrupts the same way. This fixes two issues: - Scheduling from an IST interrupt handler will now warn. It would previously appear to work as long as we got lucky and nothing overwrote the stack frame. (I don't know of any bugs in this that would trigger the warning, but it's good to be on the safe side.) - RCU handling in IST context was dangerous. As far as I know, only machine checks were likely to trigger this, but it's good to be on the safe side. Note that the machine check handlers appears to have been missing any context tracking at all before this patch. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2015-01-02 10:22:46 -08:00
Borislav Petkov	83737691e5	x86, mce: Fix sparse errors Make stuff used in mce.c only, static. Signed-off-by: Borislav Petkov <bp@suse.de>	2014-12-22 21:04:31 +01:00
Andy Lutomirski	6c80f87ed4	x86, mce: Improve timeout error messages There are four different possible types of timeouts. Distinguish them in the logs to help debug them. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/0fa6d2653a54a01c48b43a3583caf950ea99606e.1419178397.git.luto@amacapital.net Signed-off-by: Borislav Petkov <bp@suse.de>	2014-12-22 17:47:45 +01:00
Borislav Petkov	c7c9b3929b	x86/mce: Spell "panicked" correctly We need the additional "k" to make it a hard-c: https://en.wiktionary.org/wiki/panicked Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1417642605-15730-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-12-08 11:12:46 +01:00
Chen Yucong	fa92c58694	x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll Uncorrected no action required (UCNA) - is a uncorrected recoverable machine check error that is not signaled via a machine check exception and, instead, is reported to system software as a corrected machine check error. UCNA errors indicate that some data in the system is corrupted, but the data has not been consumed and the processor state is valid and you may continue execution on this processor. UCNA errors require no action from system software to continue execution. Note that UCNA errors are supported by the processor only when IA32_MCG_CAP[24] (MCG_SER_P) is set. -- Intel SDM Volume 3B Deferred errors are errors that cannot be corrected by hardware, but do not cause an immediate interruption in program flow, loss of data integrity, or corruption of processor state. These errors indicate that data has been corrupted but not consumed. Hardware writes information to the status and address registers in the corresponding bank that identifies the source of the error if deferred errors are enabled for logging. Deferred errors are not reported via machine check exceptions; they can be seen by polling the MCi_STATUS registers. -- AMD64 APM Volume 2 Above two items, both UCNA and Deferred errors belong to detected errors, but they can't be corrected by hardware, and this is very similar to Software Recoverable Action Optional (SRAO) errors. Therefore, we can take some actions that have been used for handling SRAO errors to handle UCNA and Deferred errors. Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Chen Yucong <slaoub@gmail.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2014-11-19 10:56:51 -08:00
Chen Yucong	e3480271f5	x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error Until now, the mce_severity mechanism can only identify the severity of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter out DEFERRED error for AMD platform. This patch extends the mce_severity mechanism for handling UCNA/DEFERRED error. In order to do this, the patch introduces a new severity level - MCE_UCNA/DEFERRED_SEVERITY. In addition, mce_severity is specific to machine check exception, and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity mechanism in non-exception context, the patch also introduces a new argument (is_excp) for mce_severity. `is_excp' is used to explicitly specify the calling context of mce_severity. Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Chen Yucong <slaoub@gmail.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2014-11-19 10:55:43 -08:00
Chen Yucong	8dcf32ea22	x86, MCE, AMD: Assign interrupt handler only when bank supports it There are some AMD CPU models which have thresholding banks but which cannot generate a thresholding interrupt. This is denoted by the bit MCi_MISC[IntP]. Make sure to check that bit before assigning the thresholding interrupt handler. Signed-off-by: Chen Yucong <slaoub@gmail.com> [ Boris: save an indentation level and rewrite commit message. ] Link: http://lkml.kernel.org/r/1412662128.28440.18.camel@debian Signed-off-by: Borislav Petkov <bp@suse.de>	2014-11-01 11:28:23 +01:00
Borislav Petkov	a3a529d104	x86, MCE, AMD: Drop software-defined bank in error thresholding Aravind had the good question about why we're assigning a software-defined bank when reporting error thresholding errors instead of simply using the bank which reports the last error causing the overflow. Digging through git history, it pointed to `9526866439` ("[PATCH] x86_64: mce_amd support for family 0x10 processors") which added that functionality. The problem with this, however, is that tools don't know about software-defined banks and get puzzled. So drop that K8_MCE_THRESHOLD_BASE and simply use the hw bank reporting the thresholding interrupt. Save us a couple of MSR reads while at it. Reported-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Link: https://lkml.kernel.org/r/5435B206.60402@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2014-10-21 22:28:48 +02:00
Chen Yucong	69b9575835	x86, MCE, AMD: Move invariant code out from loop body Assigning to mce_threshold_vector is loop-invariant code in mce_amd_feature_init(). So do it only once, out of loop body. Signed-off-by: Chen Yucong <slaoub@gmail.com> Link: http://lkml.kernel.org/r/1412263212.8085.6.camel@debian [ Boris: commit message corrections. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2014-10-21 22:12:56 +02:00
Chen Yucong	44612a3ac6	x86, MCE, AMD: Correct thresholding error logging mce_setup() does not gather the content of IA32_MCG_STATUS, so it should be read explicitly. Moreover, we need to clear IA32_MCx_STATUS to avoid that mce_log() logs the processed threshold event again at next time. But we do the logging ourselves and machine_check_poll() is completely useless there. So kill it. Signed-off-by: Chen Yucong <slaoub@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2014-10-21 22:12:22 +02:00
Chen Yucong	4b737d78a8	x86, MCE, AMD: Use macros to compute bank MSRs Avoid open coded calculations for bank MSRs by hiding the index of higher bank MSRs in well-defined macros. No semantic changes. Signed-off-by: Chen Yucong <slaoub@gmail.com> Link: http://lkml.kernel.org/r/1411438561-24319-1-git-send-email-slaoub@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2014-10-21 22:07:24 +02:00
Linus Torvalds	0429fbc0bd	Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu consistent-ops changes from Tejun Heo: "Way back, before the current percpu allocator was implemented, static and dynamic percpu memory areas were allocated and handled separately and had their own accessors. The distinction has been gone for many years now; however, the now duplicate two sets of accessors remained with the pointer based ones - this_cpu_() - evolving various other operations over time. During the process, we also accumulated other inconsistent operations. This pull request contains Christoph's patches to clean up the duplicate accessor situation. __get_cpu_var() uses are replaced with with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr(). Unfortunately, the former sometimes is tricky thanks to C being a bit messy with the distinction between lvalues and pointers, which led to a rather ugly solution for cpumask_var_t involving the introduction of this_cpu_cpumask_var_ptr(). This converts most of the uses but not all. Christoph will follow up with the remaining conversions in this merge window and hopefully remove the obsolete accessors" 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits) irqchip: Properly fetch the per cpu offset percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write. percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t Revert "powerpc: Replace __get_cpu_var uses" percpu: Remove __this_cpu_ptr clocksource: Replace __this_cpu_ptr with raw_cpu_ptr sparc: Replace __get_cpu_var uses avr32: Replace __get_cpu_var with __this_cpu_write blackfin: Replace __get_cpu_var uses tile: Use this_cpu_ptr() for hardware counters tile: Replace __get_cpu_var uses powerpc: Replace __get_cpu_var uses alpha: Replace __get_cpu_var ia64: Replace __get_cpu_var uses s390: cio driver &__get_cpu_var replacements s390: Replace __get_cpu_var uses mips: Replace __get_cpu_var uses MIPS: Replace __get_cpu_var uses in FPU emulator. arm: Replace __this_cpu_ptr with raw_cpu_ptr ...	2014-10-15 07:48:18 +02:00
Rakib Mullick	d286c3af48	x86/mce: Avoid showing repetitive message from intel_init_thermal() intel_init_thermal() is called from a) at the time of system initializing and b) at the time of system resume to initialize thermal monitoring. In case when thermal monitoring is handled by SMI, we get to know it via printk(). Currently it gives the message at both cases, but its okay if we get it only once and no need to get the same message at every time system resumes. So, limit showing this message only at system boot time by avoid showing at system resume and reduce abusing kernel log buffer. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1411068135.5121.10.camel@localhost.localdomain Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-09-19 12:56:05 +02:00
Christoph Lameter	89cbc76768	x86: Replace __get_cpu_var uses __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : #define __get_cpu_var(var) (this_cpu_ptr(&(var))) __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int x = &__get_cpu_var(y); Converts to int x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Acked-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-08-26 13:45:49 -04:00
Daniel Walter	164109e3cd	arch/x86: replace strict_strto calls Replace obsolete strict_strto calls with appropriate kstrto calls Signed-off-by: Daniel Walter <dwalter@google.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-08 15:57:28 -07:00
Thomas Gleixner	ed5c41d30e	x86: MCE: Add raw_lock conversion again Commit `ea431643d6` ("x86/mce: Fix CMCI preemption bugs") breaks RT by the completely unrelated conversion of the cmci_discover_lock to a regular (non raw) spinlock. This lock was annotated in commit `59d958d2c7` ("locking, x86: mce: Annotate cmci_discover_lock as raw") with a proper explanation why. The argument for converting the lock back to a regular spinlock was: - it does percpu ops without disabling preemption. Preemption is not disabled due to the mistaken use of a raw spinlock. Which is complete nonsense. The raw_spinlock is disabling preemption in the same way as a regular spinlock. In mainline spinlock maps to raw_spinlock, in RT spinlock becomes a "sleeping" lock. raw_spinlock has on RT exactly the same semantics as in mainline. And because this lock is taken in non preemptible context it must be raw on RT. Undo the locking brainfart. Reported-by: Clark Williams <williams@redhat.com> Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-05 17:34:33 -07:00
Borislav Petkov	27c934158c	x86, MCE: Robustify mcheck_init_device BorisO reports that misc_register() fails often on xen. The current code unregisters the CPU hotplug notifier in that case. If then a CPU is offlined and onlined back again, we end up with a second timer running on that CPU, leading to soft lockups and system hangs. So let's leave the hotcpu notifier always registered - even if mce_device_create failed for some cores and never unreg it so that we can deal with the timer handling accordingly. Reported-and-Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1403274493-1371-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Borislav Petkov <bp@suse.de>	2014-06-24 15:17:01 +02:00
Borislav Petkov	38356c1fbd	x86, MCE: Kill CPU_POST_DEAD In conjunction with cleaning up CPU hotplug, we want to get rid of CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the end of CPU_DEAD. Link: http://lkml.kernel.org/r/http://lkml.kernel.org/r/1400750624-19238-1-git-send-email-bp@alien8.de Signed-off-by: Borislav Petkov <bp@suse.de>	2014-06-22 18:36:39 +02:00
Chen Yucong	65eb71823b	hwpoison: remove unused global variable in do_machine_check() Remove an unused global variable mce_entry and relative operations in do_machine_check(). Signed-off-by: Chen Yucong <slaoub@gmail.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-04 16:54:11 -07:00
Borislav Petkov	716079f66e	mce: Panic when a core has reached a timeout There is very little and maybe practically nothing we can do to recover from a system where at least one core has reached a timeout during the whole monarch cores gathering. So panic when that happens. Link: http://lkml.kernel.org/r/20140523091041.GA21332@pd.tnic Signed-off-by: Borislav Petkov <bp@suse.de>	2014-05-30 22:05:31 +02:00
Mathieu Souchaud	9c15a24b03	x86/mce: Improve mcheck_init_device() error handling Check return code of every function called by mcheck_init_device(). Signed-off-by: Mathieu Souchaud <mattieu.souchaud@free.fr> Link: http://lkml.kernel.org/r/1399151031-19905-1-git-send-email-mattieu.souchaud@free.fr Signed-off-by: Borislav Petkov <bp@suse.de>	2014-05-30 22:01:40 +02:00
Andi Kleen	2605fc216f	asmlinkage, x86: Add explicit __visible to arch/x86/* As requested by Linus add explicit __visible to the asmlinkage users. This marks all functions visible to assembler. Tree sweep for arch/x86/* Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1398984278-29319-3-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-05 16:07:44 -07:00
Ingo Molnar	ea431643d6	x86/mce: Fix CMCI preemption bugs The following commit: `27f6c573e0` ("x86, CMCI: Add proper detection of end of CMCI storms") Added two preemption bugs: - machine_check_poll() does a get_cpu_var() without a matching put_cpu_var(), which causes preemption imbalance and crashes upon bootup. - it does percpu ops without disabling preemption. Preemption is not disabled due to the mistaken use of a raw spinlock. To fix these bugs fix the imbalance and change cmci_discover_lock to a regular spinlock. Reported-by: Owen Kibel <qmewlo@gmail.com> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Chen, Gong <gong.chen@linux.intel.com> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Todorov <atodorov@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/n/tip-jtjptvgigpfkpvtQxpEk1at2@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> -- arch/x86/kernel/cpu/mcheck/mce.c \| 4 +--- arch/x86/kernel/cpu/mcheck/mce_intel.c \| 18 +++++++++--------- 2 files changed, 10 insertions(+), 12 deletions(-)	2014-04-17 10:28:42 +02:00
Linus Torvalds	8eab6cd031	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "This is a collection of minor fixes for x86, plus the IRET information leak fix (forbid the use of 16-bit segments in 64-bit mode)" NOTE! We may have to relax the "forbid the use of 16-bit segments in 64-bit mode" part, since there may be people who still run and depend on 16-bit Windows binaries under Wine. But I'm taking this in the current unconditional form for now to see who (if anybody) screams bloody murder. Maybe nobody cares. And maybe we'll have to update it with some kind of runtime enablement (like our vm.mmap_min_addr tunable that people who run dosemu/qemu/wine already need to tweak). * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels efi: Pass correct file handle to efi_file_{read,close} x86/efi: Correct EFI boot stub use of code32_start x86/efi: Fix boot failure with EFI stub x86/platform/hyperv: Handle VMBUS driver being a module x86/apic: Reinstate error IRQ Pentium erratum 3AP workaround x86, CMCI: Add proper detection of end of CMCI storms	2014-04-11 11:58:33 -07:00
Chen, Gong	27f6c573e0	x86, CMCI: Add proper detection of end of CMCI storms When CMCI storm persists for a long time(at least beyond predefined threshold. It's 30 seconds for now), we can watch CMCI storm is detected immediately after it subsides. ... Dec 10 22:04:29 kernel: CMCI storm detected: switching to poll mode Dec 10 22:04:59 kernel: CMCI storm subsided: switching to interrupt mode Dec 10 22:04:59 kernel: CMCI storm detected: switching to poll mode Dec 10 22:05:29 kernel: CMCI storm subsided: switching to interrupt mode ... The problem is that our logic that determines that the storm has ended is incorrect. We announce the end, re-enable interrupts and realize that the storm is still going on, so we switch back to polling mode. Rinse, repeat. When a storm happens we disable signaling of errors via CMCI and begin polling machine check banks instead. If we find any logged errors, then we need to set a per-cpu flag so that our per-cpu tests that check whether the storm is ongoing will see that errors are still being logged independently of whether mce_notify_irq() says that the error has been fully processed. cmci_clear() is not the right tool to disable a bank. It disables the interrupt for the bank as desired, but it also clears the bit for this bank in "mce_banks_owned" so we will skip the bank when polling (so we fail to see that the storm continues because we stop looking). New cmci_storm_disable_banks() just disables the interrupt while allowing polling to continue. Reported-by: William Dauchy <wdauchy@gmail.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2014-03-28 13:40:16 -07:00
Srivatsa S. Bhat	7b7139d4ab	x86, therm_throt.c: Remove unused therm_cpu_lock After fixing the CPU hotplug callback registration code, the callbacks invoked for each online CPU, during the initialization phase in thermal_throttle_init_device(), can no longer race with the actual CPU hotplug notifier callbacks (in thermal_throttle_cpu_callback). Hence the therm_cpu_lock is unnecessary now. Remove it. Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-20 13:43:43 +01:00
Srivatsa S. Bhat	4e6192bbec	x86, therm_throt.c: Fix CPU hotplug callback registration Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Instead, the correct and race-free way of performing the callback registration is: cpu_notifier_register_begin(); for_each_online_cpu(cpu) init_cpu(cpu); /* Note the use of the double underscored version of the API */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_notifier_register_done(); Fix the thermal throttle code in x86 by using this latter form of callback registration. Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-20 13:43:42 +01:00
Srivatsa S. Bhat	82a8f131aa	x86, mce: Fix CPU hotplug callback registration Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Instead, the correct and race-free way of performing the callback registration is: cpu_notifier_register_begin(); for_each_online_cpu(cpu) init_cpu(cpu); /* Note the use of the double underscored version of the API */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_notifier_register_done(); Fix the mce code in x86 by using this latter form of callback registration. Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-20 13:43:42 +01:00
Linus Torvalds	fab5669d55	Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: - SCI reporting for other error types not only correctable ones - GHES cleanups - Add the functionality to override error reporting agents as some machines are sporting a new extended error logging capability which, if done properly in the BIOS, makes a corresponding EDAC module redundant - PCIe AER tracepoint severity levels fix - Error path correction for the mce device init - MCE timer fix - Add more flexibility to the error injection (EINJ) debugfs interface * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, mce: Fix mce_start_timer semantics ACPI, APEI, GHES: Cleanup ghes memory error handling ACPI, APEI: Cleanup alignment-aware accesses ACPI, APEI, GHES: Do not report only correctable errors with SCI ACPI, APEI, EINJ: Changes to the ACPI/APEI/EINJ debugfs interface ACPI, eMCA: Combine eMCA/EDAC event reporting priority EDAC, sb_edac: Modify H/W event reporting policy EDAC: Add an edac_report parameter to EDAC PCI, AER: Fix severity usage in aer trace event x86, mce: Call put_device on device_register failure	2014-01-20 12:10:27 -08:00
Borislav Petkov	4f75d84127	x86, mce: Fix mce_start_timer semantics So mce_start_timer() has a 'cpu' argument which is supposed to mean to start a timer on that cpu. However, the code currently starts a timer on the current cpu the function runs on and causes the sanity-check in mce_timer_fn to fire: WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:1286 mce_timer_fn because it is running on the wrong cpu. This was triggered by Prarit Bhargava <prarit@redhat.com> by offlining all the cpus in succession. Then, we were fiddling with the CMCI storm settings when starting the timer whereas there's no need for that - if there's storm happening on this newly restarted cpu, we're going to be in normal CMCI mode initially and then when the CMCI interrupt starts firing, we're going to go to the polling mode with the timer real soon. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Prarit Bhargava <prarit@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Reviewed-by: Chen, Gong <gong.chen@linux.intel.com> Link: http://lkml.kernel.org/r/1387722156-5511-1-git-send-email-prarit@redhat.com	2014-01-12 15:22:25 +01:00
Paul Gortmaker	663b55b9b3	x86: Delete non-required instances of include <linux/init.h> None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. [ hpa: undid incorrect removal from arch/x86/kernel/head_32.S ] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/1389054026-12947-1-git-send-email-paul.gortmaker@windriver.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-01-06 21:25:18 -08:00
Chen, Gong	addccbb264	ACPI, APEI, GHES: Do not report only correctable errors with SCI Currently SCI is employed to handle corrected errors - memory corrected errors, more specifically but in fact SCI still can be used to handle any errors, e.g. uncorrected or even fatal ones if enabled by the BIOS. Enable logging for those kinds of errors too. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1385363701-12387-1-git-send-email-gong.chen@linux.intel.com [ Boris: massage commit message, rename function arg. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2013-12-21 13:31:06 +01:00
Levente Kurusa	853d9b18f1	x86, mce: Call put_device on device_register failure This patch adds a call to put_device() when the device_register() call has failed. This is required so that the last reference to the device is given up. Signed-off-by: Levente Kurusa <levex@linux.com> Link: http://lkml.kernel.org/r/5298F900.9000208@linux.com Signed-off-by: Borislav Petkov <bp@suse.de>	2013-11-30 12:14:51 +01:00
Chen, Gong	147de14772	ACPI, APEI, CPER: Add UEFI 2.4 support for memory error In latest UEFI spec(by now it is 2.4) memory error definition for CPER (UEFI 2.4 Appendix N Common Platform Error Record) adds some new fields. These fields help people to locate memory error to an actual DIMM location. Original-author: Tony Luck <tony.luck@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-10-23 10:10:20 -07:00
Ingo Molnar	6356bb0ad6	Bit 12 may or may not be set in MCi_STATUS.MCACOD when an uncorrected error is reported. Ignore it when checking error signatures. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJR/912AAoJEKurIx+X31iBz/oP/javBL5Q098ir8qj2GdlkSV0 sPZkJ0Qd4AmmcvYgiYfE3wVL0F8mL2R3gFLcppN7wBTNqAjFOEOJb2wofwByiv+T 0eOM9OWxPZTsoxdiCHd5pvoNuw6lkegUfmxbb9vr9Xs0JaJuhjQY/bbbUbi2ue2V WoghLZvcSyZBGAaFJ3tWNdPYroHp4OulbvVdgbb4+iLgaLz73zzpv1HQvBnw6CLW x+P5guI5FawZnS6FjPwfjIs1mKqU5fdBxGl4vHB55IPyexWOVY6i+zc9VG0EuzDE za+FW2v8ohJzzxNP3/u4W3ousJoXkZSrwlZvDvuEkozrM8izibmDr2JglYqnQ2sK 5WMXL1aXHyTISWpHYiH0/218hljUhxaC5+TfNVeiZYytFULSyZOES8Em3fENx0gN HohTZkyBH0jNd2OZytSqS69/dvPgDIFD7qGR6KB2CaBAU+c/qH9M6g8vfaXt5Y/6 2a0oKrZ+WXiXYgEq3PynnTHTiaHYDo2rjBN+yQDrauomopZv0qpEMEicS66G0lE3 7+zh3CQOiv6WL9pYQxYjeIiP46H7tb+BSpbsDYDQ23++nLj61by1YXrLBJ2MsGep 2xEDkoVE7jW5+vskcSIUfavmp7pNNnIpRsU2cb7bem44iTc2DkuHYXZtHstvQIDN qIwoXyi3JE2siMTs4icc =LbVP -----END PGP SIGNATURE----- Merge tag 'please-pull-mce-f-bit' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull MCE-uncorrected-error fix from Tony Luck: "Bit 12 may or may not be set in MCi_STATUS.MCACOD when an uncorrected error is reported. Ignore it when checking error signatures." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-08-12 19:51:43 +02:00
Ingo Molnar	0237d7f355	Merge branch 'x86/mce' into x86/ras Pursue a single RAS/MCE topic branch on x86. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-08-12 17:54:05 +02:00
Tony Luck	1a7f0e3c4f	x86/mce: Fix mce regression from recent cleanup In commit `33d7885b59` x86/mce: Update MCE severity condition check We simplified the rules to recognise each classification of recoverable machine check combining the instruction and data fetch rules into a single entry based on clarifications in the June 2013 SDM that all recoverable events would be reported on the unaffected processor with MCG_STATUS.EIPV=0 and MCG_STATUS.RIPV=1. Unfortunately the simplified rule has a couple of bugs. Fix them here. Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-07-29 11:23:27 -07:00
Paul Gortmaker	148f9bb877	x86: delete __cpuinit usage from all x86 files The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit `5e427ec2d0` ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. Note that some harmless section mismatch warnings may result, since notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c) are flagged as __cpuinit -- so if we remove the __cpuinit from arch specific callers, we will also get section mismatch warnings. As an intermediate step, we intend to turn the linux/init.h cpuinit content into no-ops as early as possible, since that will get rid of these warnings. In any case, they are temporary and harmless. This removes all the arch/x86 uses of the __cpuinit macros from all C files. x86 only had the one __CPUINIT used in assembly files, and it wasn't paired off with a .previous or a __FINIT, so we can delete it directly w/o any corresponding additional change there. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2013-07-14 19:36:56 -04:00
Linus Torvalds	8cbd0eefca	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management updates from Zhang Rui: "There are not too many changes this time, except two new platform thermal drivers, ti-soc-thermal driver and x86_pkg_temp_thermal driver, and a couple of small fixes. Highlights: - move the ti-soc-thermal driver out of the staging tree to the thermal tree. - introduce the x86_pkg_temp_thermal driver. This driver registers CPU digital temperature package level sensor as a thermal zone. - small fixes/cleanups including removing redundant use of platform_set_drvdata() and of_match_ptr for all platform thermal drivers" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (34 commits) thermal: cpu_cooling: fix stub function thermal: ti-soc-thermal: use standard GPIO DT bindings thermal: MAINTAINERS: Add git tree path for SoC specific updates thermal: fix x86_pkg_temp_thermal.c build and Kconfig Thermal: Documentation for x86 package temperature thermal driver Thermal: CPU Package temperature thermal thermal: consider emul_temperature while computing trend thermal: ti-soc-thermal: add DT example for DRA752 chip thermal: ti-soc-thermal: add dra752 chip to device table thermal: ti-soc-thermal: add thermal data for DRA752 chips thermal: ti-soc-thermal: remove usage of IS_ERR_OR_NULL thermal: ti-soc-thermal: freeze FSM while computing trend thermal: ti-soc-thermal: remove external heat while extrapolating hotspot thermal: ti-soc-thermal: update DT reference for OMAP5430 x86, mcheck, therm_throt: Process package thresholds thermal: cpu_cooling: fix 'descend' check in get_property() Thermal: spear: Remove redundant use of of_match_ptr Thermal: kirkwood: Remove redundant use of of_match_ptr Thermal: dove: Remove redundant use of of_match_ptr Thermal: armada: Remove redundant use of of_match_ptr ...	2013-07-11 12:26:08 -07:00
Naveen N. Rao	c3d1fb567a	mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC The Corrected Machine Check structure (CMC) in HEST has a flag which can be set by the firmware to indicate to the OS that it prefers to process the corrected error events first. In this scenario, the OS is expected to not monitor for corrected errors (through CMCI/polling). Instead, the firmware notifies the OS on corrected error events through GHES. Linux already has support for GHES. This patch adds support for parsing CMC structure and to disable CMCI/polling if the firmware first flag is set. Further, the list of machine check bank structures at the end of CMC is used to determine which MCA banks function in FF mode, so that we continue to monitor error events on the other banks. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-07-08 11:53:01 -07:00
Linus Torvalds	a9f4a7005f	Thermal limit warnings are too scary and cause unnecessary concern -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRzhAdAAoJEKurIx+X31iByQEP/1eAEHNiSCrrKUm/ovGCiVBZ w+ZRJkcKYBj+/3cQUeMds7OevmM8fSU5t+4zGpikobQ75xCjz7vJ6nQNLrSylqQG Z4weVme6a529szRwv6H9an+psj5wP0yWAbCgBSw4/O3cogDS0eMD8T6ZhEi8mWJ9 PRSdqOLdxl7AZpJpJpP3GF7VQLJLDbVv56FcKgAZ3mtFM6qr7HFiTtnTc+Jl33V5 bM7v23HXyoyRk+JOUMtO+4BlrkKzbtL4Zh8aGw4s71QX3zZbP0BIw9sKffNMD+2k oYXOcJjTwbtxQqXKoSKOtvLUQzf/SDqzlFf7yU38Spw+nRECPhVVRO0aAXKOmFKD 74qyNuW6ZXMw7xH4M7iG6MuM/cZIvueDtp0RSZVc6Gmb5CGb1aXz4GeotvKF5Bev nVL65UVo6XwfB4m77ayPYqfD6KX0BsWHdCzWABNl8WbgyCY5dXRJJs/c9iLCl05I EXRgWLJK/H/wOY9mj3NGcvRicYJigZP2luJEdwn8mVlpZl+3wstDcgq9Ijxr0LGi bj2H9ro3vSaXpAnJw3FR9QbXgH/8Nbb8sV7yBddy9zntrnfEfvrYtrhOKR+fPrQL C5zXeaXGEnqCEcYNESSjyhDCa1az49BlJC5ebEHHqdviHzRpVVi/EG920GViWw+D z+z25N5hi4yoUF/+1vc/ =Dm06 -----END PGP SIGNATURE----- Merge tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull thermal power-limit update from Tony Luck: "Thermal limit warnings are too scary and cause unnecessary concern" * tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: x86 thermal: Disable power limit notification interrupt by default x86 thermal: Delete power-limit-notification console messages	2013-07-03 11:16:09 -07:00
Linus Torvalds	96a3d998fb	Merge branch 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 tracing updates from Ingo Molnar: "This tree adds IRQ vector tracepoints that are named after the handler and which output the vector #, based on a zero-overhead approach that relies on changing the IDT entries, by Seiji Aguchi. The new tracepoints look like this: # perf list \| grep -i irq_vector irq_vectors:local_timer_entry [Tracepoint event] irq_vectors:local_timer_exit [Tracepoint event] irq_vectors:reschedule_entry [Tracepoint event] irq_vectors:reschedule_exit [Tracepoint event] irq_vectors:spurious_apic_entry [Tracepoint event] irq_vectors:spurious_apic_exit [Tracepoint event] irq_vectors:error_apic_entry [Tracepoint event] irq_vectors:error_apic_exit [Tracepoint event] [...]" * 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tracing: Add config option checking to the definitions of mce handlers trace,x86: Do not call local_irq_save() in load_current_idt() trace,x86: Move creation of irq tracepoints from apic.c to irq.c x86, trace: Add irq vector tracepoints x86: Rename variables for debugging x86, trace: Introduce entering/exiting_irq() tracing: Add DEFINE_EVENT_FN() macro	2013-07-02 16:31:49 -07:00
Ingo Molnar	fb476cffd5	Changes to simplify the SDM mean that we can also simplify the code for SRAR (software recoverable action required) errors. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRzKq7AAoJEKurIx+X31iBviQP+gJ6b9BemXj8/TLjWN3P3mAB bQHU9imlSXmggfauC1C4/b60k4G/D8d1hN1Lboak7TbFvY69ONcSodY9w6O9Lknb r83xT0oUIe5IYWqC42+godt8P59ed10lUa8F5eaA9DhhzcsCWO2GG/ITqCmyjEom zJGKlLytq6ViJf+6ig3gTmWgHU4tvlJvOPh/O0WNWXH9kbOtnsLT8ImY9GMO/R1B LdItg2nxbDZmhjLOsKamaj2YpPbZ9Vtvam9CnQwirbUAkdGZfmRsq+K9y8O5MjHJ GRqOKwQ3BdKBUIEu36BxgIGHSg8hWYSefGD8nz8IP+QHwy5fx7SREdgx5M+AMSFt 8DewSUij2wMwH/vuP4KrdYWva6LNMBtR+NItCkzpdY5ykcdoSzo06nCotGcUrxVN iEmYywZzO3CTYJj0lWqdrhhiFln/9ZrhXaWUbRoX5m3HAQK7XeTPT+3O7Vs5fWDN 3Fb/0EzyJjmv+Mysl8/PvR9umJs3ELK5MiVLIAIkuozh8+L33byk/WX+VOfqTsY4 KXYDm6fck3tRfn+PJGrAMSwIEyhgVJArRJC39PydEUIwjlel6DRU0Rs26wQXtFh8 DHxTIiEubmvctvi3DnHOztOI3d3EkAayU6Dk4cNSy46FwujJC0v+07EQav6BLA3e x1HK+3PVVSQYuqFIeyWa =PJXS -----END PGP SIGNATURE----- Merge tag 'please-pull-mce' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull MCE cleanup from Tony Luck: "Changes to simplify the SDM means that we can also simplify the code for SRAR (software recoverable action required) errors." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-06-28 11:09:26 +02:00
Chen Gong	33d7885b59	x86/mce: Update MCE severity condition check Update some SRAR severity conditions check to make it clearer, according to latest Intel SDM Vol 3(June 2013), table 15-20. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-06-27 13:45:52 -07:00
Ingo Molnar	ca02c21674	Better comments so we understand our existing machine check bank bitmaps - prelude to adding another bitmap soon. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRygQkAAoJEKurIx+X31iB5RsP/R6w/X4wbKlXD8K8e2qhSXU2 oYVtaMN46M+xLRdNDdE1Y+clVKDQTuapfrlOtBLZVHIkNfqF5jAtu2O1wTERpx+F 7lDHuWuFbddbqV2vqE6RJ6ZqsUTsrlzrqLSLWGSoHn36DlnkSm+L5vDDG75QzV+w fPpGpJTM3kRxPUHnmwvfniJxTiBjsli6FDZ1vGq4Ingu+xxOJDU6+FdWts08hn4Y +KLjs1JMJSdo3di05T6t4ARKBgW3B1lJnrIS6fGYMmax+kHQqO/zvzHs3A86LZyN 7L0toEoZHa8VRMN6C1mw0XsFwPmyMOsrHrRpBlFgHpC7QLAzx6LkxchyDBqPL0mo W4bnoyu1QZUvblq1mrsnJa9yeyMjSHAeq2XTBj8Pbv20AKzmCbNl3aJ5tCDhPj4Y A+e8vk/tq8zWCjdf3SV/D+wjgEtkeWALZZj70maufu9AKtKXy5repa6DZCKEhyIC yh72c3gZ25IkxjwcHmJh+CYaKll9MgdW5toZkftBin2iOdWSaHS9QX2r4I5Awvbi m0Iwq5Fg8DSs3AhBLxiJi8L7N+RNa+7GoTH0z6UDtfAY3ExvepfwjPzLxAfTbPw6 oK2wLfuPiNizl2DX1yNlizGD2YSRiWKoap3Iog84A3IMTIe1nJ+97GHvsVcGna8I zXjnZDNQThAgBswtNY1s =U/Kg -----END PGP SIGNATURE----- Merge tag 'please-pull-mce-bitmap-comment' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull MCE updates from Tony Luck: "Better comments so we understand our existing machine check bank bitmaps - prelude to adding another bitmap soon." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-06-26 10:53:45 +02:00
Naveen N. Rao	0644414e62	mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem There is some confusion about the 'mce_poll_banks' and 'mce_banks_owned' per-cpu bitmaps. Provide comments so that we all know exactly what these are used for, and why. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-06-25 13:53:27 -07:00
Seiji Aguchi	cf910e83ae	x86, trace: Add irq vector tracepoints [Purpose of this patch] As Vaibhav explained in the thread below, tracepoints for irq vectors are useful. http://www.spinics.net/lists/mm-commits/msg85707.html <snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes. There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events. The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip> On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer. I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now. [Patch Description] Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events. So, add following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty makes a zero when tracepoints are disabled. Detailed explanations are as follows. - Create trace irq handlers with entering_irq()/exiting_irq(). - Create a new IDT, trace_idt_table, at boot time by adding a logic to _set_gate(). It is just a copy of original idt table. - Register the new handlers for tracpoints to the new IDT by introducing macros to alloc_intr_gate() called at registering time of irq_vector handlers. - Add checking, whether irq vector tracing is on/off, into load_current_idt(). This has to be done below debug checking for these reasons. - Switching to debug IDT may be kicked while tracing is enabled. - On the other hands, switching to trace IDT is kicked only when debugging is disabled. In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being used for other purposes. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>	2013-06-20 22:25:34 -07:00
Seiji Aguchi	eddc0e922a	x86, trace: Introduce entering/exiting_irq() When implementing tracepoints in interrupt handers, if the tracepoints are simply added in the performance sensitive path of interrupt handers, it may cause potential performance problem due to the time penalty. To solve the problem, an idea is to prepare non-trace/trace irq handers and switch their IDTs at the enabling/disabling time. So, let's introduce entering_irq()/exiting_irq() for pre/post- processing of each irq handler. A way to use them is as follows. Non-trace irq handler: smp_irq_handler() { entering_irq(); /* pre-processing of this handler / __smp_irq_handler(); / * common logic between non-trace and trace handlers * in a vector. / exiting_irq(); / post-processing of this handler / } Trace irq_handler: smp_trace_irq_handler() { entering_irq(); / pre-processing of this handler / trace_irq_entry(); / tracepoint for irq entry / __smp_irq_handler(); / * common logic between non-trace and trace handlers * in a vector. / trace_irq_exit(); / tracepoint for irq exit / exiting_irq(); / post-processing of this handler */ } If tracepoints can place outside entering_irq()/exiting_irq() as follows, it looks cleaner. smp_trace_irq_handler() { trace_irq_entry(); smp_irq_handler(); trace_irq_exit(); } But it doesn't work. The problem is with irq_enter/exit() being called. They must be called before trace_irq_enter/exit(), because of the rcu_irq_enter() must be called before any tracepoints are used, as tracepoints use rcu to synchronize. As a possible alternative, we may be able to call irq_enter() first as follows if irq_enter() can nest. smp_trace_irq_hander() { irq_entry(); trace_irq_entry(); smp_irq_handler(); trace_irq_exit(); irq_exit(); } But it doesn't work, either. If irq_enter() is nested, it may have a time penalty because it has to check if it was already called or not. The time penalty is not desired in performance sensitive paths even if it is tiny. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/51C3238D.9040706@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org>	2013-06-20 22:25:01 -07:00
Fenghua Yu	6bb2ff846f	x86 thermal: Disable power limit notification interrupt by default The package power limit notification interrupt is primarily for system diagnosis, and should not be blindly enabled on every system by default -- particuarly since Linux does nothing in the handler except count how many times it has been called... Add a new kernel cmdline parameter "int_pln_enable" for situations where users want to oberve these events via existing system counters: $ grep TRM /proc/interrupts $ grep . /sys/devices/system/cpu/cpu/thermal_throttle/ https://bugzilla.kernel.org/show_bug.cgi?id=36182 Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-06-14 14:49:00 -07:00
Fenghua Yu	c81147483e	x86 thermal: Delete power-limit-notification console messages Package power limits are common on some systems under some conditions -- so printing console messages when limits are reached causes unnecessary customer concern and support calls. Note that even with these console messages gone, the events can still be observed via system counters: $ grep TRM /proc/interrupts Shows total thermal interrupts, which includes both power limit notifications and thermal throttling interrupts. $ grep . /sys/devices/system/cpu/cpu/thermal_throttle/ Will show what caused those interrupts, core and package throttling and power limit notifications. https://bugzilla.kernel.org/show_bug.cgi?id=36182 Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-06-14 14:48:37 -07:00
Srinivas Pandruvada	25cdce170d	x86, mcheck, therm_throt: Process package thresholds Added callback registration for package threshold reports. Also added a callback to check the rate control implemented in callback or not. If there is no rate control implemented, then there is a default rate control similar to core threshold notification by delaying for CHECK_INTERVAL (5 minutes) between reports. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>	2013-06-13 09:59:14 +08:00
Mathias Krause	a90936845d	x86, mce: Fix "braodcast" typo Fix the typo in MCJ_IRQ_BRAODCAST. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2013-06-05 11:59:17 +02:00
Ingo Molnar	b6d5278dc8	Clean up cmci_rediscover code to fix problems found by Dave Jones -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRXI4YAAoJEKurIx+X31iBY1UP/Rq7DkmmffFh2faMVPMdIC/T Z3xwbqmqNpqJOjBojocOkwpG44Da51h4dZNIdm6xtEFojqf9UdaTPobe00jOEotX MPcc4VsuTS2WLd3dCkRHCfHu2nW6z/F4ysEtIirrHVj5JVqqH9atZPkGBB3Wcsic 00QTV/JeOP1QjvGEMxnaW2rNIXSE19NNbiHE0tx9jNYZuecOhll/GH2AOW17+Kvn ocn4dkisbHRZ/YixuwkqXzDABj2+qiwjHZbP4f4jZbdDuuDR2KWeW0Vkoau1vRHg TU3FX2BCBy4foA8St/AQ7WRbmSA/+5obUWvSWevXXZu7A6CxAmkv4s46D3rRuIaB MLDqov0NobqM4vjJn4Y4bcbvYMIv8zJijok6sg2MNFIHUy9iBFTlJHOai9guSdiB kHKTfLsCUpdkiwHMq3rcFCAMZi/DSxjjPbae8IzP7+IJAG1ff1SP6QwM6UOZq9h1 wVYpvavIOwBhKIsjghkqF0Ov/8a+4lyXMksQ3nKleKYD5908hpDYMD/2i6MTCkpy 50k3OVvnpWbGdQmYwbTiGaOW/jQ2tC72v24o/Y9tKYc2Fea0BlWn+yzIGahCn6ok j7oQHAW1OJ2oc3h/RWtDWSejXUpYsyclCZhlFB+TQj1J/2d7VytFxo+g156xD1Rj VJ0Dseml7vA4Orr6jAMh =wipk -----END PGP SIGNATURE----- Merge tag 'please-pull-cmci_rediscover' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull clean up of the cmci_rediscover code to fix problems found by Dave Jones, from Tony Luck. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-08 17:41:50 +02:00
Srivatsa S. Bhat	7a0c819d28	x86/mce: Rework cmci_rediscover() to play well with CPU hotplug Dave Jones reports that offlining a CPU leads to this trace: numa_remove_cpu cpu 1 node 0: mask now 0,2-3 smpboot: CPU 1 is now offline BUG: using smp_processor_id() in preemptible [00000000] code: cpu-offline.sh/10591 caller is cmci_rediscover+0x6a/0xe0 Pid: 10591, comm: cpu-offline.sh Not tainted 3.9.0-rc3+ #2 Call Trace: [<ffffffff81333bbd>] debug_smp_processor_id+0xdd/0x100 [<ffffffff8101edba>] cmci_rediscover+0x6a/0xe0 [<ffffffff815f5b9f>] mce_cpu_callback+0x19d/0x1ae [<ffffffff8160ea66>] notifier_call_chain+0x66/0x150 [<ffffffff8107ad7e>] __raw_notifier_call_chain+0xe/0x10 [<ffffffff8104c2e3>] cpu_notify+0x23/0x50 [<ffffffff8104c31e>] cpu_notify_nofail+0xe/0x20 [<ffffffff815ef082>] _cpu_down+0x302/0x350 [<ffffffff815ef106>] cpu_down+0x36/0x50 [<ffffffff815f1c9d>] store_online+0x8d/0xd0 [<ffffffff813edc48>] dev_attr_store+0x18/0x30 [<ffffffff81226eeb>] sysfs_write_file+0xdb/0x150 [<ffffffff811adfb2>] vfs_write+0xa2/0x170 [<ffffffff811ae16c>] sys_write+0x4c/0xa0 [<ffffffff81613019>] system_call_fastpath+0x16/0x1b However, a look at cmci_rediscover shows that it can be simplified quite a bit, apart from solving the above issue. It invokes functions that take spin locks with interrupts disabled, and hence it can run in atomic context. Also, it is run in the CPU_POST_DEAD phase, so the dying CPU is already dead and out of the cpu_online_mask. So take these points into account and simplify the code, and thereby also fix the above issue. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-04-02 14:04:01 -07:00
Boris Ostrovsky	bafcdd3b6c	x86, MCE, AMD: Use MCG_CAP MSR to find out number of banks on AMD Currently number of error reporting register banks is hardcoded to 6 on AMD processors. This may break in virtualized scenarios when a hypervisor prefers to report fewer banks than what the physical HW provides. Since number of supported banks is reported in MSR_IA32_MCG_CAP[7:0] that's what we should use. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1363295441-1859-3-git-send-email-boris.ostrovsky@oracle.com [ reverse NULL ptr test logic ] Signed-off-by: Borislav Petkov <bp@suse.de>	2013-03-22 11:25:01 +01:00
Boris Ostrovsky	c76e81643c	x86, MCE, AMD: Replace shared_bank array with is_shared_bank() helper Use helper function instead of an array to report whether register bank is shared. Currently only bank 4 (northbridge) is shared. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1363295441-1859-2-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Borislav Petkov <bp@suse.de>	2013-03-22 11:25:01 +01:00
Linus Torvalds	9043a2650c	The sweeping change is to make add_taint() explicitly indicate whether to disable lockdep, but it's a mechanical change. Cheers, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRJAcuAAoJENkgDmzRrbjxsw0P/3eXb+LddYnx0V0uHYdKpCUf 4vdW7X0fX3Z+aUK69IWRL/6ahoO4TpaHYGHBDjEoivyQ0GDq14X7JNWsYYt3LdMf 3wmDgRc2cn/mZOJbFeVpNV8ox5l/xc0CUvV+iQ8tMjfQItXMXgWUFZKMECsXKSO6 eex3lrw9M2jAX2uL8LQPp9W8xtKu24nSZRC6tH5riE/8fCzi1cZPPAqfxP5c8Lee ZXtbCRSyAFENZLpKyMe1PC7HvtJyi5NDn9xwOQiXULZV/VOlvP94DGBLIKCM/6dn 4QvZxpG0P0uOlpCgRAVLyh/z7g4XY4VF/fHopLCmEcqLsvgD+V2LQpQ9zWUalLPC Z+pUpz2vu0gIddPU1nR8R6oGpEdJ8O12aJle62p/RSXWZGx12qUQ+Tamu0tgKcv1 AsiJfbUGNDYfxgU6sHsoQjl2f68LTVckCU1C1LqEbW/S104EIORtGx30CHM4LRiO 32kDC5TtgYDBKQAIqJ4bL48ZMh+9W3uX40p7xzOI5khHQjvswUKa3jcxupU0C1uv lx8KXo7pn8WT33QGysWC782wJCgJuzSc2vRn+KQoqoynuHGM6agaEtR59gil3QWO rQEcxH63BBRDgHlg4FM9IkJwwsnC3PWKL8gbX0uAWXAPMbgapJkuuGZAwt0WDGVK +GszxsFkCjlW0mK0egTb =tiSY -----END PGP SIGNATURE----- Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module update from Rusty Russell: "The sweeping change is to make add_taint() explicitly indicate whether to disable lockdep, but it's a mechanical change." * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: MODSIGN: Add option to not sign modules during modules_install MODSIGN: Add -s <signature> option to sign-file MODSIGN: Specify the hash algorithm on sign-file command line MODSIGN: Simplify Makefile with a Kconfig helper module: clean up load_module a little more. modpost: Ignore ARC specific non-alloc sections module: constify within_module_* taint: add explicit flag to show whether lock dep is still OK. module: printk message when module signature fail taints kernel.	2013-02-25 15:41:43 -08:00
Rusty Russell	373d4d0997	taint: add explicit flag to show whether lock dep is still OK. Fix up all callers as they were before, with make one change: an unsigned module taints the kernel, but doesn't turn off lockdep. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2013-01-21 17:17:57 +10:30
Tejun Heo	4d899be584	x86/mce: don't use [delayed_]work_pending() There's no need to test whether a (delayed) work item in pending before queueing, flushing or cancelling it. Most uses are unnecessary and quite a few of them are buggy. Remove unnecessary pending tests from x86/mce. Only compile tested. v2: Local var work removed from mce_schedule_work() as suggested by Borislav. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac@vger.kernel.org	2012-12-28 13:40:16 -08:00
Linus Torvalds	2d9c8b5d6a	Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS update from Ingo Molnar: "Rework all config variables used throughout the MCA code and collect them together into a mca_config struct. This keeps them tightly and neatly packed together instead of spilled all over the place. Then, convert those which are used as booleans into real booleans and save some space. These bits are exposed via /sys/devices/system/machinecheck/machinecheck/" 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, MCA: Finish mca_config conversion x86, MCA: Convert the next three variables batch x86, MCA: Convert rip_msr, mce_bootlog, monarch_timeout x86, MCA: Convert dont_log_ce, banks and tolerant drivers/base: Add a DEVICE_BOOL_ATTR macro	2012-12-14 09:59:59 -08:00
Ingo Molnar	226f69a4b7	Fix problem in CMCI rediscovery code that was illegally migrating worker threads to other cpus. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJQkEqpAAoJEKurIx+X31iBZk0P/2h4IkLYz7DspI9gxVMXfMEm 0lIWWIEaqAbOkFsi8VuGjlNrgU+7PabKs/2/++tfbq+hJdQYCCxyAKCGeWbdBw/R fUSTiyQYH84DEFySg6G1AJQwVB8nnRLNWm5wrUtMgX9/2E6D5dpFB0F301XLF+kg OMY7RaFPWJRiWwlOnWWnbY3czNMragaTAyHIudj7ZvsgwBNWw3bgGY/sjIjJ3yy5 kyz0gYEsanOizSjT6Udr2MPFY2ol11co1MT6Ro4r7ORCvX2wSUTChUks2kZBzJ7l Jf9g22ymVlvAo2qsCs/DBzRwXw/Ck0MlUMH8QehvMPLD39yoBiUYDeEqRpadmsQE FLDyKBoxaH6nRzGCDJlTzD2FogHnChQaUtQ9nnyoSBNOjYt2lI8Dc3jEnXwWprim 3P2giL10Gf4LRdHSjHZp/6+kXzbTKqNIs1qfSMPz0GDcujAmTYJ8edyHI7fme5So BgoSTBtjorxShNQjtg7fBVl3dp3oOnAFyOxDwToLUHWAVZKcXewQh5HkbgIawul4 YoiAsveP2FBCKbJA2xBEbI2S3hMKgRauAvh33JNucgZOM7RqPwkCpiAARzbD6mpR tDNqhgXJZ+0F/3prIm4MzapaIivrlQ+LLxvVDTOYQtZyJi1Ba914zw+yUY2VMMHM IvWy1qsmB77XxhmvgWj5 =tv13 -----END PGP SIGNATURE----- Merge tag 'please-pull-tangchen' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/urgent Pull MCE fix from Tony Luck: "Fix problem in CMCI rediscovery code that was illegally migrating worker threads to other cpus." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-11-13 19:01:01 +01:00
Tang Chen	85b97637bb	x86/mce: Do not change worker's running cpu in cmci_rediscover(). cmci_rediscover() used set_cpus_allowed_ptr() to change the current process's running cpu, and migrate itself to the dest cpu. But worker processes are not allowed to be migrated. If current is a worker, the worker will be migrated to another cpu, but the corresponding worker_pool is still on the original cpu. In this case, the following BUG_ON in try_to_wake_up_local() will be triggered: BUG_ON(rq != this_rq()); This will cause the kernel panic. The call trace is like the following: [ 6155.451107] ------------[ cut here ]------------ [ 6155.452019] kernel BUG at kernel/sched/core.c:1654! ...... [ 6155.452019] RIP: 0010:[<ffffffff810add15>] [<ffffffff810add15>] try_to_wake_up_local+0x115/0x130 ...... [ 6155.452019] Call Trace: [ 6155.452019] [<ffffffff8166fc14>] __schedule+0x764/0x880 [ 6155.452019] [<ffffffff81670059>] schedule+0x29/0x70 [ 6155.452019] [<ffffffff8166de65>] schedule_timeout+0x235/0x2d0 [ 6155.452019] [<ffffffff810db57d>] ? mark_held_locks+0x8d/0x140 [ 6155.452019] [<ffffffff810dd463>] ? __lock_release+0x133/0x1a0 [ 6155.452019] [<ffffffff81671c50>] ? _raw_spin_unlock_irq+0x30/0x50 [ 6155.452019] [<ffffffff810db8f5>] ? trace_hardirqs_on_caller+0x105/0x190 [ 6155.452019] [<ffffffff8166fefb>] wait_for_common+0x12b/0x180 [ 6155.452019] [<ffffffff810b0b30>] ? try_to_wake_up+0x2f0/0x2f0 [ 6155.452019] [<ffffffff8167002d>] wait_for_completion+0x1d/0x20 [ 6155.452019] [<ffffffff8110008a>] stop_one_cpu+0x8a/0xc0 [ 6155.452019] [<ffffffff810abd40>] ? __migrate_task+0x1a0/0x1a0 [ 6155.452019] [<ffffffff810a6ab8>] ? complete+0x28/0x60 [ 6155.452019] [<ffffffff810b0fd8>] set_cpus_allowed_ptr+0x128/0x130 [ 6155.452019] [<ffffffff81036785>] cmci_rediscover+0xf5/0x140 [ 6155.452019] [<ffffffff816643c0>] mce_cpu_callback+0x18d/0x19d [ 6155.452019] [<ffffffff81676187>] notifier_call_chain+0x67/0x150 [ 6155.452019] [<ffffffff810a03de>] __raw_notifier_call_chain+0xe/0x10 [ 6155.452019] [<ffffffff81070470>] __cpu_notify+0x20/0x40 [ 6155.452019] [<ffffffff810704a5>] cpu_notify_nofail+0x15/0x30 [ 6155.452019] [<ffffffff81655182>] _cpu_down+0x262/0x2e0 [ 6155.452019] [<ffffffff81655236>] cpu_down+0x36/0x50 [ 6155.452019] [<ffffffff813d3eaa>] acpi_processor_remove+0x50/0x11e [ 6155.452019] [<ffffffff813a6978>] acpi_device_remove+0x90/0xb2 [ 6155.452019] [<ffffffff8143cbec>] __device_release_driver+0x7c/0xf0 [ 6155.452019] [<ffffffff8143cd6f>] device_release_driver+0x2f/0x50 [ 6155.452019] [<ffffffff813a7870>] acpi_bus_remove+0x32/0x6d [ 6155.452019] [<ffffffff813a7932>] acpi_bus_trim+0x87/0xee [ 6155.452019] [<ffffffff813a7a21>] acpi_bus_hot_remove_device+0x88/0x16b [ 6155.452019] [<ffffffff813a33ee>] acpi_os_execute_deferred+0x27/0x34 [ 6155.452019] [<ffffffff81090589>] process_one_work+0x219/0x680 [ 6155.452019] [<ffffffff81090528>] ? process_one_work+0x1b8/0x680 [ 6155.452019] [<ffffffff813a33c7>] ? acpi_os_wait_events_complete+0x23/0x23 [ 6155.452019] [<ffffffff810923be>] worker_thread+0x12e/0x320 [ 6155.452019] [<ffffffff81092290>] ? manage_workers+0x110/0x110 [ 6155.452019] [<ffffffff81098396>] kthread+0xc6/0xd0 [ 6155.452019] [<ffffffff8167c4c4>] kernel_thread_helper+0x4/0x10 [ 6155.452019] [<ffffffff81671f30>] ? retint_restore_args+0x13/0x13 [ 6155.452019] [<ffffffff810982d0>] ? __init_kthread_worker+0x70/0x70 [ 6155.452019] [<ffffffff8167c4c0>] ? gs_change+0x13/0x13 This patch removes the set_cpus_allowed_ptr() call, and put the cmci rediscover jobs onto all the other cpus using system_wq. This could bring some delay for the jobs. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2012-10-30 14:38:12 -07:00
Borislav Petkov	e6d41e8c69	x86, AMD: Change Boris' email address Move to private email and put in maintained status. Signed-off-by: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1351532410-4887-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-30 10:05:50 +01:00
Borislav Petkov	1462594bf2	x86, MCA: Finish mca_config conversion mce_ser, mce_bios_cmci_threshold and mce_disabled are the last three bools which need conversion. Move them to the mca_config struct and adjust usage sites accordingly. Signed-off-by: Borislav Petkov <bp@alien8.de> Acked-by: Tony Luck <tony.luck@intel.com>	2012-10-26 14:37:58 +02:00
Borislav Petkov	7af19e4afd	x86, MCA: Convert the next three variables batch Move them into the mca_config struct and adjust code touching them accordingly. Signed-off-by: Borislav Petkov <bp@alien8.de> Acked-by: Tony Luck <tony.luck@intel.com>	2012-10-26 14:37:57 +02:00
Borislav Petkov	84c2559dee	x86, MCA: Convert rip_msr, mce_bootlog, monarch_timeout Move above configuration variables into struct mca_config and adjust usage places accordingly. Signed-off-by: Borislav Petkov <bp@alien8.de> Acked-by: Tony Luck <tony.luck@intel.com>	2012-10-26 14:37:57 +02:00
Borislav Petkov	d203f0b824	x86, MCA: Convert dont_log_ce, banks and tolerant Move those MCA configuration variables into struct mca_config and adjust the places they're used accordingly. Signed-off-by: Borislav Petkov <bp@alien8.de> Acked-by: Tony Luck <tony.luck@intel.com>	2012-10-26 14:37:56 +02:00
H. Peter Anvin	4533d86270	Merge commit '5bc66170dc486556a1e36fd384463536573f4b82' into x86/urgent From Borislav Petkov <bp@amd64.org>: Below is a RAS fix which reverts the addition of a sysfs attribute which we agreed is not needed, post-factum. And this should go in now because that sysfs attribute is going to end up in 3.7 otherwise and thus exposed to userspace; removing it then would be a lot harder. This is done as a merge rather than a simple patch/cherry-pick since the baseline for this patch was not in the previous x86/urgent. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-10-19 07:55:09 -07:00
Borislav Petkov	5bc66170dc	x86, MCE: Remove bios_cmci_threshold sysfs attribute `450cc20103` ("x86/mce: Provide boot argument to honour bios-set CMCI threshold") added the bios_cmci_threshold sysfs attribute which was supposed to communicate to userspace tools that BIOS CMCI threshold has been honoured. However, this info is not of any importance to userspace - it should rather get the actual error count it has been thresholded already from MCi_STATUS[38:52]. So drop this before it becomes a used interface (good thing we caught this early in 3.7-rc1, right after the merge window closed). Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20121017105940.GA14590@x1.osrc.amd.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-10-19 15:22:29 +02:00
Daniel J Blueman	21c5e50e15	x86, amd, mce: Avoid NULL pointer reference on CPU northbridge lookup When booting on a federated multi-server system (NumaScale), the processor Northbridge lookup returns NULL; add guards to prevent this causing an oops. On those systems, the northbridge is accessed through MMIO and the "normal" northbridge enumeration in amd_nb.c doesn't work since we're generating the northbridge ID from the initial APIC ID and the last is not unique on those systems. Long story short, we end up without northbridge descriptors. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Cc: stable@vger.kernel.org # 3.6 Link: http://lkml.kernel.org/r/1349073725-14093-1-git-send-email-daniel@numascale-asia.com [ Boris: beef up commit message ] Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-10-17 11:25:32 -07:00
Naveen N. Rao	450cc20103	x86/mce: Provide boot argument to honour bios-set CMCI threshold The ACPI spec doesn't provide for a way for the bios to pass down recommended thresholds to the OS on a _per-bank_ basis. This patch adds a new boot option, which if passed, tells Linux to use CMCI thresholds set by the bios. As fail-safe, we initialize threshold to 1 if some banks have not been initialized by the bios and warn the user. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2012-09-27 10:08:00 -07:00
Ingo Molnar	f1f6524476	Linux 3.6-rc6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQEcBAABAgAGBQJQVkutAAoJEHm+PkMAQRiGW8sH/36FVQ3zI75QH16AmR++2nMZ BRJGoxcRFMssrXTYVdkMyzygf8b7MZbNEn1qt2g63MNzGaJucPlw5NVL4GLzR+zr x/EglLrTEPCD5el9wJ3ls9iC1soudKQTvC2BjcdUjpoSwHrDM/7GKfbOacE54Wqc C1VHCcg5DWOD7F0RnYT2SQEVCeDODNmcyFdk7Oi4cUicTPJoYWJ9O9MGfBDBok0N M+dXxa9nvsl7EeEKpBKH9vo4TfXn3Gsj6LCRdedvI15ilZjfo8jdHYbSn7KBfQuZ JIKRnqkaQ1JfMFt+M/JJZ1b/+Wrd4HLMmmn5oUmrGGIvhpi32nJfi/97+nSy8iU= =c5gW -----END PGP SIGNATURE----- Merge tag 'v3.6-rc6' into x86/mce Merge Linux v3.6-rc6, to refresh this tree. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-09-19 17:01:25 +02:00
Chen Gong	55babd8f41	x86/mce: Add CMCI poll mode On Intel systems corrected machine check interrupts (CMCI) may be sent to multiple logical processors; possibly to all processors on the affected socket (SDM Volume 3B "15.5.1 CMCI Local APIC Interface"). This means that a persistent error (such as a stuck bit in ECC memory) may cause a storm of interrupts that greatly hinders or prevents forward progress (probably on many processors). To solve this we keep track of the rate at which each processor sees CMCI. If we exceed a threshold, we disable CMCI delivery and switch to polling the machine check banks. If the storm subsides (none of the affected processors see any more errors for a complete poll interval) we re-enable CMCI. [Tony: Added console messages when storm begins/ends and increased storm threshold from 5 to 15 so we have a few more logged entries before we disable interrupts and start dropping reports] Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2012-08-09 11:44:51 -07:00
Tony Luck	4670a300a2	x86/mce: Make cmci_discover() quiet cmci_discover() works out which machine check banks support CMCI, and which of those are shared by multiple logical processors. It uses this information to ensure that exactly one cpu is designated the owner of each bank so that when interrupts are broadcast to multiple cpus, only one of them will look in a shared bank to log the error and clear the bank. At boot time cmci_discover() performs this task silently. But during certain cpu hotplug operations it prints out a set of summary lines like this: CPU 35 MCA banks CMCI:0 CMCI:1 CMCI:3 CMCI:5 CMCI:6 CMCI:7 CMCI:8 CMCI:9 CMCI:10 CMCI:11 CPU 1 MCA banks CMCI:0 CMCI:1 CMCI:3 CPU 39 MCA banks CMCI:0 CMCI:1 CMCI:3 CPU 38 MCA banks CMCI:0 CMCI:1 CMCI:3 CPU 32 MCA banks CMCI:0 CMCI:1 CMCI:3 CPU 37 MCA banks CMCI:0 CMCI:1 CMCI:3 CPU 36 MCA banks CMCI:0 CMCI:1 CMCI:3 CPU 34 MCA banks CMCI:0 CMCI:1 CMCI:3 The value of these messages seems very low. A user might painstakingly cross-check against the data sheet for a processor to ensure that all CMCI supported banks are correctly reported, but this seems improbable. If users really wanted to do this, we should print the information at boot time too. Remove the messages. Signed-off-by: Tony Luck <tony.luck@intel.com>	2012-08-09 10:59:21 -07:00
Thomas Gleixner	1a65f970d1	x86: mce: Remove the frozen cases in the hotplug code No point in having double cases if we can simply mask the FROZEN bit out. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2012-08-03 11:46:50 -07:00
Thomas Gleixner	26c3c283c5	x86: mce: Split timer init Split timer init function into the init and the start part, so the start part can replace the open coded version in CPU_DOWN_FAILED. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2012-08-03 11:46:29 -07:00
Thomas Gleixner	b5975917a3	x86: mce: Serialize mce injection raise_mce() fiddles with global state, but lacks any kind of serialization. Add a mutex around the raise_mce() call, so concurrent writers do not stomp on each other toes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2012-08-03 11:45:56 -07:00

1 2 3 4 5 ...

585 Commits