Assume IP on the stack is valid when either EIPV or RIPV are set.
This influences whether the machine check exception handler decides
to return or panic.
This fixes a test case in the mce-test suite and is more compliant
to the specification.
This currently only makes a difference in a artificial testing
scenario with the mce-test test suite.
Also in addition do not force the EIPV to be valid with the exact
register MSRs, and keep in trust the CS value on stack even if MSR
is available.
[AK: combination of patches from Huang Ying and Hidetoshi Seto, with
new description by me]
[add some description, no code changed - HS]
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
... instead of "Machine check". This is for consistency with the Monarch
panic message.
Based on a report from Ying Huang.
v2: But add a descriptive postfix so that the test suite can distingush.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
On Intel platforms machine check exceptions are always broadcast to
all CPUs. This patch makes the machine check handler synchronize all
these machine checks, elect a Monarch to handle the event and collect
the worst event from all CPUs and then process it first.
This has some advantages:
- When there is a truly data corrupting error the system panics as
quickly as possible. This improves containment of corrupted
data and makes sure the corrupted data never hits stable storage.
- The panics are synchronized and do not reenter the panic code
on multiple CPUs (which currently does not handle this well).
- All the errors are reported. Currently it often happens that
another CPU happens to do the panic first, but reports useless
information (empty machine check) because the real error
happened on another CPU which came in later.
This is a big advantage on Nehalem where the 8 threads per CPU
lead to often the wrong CPU winning the race and dumping
useless information on a machine check. The problem also occurs
in a less severe form on older CPUs.
- The system can detect when no CPUs detected a machine check
and shut down the system. This can happen when one CPU is so
badly hung that that it cannot process a machine check anymore
or when some external agent wants to stop the system by
asserting the machine check pin. This follows Intel hardware
recommendations.
- This matches the recommended error model by the CPU designers.
- The events can be output in true severity order
- When a panic happens on another CPU it makes sure to be actually
be able to process the stop IPI by enabling interrupts.
The code is extremly careful to handle timeouts while waiting
for other CPUs. It can't rely on the normal timing mechanisms
(jiffies, ktime_get) because of its asynchronous/lockless nature,
so it uses own timeouts using ndelay() and a "SPINUNIT"
The timeout is configurable. By default it waits for upto one
second for the other CPUs. This can be also disabled.
From some informal testing AMD systems do not see to broadcast
machine checks, so right now it's always disabled by default on
non Intel CPUs or also on very old Intel systems.
Includes fixes from Ying Huang
Fixed a "ecception" in a comment (H.Seto)
Moved global_nwo reset later based on suggestion from H.Seto
v2: Avoid duplicate messages
[ Impact: feature, fixes long standing problems. ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
In some circumstances multiple CPUs can enter mce_panic() in parallel.
This gives quite confused output because they will all dump the same
machine check buffer.
The other problem is that they would all panic in parallel, but not
process each other's shutdown IPIs because interrupts are disabled.
Detect this situation early on in mce_panic(). On the first CPU
entering will do the panic, the others will just wait to be killed.
For paranoia reasons in case the other CPU dies during the MCE I added
a 5 seconds timeout. If it expires each CPU will panic on its own again.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Machine checks support waking up the mcelog daemon quickly.
The original wake up code for this was pretty ugly, relying on
a idle notifier and a special process flag. The reason it did
it this way is that the machine check handler is not subject
to normal interrupt locking rules so it's not safe
to call wake_up(). Instead it set a process flag
and then either did the wakeup in the syscall return
or in the idle notifier.
This patch adds a new "bootstraping" method as replacement.
The idea is that the handler checks if it's in a state where
it is unsafe to call wake_up(). If it's safe it calls it directly.
When it's not safe -- that is it interrupted in a critical
section with interrupts disables -- it uses a new "self IPI" to trigger
an IPI to its own CPU. This can be done safely because IPI
triggers are atomic with some care. The IPI is raised
once the interrupts are reenabled and can then safely call
wake_up().
When APICs are disabled the event is just queued and will be picked up
eventually by the next polling timer. I think that's a reasonable
compromise, since it should only happen quite rarely.
Contains fixes from Ying Huang.
[ solve conflict on irqinit, make it work on 32bit (entry_arch.h) - HS ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The exception handler should behave differently if the exception is
fatal versus one that can be returned from. In the first case it should
never clear any registers because these need to be preserved
for logging after the next boot. Otherwise it should clear them
on each CPU step by step so that other CPUs sharing the same bank don't
see duplicate events. Otherwise we risk reporting events multiple
times on any CPUs which have shared machine check banks, which
is a common problem on Intel Nehalem which has both SMT (two
CPU threads sharing banks) and shared machine check banks in the uncore.
Determine early in a special pass if any event requires a panic.
This uses the mce_severity() function added earlier.
This is needed for the next patch.
Also fixes a problem together with an earlier patch
that corrected events weren't logged on a fatal MCE.
[ Impact: Feature ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The machine check grading (as in deciding what should be done for a given
register value) has to be done multiple times soon and it's also getting
more complicated.
So it makes sense to consolidate it into a single function. To get smaller
and more straight forward and possibly more extensible code I opted towards
a new table driven method. The various rules are put into a table
when is then executed by a very simple interpreter.
The grading engine is in a new file mce-severity.c. I also added a private
include file mce-internal.h, because mce.h is already a bit too cluttered.
This is dead code right now, but will be used in followon patches.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Previously mce_panic used a simple heuristic to avoid printing
old so far unreported machine check events on a mce panic. This worked
by comparing the TSC value at the start of the machine check handler
with the event time stamp and only printing newer ones.
This has a couple of issues, in particular on systems where the TSC
is not fully synchronized between CPUs it could lose events or print
old ones.
It is also problematic with full system synchronization as it is
added by the next patch.
Remove the TSC heuristic and instead replace it with a simple heuristic
to print corrected errors first and after that uncorrected errors
and finally the worst machine check as determined by the machine
check handler.
This simplifies the code because there is no need to pass the
original TSC value around.
Contains fixes from Ying Huang
[ Impact: bug fix, cleanup ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Ying Huang <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Normally the machine check handler ignores corrected errors and leaves
them to machine_check_poll(). But when panicing mcp won't run, so
log all errors.
Note: this can still miss some cases until the "early no way out"
patch later is applied too.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Experience has shown that struct mce which is used to pass an machine
check to the user space daemon currently a few limitations. Also some
data which is useful to print at panic level is also missing.
This patch addresses most of them. The same information is also
printed out together with mce panic.
struct mce can be painlessly extended in a compatible way, the mcelog
user space code just ignores additional fields with a warning.
- It doesn't provide a wall time timestamp. There have been a few
complaints about that. Fix that by adding a 64bit time_t
- It doesn't provide the exact CPU identification. This makes
it awkward for mcelog to decode the event correctly, especially
when there are variations in the supported MCE codes on different
CPU models or when mcelog is running on a different host after a panic.
Previously the administrator had to specify the correct CPU
when mcelog ran on a different host, but with the more variation
in machine checks now it's better to auto detect that.
It's also useful for more detailed analysis of CPU events.
Pass CPUID 1.EAX and the cpu vendor (as encoded in processor.h) instead.
- Socket ID and initial APIC ID are useful to report because they
allow to identify the failing CPU in some (not all) cases.
This is also especially useful for the panic situation.
This addresses one of the complaints from Thomas Gleixner earlier.
- The MCG capabilities MSR needs to be reported for some advanced
error processing in mcelog
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The old struct mce had a limitation to 256 CPUs. But x86 Linux supports
more than that now with x2apic. Add a new field extcpu to report the
extended number.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This makes it easier for tools who want to extract the mcelog out of
crash images or memory dumps to adapt to changing struct mce size.
The length field replaces padding, so it's fully compatible.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Keep a count of the machine check polls (or CMCI events) in
/proc/interrupts.
Andi needs this for debugging, but it's also useful in general
to see what's going in by the kernel.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Useful for debugging, but it's also good general policy
to have a counter for all special interrupts there. This makes it easier
to diagnose where a CPU is spending its time.
[ Impact: feature, debugging tool ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Merge reason: arch/x86/kernel/irqinit_{32,64}.c unified in irq/numa
and modified in x86/mce3; this merge resolves the conflict.
Conflicts:
arch/x86/kernel/irqinit.c
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Merge reason: irq/numa didnt build because this commit:
2759c32: x86: don't call read_apic_id if !cpu_has_apic
Had a dependency on x86/cpufeature changes. Pull in that
(small) branch to fix the dependency.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/mips/sibyte/bcm1480/irq.c
arch/mips/sibyte/sb1250/irq.c
Merge reason: we gathered a few conflicts plus update to latest upstream fixes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add fan_max description.
Add fan limit alarm 'max_alarm' to the alarm section.
Signed-off-by: Christian Engelmayer <christian.engelmayer@frequentis.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
The remove function uses __devexit, so the .remove assignment needs
__devexit_p() to fix a build error with hotplug disabled.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Several EISA device IDs for 3c509 family network cards are missing from
the driver, making the cards unusable in their EISA mode. Here's a fix to
add them based on the EISA configuration files distributed by 3Com and our
eisa.ids database.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds me as the maintainer of the CPMAC (AR7)
Ethernet driver.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
fix the following 'make headers_check' warnings:
usr/include/linux/net_dropmon.h:7: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
fix the following 'make headers_check' warnings:
usr/include/linux/auto_fs.h:17: include of <linux/types.h> is preferred over <asm/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
We need to explicitly mark words 85-87 as valid ones since
firmware doesn't do it.
This should fix support for LBA48 and FLUSH CACHE [EXT] command
which stopped working after we applied more strict checking of
identify words in:
commit 942dcd85bf
("ide: idedisk_supports_lba48() -> ata_id_lba48_enabled()")
and
commit 4b58f17d7c
("ide: ide_id_has_flush_cache() -> ata_id_flush_enabled()")
Reported-and-tested-by: "Trevor Hemsley" <trevor.hemsley@ntlworld.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
fsldma: Fix compile warnings
fsldma: fix memory leak on error path in fsl_dma_prep_memcpy()
fsldma: snooping is not enabled for last entry in descriptor chain
fsldma: fix infinite loop on multi-descriptor DMA chain completion
fsldma: fix "DMA halt timeout!" errors
fsldma: fix check on potential fdev->chan[] overflow
fsldma: update mailling list address in MAINTAINERS
The nilfs_cpfile_delete_checkpoints() wrongly skips brelse() for the
header block of checkpoint file in case of errors. This fixes the
leak bug.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Gary Lin reports that a new device id needs to be added to the atl1e in
order to get some new Asus hardware to work properly.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the transmit queue gets full we enable interrupts for TX completions
There was a race that we handled the TX queue both from the interrupt context
and from the transmit function. Using "spin_trylock_irq()" ensures this
doesn't happen.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/built-in.o: In function `intel_opregion_init':
(.text+0x9d540): undefined reference to `acpi_video_register'
v2: move under DRM_I915 from DRM_I915_KMS
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Commit 4973b22a ("ACPI processor: reset the throttling state once it's
invalid") introduced a new warning which prints a spurious newline.
The ACPI_WARNING macro that is used already takes care of adding a
newline, after adding ACPI_CA_VERSION to the message. Remove the newline
to avoid the message getting split into two lines.
Signed-off-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Len Brown <len.brown@intel.com>
Currently acpi_video_exit() is exported as well as using __exit which causes:
WARNING: drivers/acpi/video.o(__ksymtab+0x0): Section mismatch in reference from the variable __ksymtab_acpi_video_exit to the function .exit.text:acpi_video_exit()
The symbol acpi_video_exit is exported and annotated __exit
Fix this by removing the __exit annotation of acpi_video_exit or drop the export.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
When BIOS SETUP is changed to disable EIST, some BIOS
hand the OS an un-initialized _PSS:
Name (_PSS, Package (0x06)
{
Package (0x06)
{
0x80000000, // frequency [MHz]
0x80000000, // power [mW]
0x80000000, // latency [us]
0x80000000, // BM latency [us]
0x80000000, // control
0x80000000 // status
},
...
These are outrageous values for frequency,
power and latency, raising the question where to draw
the line between legal and illegal. We tend to survive
garbage in the power and latency fields, but we can BUG_ON
when garbage is in the frequency field.
Cpufreq multiplies the frequency by 1000 and stores it in a u32 KHz.
So disregard a _PSS with a frequency so large
that it can't be represented by cpufreq.
https://bugzilla.redhat.com/show_bug.cgi?id=500311
Signed-off-by: Len Brown <len.brown@intel.com>
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] update mach-types
[ARM] Add cmpxchg support for ARMv6+ systems (v5)
[ARM] barriers: improve xchg, bitops and atomic SMP barriers
Gemini: Fix SRAM/ROM location after memory swap
MAINTAINER: Add F: entries for Gemini and FA526
[ARM] disable NX support for OABI-supporting kernels
[ARM] add coherent DMA mask for mv643xx_eth
[ARM] pxa/palm: fix PalmLD/T5/TX AC97 MFP
[ARM] pxa: add parameter to clksrc_read() for pxa168/910
[ARM] pxa: fix the incorrectly defined drive strength macros for pxa{168,910}
[ARM] Orion: Remove explicit name for platform device resources
[ARM] Kirkwood: Correct MPP for SATA activity/presence LEDs of QNAP TS-119/TS-219.
[ARM] pxa/ezx: fix pin configuration for low power mode
[ARM] pxa/spitz: provide spitz_ohci_exit() that unregisters USB_HOST GPIO
[ARM] pxa: enable GPIO receivers after configuring pins
[ARM] pxa: allow gpio_reset drive high during normal work
[ARM] pxa: save/restore PGSR on suspend/resume.
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6:
Revert "USB: Correct Makefile to make isp1760 buildable"
usb-serial: fix crash when sub-driver updates firmware
USB: isp1760: urb_dequeue doesn't always find the urbs
USB: Yet another Conexant Clone to add to cdc-acm.c
USB: atmel_usb_udc: Use kzalloc() to allocate ep structures
USB: atmel-usba-udc : fix control out requests.