In removing the presence of <linux/module.h> from some of the
more common <linux/something.h> files, this implict include
of <linux/topology.h> was uncovered.
CC arch/x86/kernel/vsyscall_64.o
arch/x86/kernel/vsyscall_64.c: In function ‘vsyscall_set_cpu’:
arch/x86/kernel/vsyscall_64.c:259: error: implicit declaration of function ‘cpu_to_node’
Explicitly call it out so the cleanup can take place.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Drop the edac_mce custom hook in favor of the generic notifier
mechanism. Also, do not log the error to mcelog if the notified agent
was able to decode it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* 'x86-rdrand-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, random: Verify RDRAND functionality and allow it to be disabled
x86, random: Architectural inlines to get random integers with RDRAND
random: Add support for architectural random hooks
Fix up trivial conflicts in drivers/char/random.c: the architectural
random hooks touched "get_random_int()" that was simplified to use MD5
and not do the keyptr thing any more (see commit 6e5714eaf77d: "net:
Compute protocol sequence numbers and fragment IDs using MD5").
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, microcode, AMD: Add microcode revision to /proc/cpuinfo
x86, microcode: Correct microcode revision format
coretemp: Get microcode revision from cpu_data
x86, intel: Use c->microcode for Atom errata check
x86, intel: Output microcode revision in /proc/cpuinfo
x86, microcode: Don't request microcode from userspace unnecessarily
Fix up trivial conflicts in arch/x86/kernel/cpu/amd.c (conflict between
moving AMD BSP code to cpu_dev helper function and adding AMD microcode
revision to /proc/cpuinfo code)
* 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Hyper-V: Integrate the clocksource with Hyper-V detection code
Fix up conflicts in drivers/staging/hv/Makefile manually (some of the hv
code has moved out of staging to drivers/hv/)
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, amd: Include linux/elf.h since we use stuff from asm/elf.h
x86: cache_info: Update calculation of AMD L3 cache indices
x86: cache_info: Kill the atomic allocation in amd_init_l3_cache()
x86: cache_info: Kill the moronic shadow struct
x86: cache_info: Remove bogus free of amd_l3_cache data
x86, amd: Include elf.h explicitly, prepare the code for the module.h split
x86-32, amd: Move va_align definition to unbreak 32-bit build
x86, amd: Move BSP code to cpu_dev helper
x86: Add a BSP cpu_dev helper
x86, amd: Avoid cache aliasing penalties on AMD family 15h
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86-64, unistd: Remove bogus __IGNORE_getcpu
x86, mm, trivial: Remove unnecessary get_order() in free_thread_info()
x86, cleanup: Remove unneeded version.h include from arch/x86/
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86-64: Fix CFI data for interrupt frames
x86-64: Don't apply destructive erratum workaround on unaffected CPUs
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/irq: Standardize on CONFIG_SPARSE_IRQ=y
x86, ioapic: Clean up ioapic/apic_id usage
x86, ioapic: Factor out print_IO_APIC() to only print one io apic
x86, ioapic: Print out irte with right ioapic index
x86, ioapic: Split up setup_ioapic_entry()
x86, ioapic: Pass struct irq_attr * to setup_ioapic_irq()
apic, i386/bigsmp: Fix false warnings regarding logical APIC ID mismatches
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (121 commits)
perf symbols: Increase symbol KSYM_NAME_LEN size
perf hists browser: Refuse 'a' hotkey on non symbolic views
perf ui browser: Use libslang to read keys
perf tools: Fix tracing info recording
perf hists browser: Elide DSO column when it is set to just one DSO, ditto for threads
perf hists: Don't consider filtered entries when calculating column widths
perf hists: Don't decay total_period for filtered entries
perf hists browser: Honour symbol_conf.show_{nr_samples,total_period}
perf hists browser: Do not exit on tab key with single event
perf annotate browser: Don't change selection line when returning from callq
perf tools: handle endianness of feature bitmap
perf tools: Add prelink suggestion to dso update message
perf script: Fix unknown feature comment
perf hists browser: Apply the dso and thread filters when merging new batches
perf hists: Move the dso and thread filters from hist_browser
perf ui browser: Honour the xterm colors
perf top tui: Give color hints just on the percentage, like on --stdio
perf ui browser: Make the colors configurable and change the defaults
perf tui: Remove unneeded call to newtCls on startup
perf hists: Don't format the percentage on hist_entry__snprintf
...
Fix up conflicts in arch/x86/kernel/kprobes.c manually.
Ingo's tree did the insane "add volatile to const array", which just
doesn't make sense ("volatile const"?). But we could remove the const
*and* make the array volatile to make doubly sure that gcc doesn't
optimize it away..
Also fix up kernel/trace/ring_buffer.c non-data-conflicts manually: the
reader_lock has been turned into a raw lock by the core locking merge,
and there was a new user of it introduced in this perf core merge. Make
sure that new use also uses the raw accessor functions.
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
rtmutex: Add missing rcu_read_unlock() in debug_rt_mutex_print_deadlock()
lockdep: Comment all warnings
lib: atomic64: Change the type of local lock to raw_spinlock_t
locking, lib/atomic64: Annotate atomic64_lock::lock as raw
locking, x86, iommu: Annotate qi->q_lock as raw
locking, x86, iommu: Annotate irq_2_ir_lock as raw
locking, x86, iommu: Annotate iommu->register_lock as raw
locking, dma, ipu: Annotate bank_lock as raw
locking, ARM: Annotate low level hw locks as raw
locking, drivers/dca: Annotate dca_lock as raw
locking, powerpc: Annotate uic->lock as raw
locking, x86: mce: Annotate cmci_discover_lock as raw
locking, ACPI: Annotate c3_lock as raw
locking, oprofile: Annotate oprofilefs lock as raw
locking, video: Annotate vga console lock as raw
locking, latencytop: Annotate latency_lock as raw
locking, timer_stats: Annotate table_lock as raw
locking, rwsem: Annotate inner lock as raw
locking, semaphores: Annotate inner lock as raw
locking, sched: Annotate thread_group_cputimer as raw
...
Fix up conflicts in kernel/posix-cpu-timers.c manually: making
cputimer->cputime a raw lock conflicted with the ABBA fix in commit
bcd5cff721 ("cputimer: Cure lock inversion").
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, ioapic: Consolidate the explicit EOI code
x86, ioapic: Restore the mask bit correctly in eoi_ioapic_irq()
x86, kdump, ioapic: Reset remote-IRR in clear_IO_APIC
iommu: Rename the DMAR and INTR_REMAP config options
x86, ioapic: Define irq_remap_modify_chip_defaults()
x86, msi, intr-remap: Use the ioapic set affinity routine
iommu: Cleanup ifdefs in detect_intel_iommu()
iommu: No need to set dmar_disabled in check_zero_address()
iommu: Move IOMMU specific code to intel-iommu.c
intr_remap: Call dmar_dev_scope_init() explicitly
x86, x2apic: Enable the bios request for x2apic optout
This allows jump-label entries to be cheaply updated on code which is
not yet live.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
It is no longer used.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits)
MAINTAINERS: linux-m32r is moderated for non-subscribers
linux@lists.openrisc.net is moderated for non-subscribers
Drop default from "DM365 codec select" choice
parisc: Kconfig: cleanup Kernel page size default
Kconfig: remove redundant CONFIG_ prefix on two symbols
cris: remove arch/cris/arch-v32/lib/nand_init.S
microblaze: add missing CONFIG_ prefixes
h8300: drop puzzling Kconfig dependencies
MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers
tty: drop superfluous dependency in Kconfig
ARM: mxc: fix Kconfig typo 'i.MX51'
Fix file references in Kconfig files
aic7xxx: fix Kconfig references to READMEs
Fix file references in drivers/ide/
thinkpad_acpi: Fix printk typo 'bluestooth'
bcmring: drop commented out line in Kconfig
btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888'
doc: raw1394: Trivial typo fix
CIFS: Don't free volume_info->UNC until we are entirely done with it.
treewide: Correct spelling of successfully in comments
...
When compiling an i386_defconfig kernel with gcc-4.6.1-9.fc15.i686, I
noticed a warning about the asm operand for test_bit in kprobes'
can_boost. I discovered that this caused only the first long of
twobyte_is_boostable[] to be output.
Jakub filed and fixed gcc PR50571 to correct the warning and this output
issue. But to solve it for less current gcc, we can make kprobes'
twobyte_is_boostable[] non-const, and it won't be optimized out.
Before:
CC arch/x86/kernel/kprobes.o
In file included from include/linux/bitops.h:22:0,
from include/linux/kernel.h:17,
from [...]/arch/x86/include/asm/percpu.h:44,
from [...]/arch/x86/include/asm/current.h:5,
from [...]/arch/x86/include/asm/processor.h:15,
from [...]/arch/x86/include/asm/atomic.h:6,
from include/linux/atomic.h:4,
from include/linux/mutex.h:18,
from include/linux/notifier.h:13,
from include/linux/kprobes.h:34,
from arch/x86/kernel/kprobes.c:43:
[...]/arch/x86/include/asm/bitops.h: In function ‘can_boost.part.1’:
[...]/arch/x86/include/asm/bitops.h:319:2: warning: use of memory input
without lvalue in asm operand 1 is deprecated [enabled by default]
$ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt
551: 0f a3 05 00 00 00 00 bt %eax,0x0
554: R_386_32 .rodata.cst4
$ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o
arch/x86/kernel/kprobes.o: file format elf32-i386
Contents of section .data:
0000 48000000 00000000 00000000 00000000 H...............
Contents of section .rodata.cst4:
0000 4c030000 L...
Only a single long of twobyte_is_boostable[] is in the object file.
After, without the const on twobyte_is_boostable:
$ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt
551: 0f a3 05 20 00 00 00 bt %eax,0x20
554: R_386_32 .data
$ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o
arch/x86/kernel/kprobes.o: file format elf32-i386
Contents of section .data:
0000 48000000 00000000 00000000 00000000 H...............
0010 00000000 00000000 00000000 00000000 ................
0020 4c030000 0f000200 ffff0000 ffcff0c0 L...............
0030 0000ffff 3bbbfff8 03ff2ebb 26bb2e77 ....;.......&..w
Now all 32 bytes are output into .data instead.
Signed-off-by: Josh Stone <jistone@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Enable microcode revision output for AMD after 506ed6b53e ("x86,
intel: Output microcode revision in /proc/cpuinfo") did it for Intel.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
506ed6b53e ("x86, intel: Output microcode revision in /proc/cpuinfo")
added microcode revision format to /proc/cpuinfo and the MCE handler in
decimal format but both AMD and Intel patch levels are handled as hex
numbers. Fix it.
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
When compiling an i386_defconfig kernel with
gcc-4.6.1-9.fc15.i686, I noticed a warning about the asm operand
for test_bit in kprobes' can_boost. I discovered that this
caused only the first long of twobyte_is_boostable[] to be
output.
Jakub filed and fixed gcc PR50571 to correct the warning and
this output issue. But to solve it for less current gcc, we can
make kprobes' twobyte_is_boostable[] volatile, and it won't be
optimized out.
Before:
CC arch/x86/kernel/kprobes.o
In file included from include/linux/bitops.h:22:0,
from include/linux/kernel.h:17,
from [...]/arch/x86/include/asm/percpu.h:44,
from [...]/arch/x86/include/asm/current.h:5,
from [...]/arch/x86/include/asm/processor.h:15,
from [...]/arch/x86/include/asm/atomic.h:6,
from include/linux/atomic.h:4,
from include/linux/mutex.h:18,
from include/linux/notifier.h:13,
from include/linux/kprobes.h:34,
from arch/x86/kernel/kprobes.c:43:
[...]/arch/x86/include/asm/bitops.h: In function ‘can_boost.part.1’:
[...]/arch/x86/include/asm/bitops.h:319:2: warning: use of memory input without lvalue in asm operand 1 is deprecated [enabled by default]
$ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt
551: 0f a3 05 00 00 00 00 bt %eax,0x0
554: R_386_32 .rodata.cst4
$ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o
arch/x86/kernel/kprobes.o: file format elf32-i386
Contents of section .data:
0000 48000000 00000000 00000000 00000000 H...............
Contents of section .rodata.cst4:
0000 4c030000 L...
Only a single long of twobyte_is_boostable[] is in the object
file.
After, with volatile:
$ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt
551: 0f a3 05 20 00 00 00 bt %eax,0x20
554: R_386_32 .data
$ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o
arch/x86/kernel/kprobes.o: file format elf32-i386
Contents of section .data:
0000 48000000 00000000 00000000 00000000 H...............
0010 00000000 00000000 00000000 00000000 ................
0020 4c030000 0f000200 ffff0000 ffcff0c0 L...............
0030 0000ffff 3bbbfff8 03ff2ebb 26bb2e77 ....;.......&..w
Now all 32 bytes are output into .data instead.
Signed-off-by: Josh Stone <jistone@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Link: http://lkml.kernel.org/r/1318899645-4068-1-git-send-email-jistone@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that the cpu update level is available the Atom PSE errata
check can use it directly without reading the MSR again.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1318466795-7393-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I got a request to make it easier to determine the microcode
update level on Intel CPUs. This patch adds a new "microcode"
field to /proc/cpuinfo.
The microcode level is also outputed on fatal machine checks
together with the other CPUID model information.
I removed the respective code from the microcode update driver,
it just reads the field from cpu_data. Also when the microcode
is updated it fills in the new values too.
I had to add a memory barrier to native_cpuid to prevent it
being optimized away when the result is not used.
This turns out to clean up further code which already got this
information manually. This is done in followon patches.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1318466795-7393-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Requesting the microcode from userspace *every time* when onlining CPUs
(during a CPU hotplug operation) is unnecessary. Thus, ensure that
once the kernel gets the microcode after booting, it is not freed nor
invalidated when a CPU goes offline, so that it can be reused when that
CPU comes back online, without requesting userspace for it again. As a
result, the CPU hotplug operations become faster as well.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Link: http://lkml.kernel.org/r/4E91F908.5010006@linux.vnet.ibm.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Sparseirq got introduced in v2.6.28 and Thomas did a huge cleanup
around v2.6.38 that eliminated basically all disadvantages
of it.
So we can remove non-sparseirq support now and simplify
our IRQ degrees of freedom a bit.
Suggested-and-acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/4E95E21D.6090200@oracle.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
While looking at the code, apic_id sometime is referred to index
of ioapic, but sometime is used for phys apic id. and some even
use apic for real apic id. It is very confusing.
So try to limit apic_id or ioapic_id to be real apic id for
ioapic, and use ioapic_idx for ioapic index in the array.
-v2: Suggested by Ingo, use ioapic_idx consistently, instead of ioapic
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/4E9542DC.3090509@oracle.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It is getting too big after the interrupt remaping entries debug
print out was added.
Original print_IO_APIC() becomes print_IO_APICs().
New print_IO_APIC() will only print one ioapic's registers
As a side-effect this clean-up also made checkpatch.pl happier.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/4E9542D3.5000008@oracle.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo pointed out that setup_ioapic_entry() is way too big now.
Split the intr-remap code out into setup_ir_ioapic_entry().
Also pass struct io_apic_irq_attr * instead of 5 parameters
in those two functions.
At last in setup_ir_ioapic_entry() we don't need to panic.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/4E9542BB.4070807@oracle.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Do not expand that struct, and just pass pointer to reduce the
number of parameters in related functions.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/4E9542B1.7050800@oracle.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This UML breakage:
linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790
Is caused by commit 3ae36655 ("x86-64: Rework vsyscall emulation and add
vsyscall= parameter") - the vsyscall emulation code is not fully cooked
yet as UML relies on some rather fragile SIGSEGV semantics.
Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default
to vsyscall=native for now, this patch implements that.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Andrew Lutomirski <luto@mit.edu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/20111005214047.GE14406@localhost.pp.htv.fi
Signed-off-by: Ingo Molnar <mingo@elte.hu>
nmi.c needs an #include <linux/mca.h>:
arch/x86/kernel/nmi.c: In function ‘unknown_nmi_error’:
arch/x86/kernel/nmi.c:286:6: error: ‘MCA_bus’ undeclared (first use in this function)
arch/x86/kernel/nmi.c:286:6: note: each undeclared identifier is reported only once for each function it appears in
Another one is the hpwdt driver:
drivers/watchdog/hpwdt.c:507:9: error: ‘NMI_DONE’ undeclared (first use in this function)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch implements IBS feature detection and initialzation. The
code is shared between perf and oprofile. If IBS is available on the
system for perf, a pmu is setup.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1316597423-25723-3-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Moving IBS macros from oprofile to <asm/perf_event.h> to make it
available to perf. No additional changes.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1316597423-25723-2-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that the NMI handler are broken into lists, increment the appropriate
stats for each list. This allows us to see what is going on when they
get printed out in the next patch.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317409584-23662-6-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Previous patches allow the NMI subsystem to process multipe NMI events
in one NMI. As previously discussed this can cause issues when an event
triggered another NMI but is processed in the current NMI. This causes the
next NMI to go unprocessed and become an 'unknown' NMI.
To handle this, we first have to flag whether or not the NMI handler handled
more than one event or not. If it did, then there exists a chance that
the next NMI might be already processed. Once the NMI is flagged as a
candidate to be swallowed, we next look for a back-to-back NMI condition.
This is determined by looking at the %rip from pt_regs. If it is the same
as the previous NMI, it is assumed the cpu did not have a chance to jump
back into a non-NMI context and execute code and instead handled another NMI.
If both of those conditions are true then we will swallow any unknown NMI.
There still exists a chance that we accidentally swallow a real unknown NMI,
but for now things seem better.
An optimization has also been added to the nmi notifier rountine. Because x86
can latch up to one NMI while currently processing an NMI, we don't have to
worry about executing _all_ the handlers in a standalone NMI. The idea is
if multiple NMIs come in, the second NMI will represent them. For those
back-to-back NMI cases, we have the potentail to drop NMIs. Therefore only
execute all the handlers in the second half of a detected back-to-back NMI.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317409584-23662-5-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Just convert all the files that have an nmi handler to the new routines.
Most of it is straight forward conversion. A couple of places needed some
tweaking like kgdb which separates the debug notifier from the nmi handler
and mce removes a call to notify_die.
[Thanks to Ying for finding out the history behind that mce call
https://lkml.org/lkml/2010/5/27/114
And Boris responding that he would like to remove that call because of it
https://lkml.org/lkml/2011/9/21/163]
The things that get converted are the registeration/unregistration routines
and the nmi handler itself has its args changed along with code removal
to check which list it is on (most are on one NMI list except for kgdb
which has both an NMI routine and an NMI Unknown routine).
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Corey Minyard <minyard@acm.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: Jack Steiner <steiner@sgi.com>
Link: http://lkml.kernel.org/r/1317409584-23662-4-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The NMI handlers used to rely on the notifier infrastructure. This worked
great until we wanted to support handling multiple events better.
One of the key ideas to the nmi handling is to process _all_ the handlers for
each NMI. The reason behind this switch is because NMIs are edge triggered.
If enough NMIs are triggered, then they could be lost because the cpu can
only latch at most one NMI (besides the one currently being processed).
In order to deal with this we have decided to process all the NMI handlers
for each NMI. This allows the handlers to determine if they recieved an
event or not (the ones that can not determine this will be left to fend
for themselves on the unknown NMI list).
As a result of this change it is now possible to have an extra NMI that
was destined to be received for an already processed event. Because the
event was processed in the previous NMI, this NMI gets dropped and becomes
an 'unknown' NMI. This of course will cause printks that scare people.
However, we prefer to have extra NMIs as opposed to losing NMIs and as such
are have developed a basic mechanism to catch most of them. That will be
a later patch.
To accomplish this idea, I unhooked the nmi handlers from the notifier
routines and created a new mechanism loosely based on doIRQ. The reason
for this is the notifier routines have a couple of shortcomings. One we
could't guarantee all future NMI handlers used NOTIFY_OK instead of
NOTIFY_STOP. Second, we couldn't keep track of the number of events being
handled in each routine (most only handle one, perf can handle more than one).
Third, I wanted to eventually display which nmi handlers are registered in
the system in /proc/interrupts to help see who is generating NMIs.
The patch below just implements the new infrastructure but doesn't wire it up
yet (that is the next patch). Its design is based on doIRQ structs and the
atomic notifier routines. So the rcu stuff in the patch isn't entirely untested
(as the notifier routines have soaked it) but it should be double checked in
case I copied the code wrong.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317409584-23662-3-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The nmi stuff is changing a lot and adding more functionality. Split it
out from the traps.c file so it doesn't continue to pollute that file.
This makes it easier to find and expand all the future nmi related work.
No real functional changes here.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317409584-23662-2-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Intel does not have guest/host-only bit in perf counters like AMD
does. To support GO/HO bits KVM needs to switch EVENTSELn values
(or PERF_GLOBAL_CTRL if available) at a guest entry. If a counter is
configured to count only in a guest mode it stays disabled in a host,
but VMX is configured to switch it to enabled value during guest entry.
This patch adds GO/HO tracking to Intel perf code and provides interface
for KVM to get a list of MSRs that need to be switched on a guest entry.
Only cpus with architectural PMU (v1 or later) are supported with this
patch. To my knowledge there is not p6 models with VMX but without
architectural PMU and p4 with VMX are rare and the interface is general
enough to support them if need arise.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317816084-18026-7-git-send-email-gleb@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The AMD perf-counters support counting in guest or host-mode
only. Make use of that feature when user-space specified
guest/host-mode only counting.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317816084-18026-3-git-send-email-gleb@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The patch titled "x86: Don't use frame pointer to save old stack
on irq entry" did not properly adjust CFI directives, so this
patch is a follow-up to that one.
With the old stack pointer no longer stored in a callee-saved
register (plus some offset), we now have to use a CFA expression
to describe the memory location where it is being found. This
requires the use of .cfi_escape (allowing arbitrary byte streams
to be emitted into .eh_frame), as there is no
.cfi_def_cfa_expression (which also cannot reasonably be
expected, as it would require a full expression parser).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/4E8360200200007800058467@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
These warnings (generally one per CPU) are a result of
initializing x86_cpu_to_logical_apicid while apic_default is
still in use, but the check in setup_local_APIC() being done
when apic_bigsmp was already used as an override in
default_setup_apic_routing():
Overriding APIC driver with bigsmp
Enabling APIC mode: Physflat. Using 5 I/O APICs
------------[ cut here ]------------
WARNING: at .../arch/x86/kernel/apic/apic.c:1239
...
CPU 1 irqstacks, hard=f1c9a000 soft=f1c9c000
Booting Node 0, Processors #1
smpboot cpu 1: start_ip = 9e000
Initializing CPU#1
------------[ cut here ]------------
WARNING: at .../arch/x86/kernel/apic/apic.c:1239
setup_local_APIC+0x137/0x46b() Hardware name: ...
CPU1 logical APIC ID: 2 != 8
...
Fix this (for the time being, i.e. until
x86_32_early_logical_apicid() will get removed again, as Tejun
says ought to be possible) by overriding the previously stored
values at the point where the APIC driver gets overridden.
v2: Move this and the pre-existing override logic into
arch/x86/kernel/apic/bigsmp_32.c.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@kernel.org> (2.6.39 and onwards)
Link: http://lkml.kernel.org/r/4E835D16020000780005844C@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After merging the moduleh tree, today's linux-next build (x86_64
allmodconfig) failed like this:
arch/x86/kernel/sys_x86_64.c:28:10: warning: 'enum align_flags' declared inside parameter list
arch/x86/kernel/sys_x86_64.c:28:10: warning: its scope is only this definition or declaration, which is probably not what you
want arch/x86/kernel/sys_x86_64.c:28:22: error: parameter 3 ('flags') has incomplete type
[...]
Presumably caused by the module.h split interacting with a
new commit dfb09f9b7a ("x86, amd: Avoid cache aliasing penalties
on AMD family 15h") from the x8 tree.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/20110928174214.17a58be15d84d67c185930e1@canb.auug.org.au
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix (rare) build error by adding <asm/apicdef.h> header file:
arch/x86/kernel/cpu/perf_event_amd.c:350:2: error: 'BAD_APICID' undeclared (first use in this function)
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Andre Przywara <andre.przywara@amd.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/4E820138.90301@xenotime.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There are numerous broken references to Documentation files (in other
Documentation files, in comments, etc.). These broken references are
caused by typo's in the references, and by renames or removals of the
Documentation files. Some broken references are simply odd.
Fix these broken references, sometimes by dropping the irrelevant text
they were part of.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The CPU support for perf events on x86 was implemented via included C files
with #ifdefs. Clean this up by creating a new header file and compiling
the vendor-specific files as needed.
Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1314747665-2090-1-git-send-email-kjwinchester@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
A deadlock was introduced on x86 in commit ef68c8f87e ("x86:
Serialize EFI time accesses on rtc_lock") because efi_get_time()
and friends can be called with rtc_lock already held by
read_persistent_time(), e.g.:
timekeeping_init()
read_persistent_clock() <-- acquire rtc_lock
efi_get_time()
phys_efi_get_time() <-- acquire rtc_lock <DEADLOCK>
To fix this let's push the locking down into the get_wallclock()
and set_wallclock() implementations. Only the clock
implementations that access the x86 RTC directly need to acquire
rtc_lock, so it makes sense to push the locking down into the
rtc, vrtc and efi code.
The virtualization implementations don't require rtc_lock to be
held because they provide their own serialization.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Acked-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Avi Kivity <avi@redhat.com> [for the virtualization aspect]
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is a workaround for a UV2 hub bug that affects the format of system
global addresses.
The GRU API for UV2 was inadvertently broken by a hardware change. The
format of the physical address used for TLB dropins and for addresses used
with instructions running in unmapped mode has changed. This change was
not documented and became apparent only when diags failed running on
system simulators.
For UV1, TLB and GRU instruction physical addresses are identical to
socket physical addresses (although high NASID bits must be OR'ed into the
address).
For UV2, socket physical addresses need to be converted. The NODE portion
of the physical address needs to be shifted so that the low bit is in bit
39 or bit 40, depending on an MMR value.
It is not yet clear if this bug will be fixed in a silicon respin. If it
is fixed, the hub revision will be incremented & the workaround disabled.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
For older IO-APIC's, we were clearing the remote-IRR by changing
the RTE trigger mode to edge and then back to level. We wanted
to mask the RTE during this process, so we were essentially
doing mask+edge and then to unmask+level.
As part of the commit ca64c47cec,
we moved this EOI process earlier where the IO-APIC RTE is
masked. So we were wrongly unmasking it in the eoi_ioapic_irq().
So change the remote-IRR clear sequence in eoi_ioapic_irq() to
mask + edge and then restore the previous RTE entry which will
restore the mask status as well as the level trigger.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Rafael Wysocki <rjw@novell.com>
Cc: lchiquitto@novell.com
Cc: jbeulich@novell.com
Cc: yinghai@kernel.org
Link: http://lkml.kernel.org/r/20110825190657.210286410@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In the kdump scenario mentioned below, we can have a case where
the device using level triggered interrupt will not generate any
interrupts in the kdump kernel.
1. IO-APIC sends a level triggered interrupt to the CPU's local APIC.
2. Kernel crashed before the CPU services this interrupt, leaving
the remote-IRR in the IO-APIC set.
3. kdump kernel boot sequence does clear_IO_APIC() as part of IO-APIC
initialization. But this fails to reset remote-IRR bit of the
IO-APIC RTE as the remote-IRR bit is read-only.
4. Device using that level triggered entry can't generate any
more interrupts because of the remote-IRR bit.
In clear_IO_APIC_pin(), check if the remote-IRR bit is set and if
so do an explicit attempt to clear it (by doing EOI write on
modern io-apic's and changing trigger mode to edge/level on
older io-apic's). Also before doing the explicit EOI to the
io-apic, ensure that the trigger mode is indeed set to level.
This will enable the explicit EOI to the io-apic to reset the
remote-IRR bit.
Tested-by: Leonardo Chiquitto <lchiquitto@novell.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Fixes: https://bugzilla.novell.com/show_bug.cgi?id=701686
Cc: Rafael Wysocki <rjw@novell.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Thomas Renninger <trenn@suse.de>
Cc: jbeulich@novell.com
Cc: yinghai@kernel.org
Link: http://lkml.kernel.org/r/20110825190657.157502602@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On the platforms which are x2apic and interrupt-remapping
capable, Linux kernel is enabling x2apic even if the BIOS
doesn't. This is to take advantage of the features that x2apic
brings in.
Some of the OEM platforms are running into issues because of
this, as their bios is not x2apic aware. For example, this was
resulting in interrupt migration issues on one of the platforms.
Also if the BIOS SMI handling uses APIC interface to send SMI's,
then the BIOS need to be aware of x2apic mode that OS has
enabled.
On some of these platforms, BIOS doesn't have a HW mechanism to
turnoff the x2apic feature to prevent OS from enabling it.
To resolve this mess, recent changes to the VT-d2 specification:
http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf
includes a mechanism that provides BIOS a way to request system
software to opt out of enabling x2apic mode.
Look at the x2apic optout flag in the DMAR tables before
enabling the x2apic mode in the platform. Also print a warning
that we have disabled x2apic based on the BIOS request.
Kernel boot parameter "intremap=no_x2apic_optout" can be used to
override the BIOS x2apic optout request.
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yinghai@kernel.org
Cc: joerg.roedel@amd.com
Cc: tony.luck@intel.com
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/20110824001456.171766616@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch fixes the typo in parameters passed to
x86_32 switch_to() description.
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
del_timer_sync() can cause a deadlock when called in interrupt context.
It is used with on_each_cpu() in some parts for sysfs files like bank*,
check_interval, cmci_disabled and ignore_ce.
However, use of on_each_cpu() results in calling the function passed
as the argument in interrupt context. This causes a flood of nested
warnings from del_timer_sync() (it runs on each CPU) caused even by a
simple file access like:
$ echo 300 > /sys/devices/system/machinecheck/machinecheck0/check_interval
Fortunately, these MCE-specific files are rarely used and AFAIK only few
MCE geeks experience this warning.
To remove the warning, move timer deletion outside of the interrupt
context.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The cmci_discover_lock can be taken in atomic context (cpu bring
up sequence) and therefore cannot be preempted on -rt.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
L3 subcaches 0 and 1 of AMD Family 15h CPUs can have a size of 2MB.
Update the calculation routine for the number of L3 indices to
reflect that.
Signed-off-by: Frank Arnold <frank.arnold@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Rosenfeld Hans <Hans.Rosenfeld@amd.com>
Cc: Herrmann3 Andreas <Andreas.Herrmann3@amd.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Frank Arnold <Frank.Arnold@amd.com>
Link: http://lkml.kernel.org/r/20110726170449.GB32536@aftab
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It's not a good reason to allocate memory in the smp function call
just because someone thought it's the most conveniant place.
The AMD L3 data is coupled to the northbridge info by a pointer to the
corresponding north bridge data. So allocating it with the northbridge
data and referencing the northbridge in the cache_info code instead
uses less memory and gets rid of that atomic allocation hack in the
smp function call.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20110723212626.688229918@linutronix.de
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit f9b90566c ("x86: reduce stack usage in init_intel_cacheinfo")
introduced a shadow structure to reduce the stack usage on large
machines instead of making the smaller structure embedded into the
large one. That's definitely a candidate for the bad taste award.
Move the small struct into the large one and get rid of the ugly type
casts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20110723212626.625651773@linutronix.de
Signed-off-by: Ingo Molnar <mingo@elte.hu>
free_cache_attributes() kfree's:
per_cpu(ici_cpuid4_info, cpu)->l3
which is a pointer to memory which was allocated as a block in
amd_init_l3_cache(). l3 of a particular cpu points to a part of this
memory blob. The part and the rest of the blob are still referenced by
other cpus.
As far as I can tell from the git history this is a leftover from the
conversion from per cpu to node data with commit ba06edb63(x86,
cacheinfo: Make L3 cache info per node) and the following commit
f658bcfb2(x86, cacheinfo: Cleanup L3 cache index disable support)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20110723212626.550539989@linutronix.de
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The nfsservctl system call is now gone, so we should remove all
linkage for it.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
entry_32.S contained a hardcoded alternative instruction entry, and the
format changed in commit 59e97e4d6f ("x86: Make alternative
instruction pointers relative").
Replace the hardcoded entry with the altinstruction_entry macro. This
fixes the 32-bit boot with CONFIG_X86_INVD_BUG=y.
Reported-and-tested-by: Arnaud Lacombe <lacombar@gmail.com>
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While removing custom rendezvous code and switching to stop_machine,
commit 192d885742 ("x86, mtrr: use stop_machine APIs for doing MTRR
rendezvous") completely dropped mtrr setting code on !CONFIG_SMP
breaking MTRR settting on UP.
Fix it by removing the incorrect CONFIG_SMP.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Anders Eriksson <aeriksson@fastmail.fm>
Tested-and-acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-32, vdso: On system call restart after SYSENTER, use int $0x80
x86, UV: Remove UV delay in starting slave cpus
x86, olpc: Wait for last byte of EC command to be accepted
Because THREAD_SIZE is defined as PAGE_SIZE << THREAD_ORDER on x86, the
call of get_order(THREAD_SIZE) can be replaced with THREAD_ORDER.
Signed-off-by: Zhao Jin <cronozhj@gmail.com>
Link: http://lkml.kernel.org/r/4E4FB5A9.700@gmail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On -rt kfree() can schedule, but CPU_STARTING is before the CPU is
fully up and running. These are contradictory, so avoid it. Instead
push the kfree() to CPU_ONLINE where we're free to schedule.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-kwd4j6ayld5thrscvaxgjquv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-tip:
x86-64: Rework vsyscall emulation and add vsyscall= parameter
x86-64: Wire up getcpu syscall
x86: Remove unnecessary compile flag tweaks for vsyscall code
x86-64: Add vsyscall:emulate_vsyscall trace event
x86-64: Add user_64bit_mode paravirt op
x86-64, xen: Enable the vvar mapping
x86-64: Work around gold bug 13023
x86-64: Move the "user" vsyscall segment out of the data segment.
x86-64: Pad vDSO to a page boundary
There are three choices:
vsyscall=native: Vsyscalls are native code that issues the
corresponding syscalls.
vsyscall=emulate (default): Vsyscalls are emulated by instruction
fault traps, tested in the bad_area path. The actual contents of
the vsyscall page is the same as the vsyscall=native case except
that it's marked NX. This way programs that make assumptions about
what the code in the page does will not be confused when they read
that code.
vsyscall=none: Trying to execute a vsyscall will segfault.
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/8449fb3abf89851fd6b2260972666a6f82542284.1312988155.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
As of commit 98d0ac38ca
Author: Andy Lutomirski <luto@mit.edu>
Date: Thu Jul 14 06:47:22 2011 -0400
x86-64: Move vread_tsc and vread_hpet into the vDSO
user code no longer directly calls into code in arch/x86/kernel/, so
we don't need compile flag hacks to make it safe. All vdso code is
in the vdso directory now.
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/835cd05a4c7740544d09723d6ba48f4406f9826c.1312988155.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
When the moduleu.h splitting tree is merged to the latest
tip:x86/cpu tree, the x86_64 allmodconfig build fails like this:
arch/x86/kernel/cpu/amd.c: In function 'bsp_init_amd':
arch/x86/kernel/cpu/amd.c:437:3: error: 'va_align' undeclared (first use in this function)
arch/x86/kernel/cpu/amd.c:438:23: error: 'ALIGN_VA_32' undeclared (first use in this function)
arch/x86/kernel/cpu/amd.c:438:37: error: 'ALIGN_VA_64' undeclared (first use in this function)
This is caused by the module.h split up intreacting with commit
dfb09f9b7a ("x86, amd: Avoid cache aliasing penalties on AMD
family 15h") from the tip:x86/cpu tree.
I have added the following patch for today (this, or something
similar, could be applied to the tip tree directly - the
export.h include below was added by the module.h splitup).
So include elf.h to use va_align and remove this implicit
dependency on module.h doing it for us.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/20110810114956.238d66772883636e3040d29f@canb.auug.org.au
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add support to Romely-EP SandyBridge.
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Anhua Xu <anhua.xu@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1312264895-2010-1-git-send-email-youquan.song@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
hpa reported that dfb09f9b7a breaks 32-bit
builds with the following error message:
/home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:437: undefined
reference to `va_align'
/home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:436: undefined
reference to `va_align'
This is due to the fact that va_align is a global in a 64-bit only
compilation unit. Move it to mmap.c where it is visible to both
subarches.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1312633899-1131-1-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Delete the 10 msec delay between the INIT and SIPI when starting
slave cpus. I can find no requirement for this delay. BIOS also
has similar code sequences without the delay.
Removing the delay reduces boot time by 40 sec. Every bit helps.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/20110805140900.GA6774@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move code which is run once on the BSP during boot into the cpu_dev
helper.
[ hpa: removed bogus cpu_has -> static_cpu_has conversion ]
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/20110805180409.GC26217@aftab
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Add a function ptr to struct cpu_dev which is destined to be run only
once on the BSP during boot.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/20110805180116.GB26217@aftab
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This patch provides performance tuning for the "Bulldozer" CPU. With its
shared instruction cache there is a chance of generating an excessive
number of cache cross-invalidates when running specific workloads on the
cores of a compute module.
This excessive amount of cross-invalidations can be observed if cache
lines backed by shared physical memory alias in bits [14:12] of their
virtual addresses, as those bits are used for the index generation.
This patch addresses the issue by clearing all the bits in the [14:12]
slice of the file mapping's virtual address at generation time, thus
forcing those bits the same for all mappings of a single shared library
across processes and, in doing so, avoids instruction cache aliases.
It also adds the command line option "align_va_addr=(32|64|on|off)" with
which virtual address alignment can be enabled for 32-bit or 64-bit x86
individually, or both, or be completely disabled.
This change leaves virtual region address allocation on other families
and/or vendors unaffected.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1312550110-24160-2-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Three places in the kernel assume that the only long mode CPL 3
selector is __USER_CS. This is not true on Xen -- Xen's sysretq
changes cs to the magic value 0xe033.
Two of the places are corner cases, but as of "x86-64: Improve
vsyscall emulation CS and RIP handling"
(c9712944b2), vsyscalls will segfault
if called with Xen's extra CS selector. This causes a panic when
older init builds die.
It seems impossible to make Xen use __USER_CS reliably without
taking a performance hit on every system call, so this fixes the
tests instead with a new paravirt op. It's a little ugly because
ptrace.h can't include paravirt.h.
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/f4fcb3947340d9e96ce1054a432f183f9da9db83.1312378163.git.luto@mit.edu
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Gold has trouble assigning numbers to the location counter inside of
an output section description. The bug was triggered by
9fd67b4ed0, which consolidated all of
the vsyscall sections into a single section. The workaround is IMO
still nicer than the old way of doing it.
This produces an apparently valid kernel image and passes my vdso
tests on both GNU ld version 2.21.51.0.6-2.fc15 20110118 and GNU
gold (version 2.21.51.0.6-2.fc15 20110118) 1.10 as distributed by
Fedora 15.
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/0b260cb806f1f9a25c00ce8377a5f035d57f557a.1312378163.git.luto@mit.edu
Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
cpuidle: stop depending on pm_idle
x86 idle: move mwait_idle_with_hints() to where it is used
cpuidle: replace xen access to x86 pm_idle and default_idle
cpuidle: create bootparam "cpuidle.off=1"
mrst_pmu: driver for Intel Moorestown Power Management Unit
cpuidle users should call cpuidle_call_idle() directly
rather than via (pm_idle)() function pointer.
Architecture may choose to continue using (pm_idle)(),
but cpuidle need not depend on it:
my_arch_cpu_idle()
...
if(cpuidle_call_idle())
pm_idle();
cc: Kevin Hilman <khilman@deeprootsystems.com>
cc: Paul Mundt <lethal@linux-sh.org>
cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
...and make it static
no functional change
cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
If the CPU declares that RDRAND is available, go through a guranteed
reseed sequence, and make sure that it is actually working (producing
data.) If it does not, disable the CPU feature flag.
Allow RDRAND to be disabled on the command line (as opposed to at
compile time) for a user who has special requirements with regards to
random numbers.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Theodore Ts'o" <tytso@mit.edu>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
fs: Merge split strings
treewide: fix potentially dangerous trailing ';' in #defined values/expressions
uwb: Fix misspelling of neighbourhood in comment
net, netfilter: Remove redundant goto in ebt_ulog_packet
trivial: don't touch files that are removed in the staging tree
lib/vsprintf: replace link to Draft by final RFC number
doc: Kconfig: `to be' -> `be'
doc: Kconfig: Typo: square -> squared
doc: Konfig: Documentation/power/{pm => apm-acpi}.txt
drivers/net: static should be at beginning of declaration
drivers/media: static should be at beginning of declaration
drivers/i2c: static should be at beginning of declaration
XTENSA: static should be at beginning of declaration
SH: static should be at beginning of declaration
MIPS: static should be at beginning of declaration
ARM: static should be at beginning of declaration
rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check
Update my e-mail address
PCIe ASPM: forcedly -> forcibly
gma500: push through device driver tree
...
Fix up trivial conflicts:
- arch/arm/mach-ep93xx/dma-m2p.c (deleted)
- drivers/gpio/gpio-ep93xx.c (renamed and context nearby)
- drivers/net/r8169.c (just context changes)
This patch removes all the module loader hook implementations in the
architecture specific code where the functionality is the same as that
now provided by the recently added default hooks.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch implements the kvm bits of the steal time infrastructure.
The most important part of it, is the steal time clock. It is an
continuous clock that shows the accumulated amount of steal time
since vcpu creation. It is supposed to survive cpu offlining/onlining.
[marcelo: fix build with CONFIG_KVM_GUEST=n]
Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Avi Kivity <avi@redhat.com>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* 'x86-detect-hyper-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, hyper: Change hypervisor detection order
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-32, fpu: Fix DNA exception during check_fpu()
* 'x86-kexec-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
kexec, x86: Fix incorrect jump back address if not preserving context
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, config: Introduce an INTEL_MID configuration
* 'x86-quirks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, quirks: Use pci_dev->revision
* 'x86-tsc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: tsc: Remove unneeded DMI-based blacklisting
* 'x86-smpboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, boot: Wait for boot cpu to show up if nr_cpus limit is about to hit
* 'timers-clocksource-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
clocksource: apb: Share APB timer code with other platforms
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-64, vdso: Do not allocate memory for the vDSO
clocksource: Change __ARCH_HAS_CLOCKSOURCE_DATA to a CONFIG option
x86, vdso: Drop now wrong comment
Document the vDSO and add a reference parser
ia64: Replace clocksource.fsys_mmio with generic arch data
x86-64: Move vread_tsc and vread_hpet into the vDSO
clocksource: Replace vread with generic arch data
x86-64: Add --no-undefined to vDSO build
x86-64: Allow alternative patching in the vDSO
x86: Make alternative instruction pointers relative
x86-64: Improve vsyscall emulation CS and RIP handling
x86-64: Emulate legacy vsyscalls
x86-64: Fill unused parts of the vsyscall page with 0xcc
x86-64: Remove vsyscall number 3 (venosys)
x86-64: Map the HPET NX
x86-64: Remove kernel.vsyscall64 sysctl
x86-64: Give vvars their own page
x86-64: Document some of entry_64.S
x86-64: Fix alignment of jiffies variable
* 'x86-signal-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Kill handle_signal()->set_fs()
x86, do_signal: Simplify the TS_RESTORE_SIGMASK logic
x86, signals: Convert the X86_32 code to use set_current_blocked()
x86, signals: Convert the IA32_EMULATION code to use set_current_blocked()
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, mce: Use mce_sysdev_ prefix to group functions
x86, mce: Use mce_chrdev_ prefix to group functions
x86, mce: Cleanup mce_read()
x86, mce: Cleanup mce_create()/remove_device()
x86, mce: Check the result of ancient_init()
x86, mce: Introduce mce_gather_info()
x86, mce: Replace MCM_ with MCI_MISC_
x86, mce: Replace MCE_SELF_VECTOR by irq_work
x86, mce, severity: Clean up trivial coding style problems
x86, mce, severity: Cleanup severity table
x86, mce, severity: Make formatting a bit more readable
x86, mce, severity: Fix two severities table signatures
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, smpboot: Mark the names[] array in __inquire_remote_apic() as const
x86: Convert vmalloc()+memset() to vzalloc()
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, ioapic: Print IR_IO_APIC_route_entry when IR is enabled
x86, ioapic: Print IRTE when IR is enabled
x86, x2apic: Preserve high 32-bits of IA32_APIC_BASE MSR
x86, ioapic: Also print Dest field
x86, ioapic: Format clean up for IOAPIC output
x86: print APIC data a little later during boot
* 'timers-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
mips: Fix i8253 clockevent fallout
i8253: Cleanup outb/inb magic
arm: Footbridge: Use common i8253 clockevent
mips: Use common i8253 clockevent
x86: Use common i8253 clockevent
i8253: Create common clockevent implementation
i8253: Export i8253_lock unconditionally
pcpskr: MIPS: Make config dependencies finer grained
pcspkr: Cleanup Kconfig dependencies
i8253: Move remaining content and delete asm/i8253.h
i8253: Consolidate definitions of PIT_LATCH
x86: i8253: Consolidate definitions of global_clock_event
i8253: Alpha, PowerPC: Remove unused asm/8253pit.h
alpha: i8253: Cleanup remaining users of i8253pit.h
i8253: Remove I8253_LOCK config
i8253: Make pcsp sound driver use the shared i8253_lock
i8253: Make pcspkr input driver use the shared i8253_lock
i8253: Consolidate all kernel definitions of i8253_lock
i8253: Unify all kernel declarations of i8253_lock
i8253: Create linux/i8253.h and use it in all 8253 related files
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (123 commits)
perf: Remove the nmi parameter from the oprofile_perf backend
x86, perf: Make copy_from_user_nmi() a library function
perf: Remove perf_event_attr::type check
x86, perf: P4 PMU - Fix typos in comments and style cleanup
perf tools: Make test use the preset debugfs path
perf tools: Add automated tests for events parsing
perf tools: De-opt the parse_events function
perf script: Fix display of IP address for non-callchain path
perf tools: Fix endian conversion reading event attr from file header
perf tools: Add missing 'node' alias to the hw_cache[] array
perf probe: Support adding probes on offline kernel modules
perf probe: Add probed module in front of function
perf probe: Introduce debuginfo to encapsulate dwarf information
perf-probe: Move dwarf library routines to dwarf-aux.{c, h}
perf probe: Remove redundant dwarf functions
perf probe: Move strtailcmp to string.c
perf probe: Rename DIE_FIND_CB_FOUND to DIE_FIND_CB_END
tracing/kprobe: Update symbol reference when loading module
tracing/kprobes: Support module init function probing
kprobes: Return -ENOENT if probe point doesn't exist
...
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
iommu/core: Fix build with INTR_REMAP=y && CONFIG_DMAR=n
iommu/amd: Don't use MSI address range for DMA addresses
iommu/amd: Move missing parts to drivers/iommu
iommu: Move iommu Kconfig entries to submenu
x86/ia64: intel-iommu: move to drivers/iommu/
x86: amd_iommu: move to drivers/iommu/
msm: iommu: move to drivers/iommu/
drivers: iommu: move to a dedicated folder
x86/amd-iommu: Store device alias as dev_data pointer
x86/amd-iommu: Search for existind dev_data before allocting a new one
x86/amd-iommu: Allow dev_data->alias to be NULL
x86/amd-iommu: Use only dev_data in low-level domain attach/detach functions
x86/amd-iommu: Use only dev_data for dte and iotlb flushing routines
x86/amd-iommu: Store ATS state in dev_data
x86/amd-iommu: Store devid in dev_data
x86/amd-iommu: Introduce global dev_data_list
x86/amd-iommu: Remove redundant device_flush_dte() calls
iommu-api: Add missing header file
Fix up trivial conflicts (independent additions close to each other) in
drivers/Makefile and include/linux/pci.h
* 'of-pci' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
pci/of: Consolidate pci_bus_to_OF_node()
pci/of: Consolidate pci_device_to_OF_node()
x86/devicetree: Use generic PCI <-> OF matching
microblaze/pci: Move the remains of pci_32.c to pci-common.c
microblaze/pci: Remove powermac originated cruft
pci/of: Match PCI devices to OF nodes dynamically
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1287 commits)
icmp: Fix regression in nexthop resolution during replies.
net: Fix ppc64 BPF JIT dependencies.
acenic: include NET_SKB_PAD headroom to incoming skbs
ixgbe: convert to ndo_fix_features
ixgbe: only enable WoL for magic packet by default
ixgbe: remove ifdef check for non-existent define
ixgbe: Pass staterr instead of re-reading status and error bits from descriptor
ixgbe: Move interrupt related values out of ring and into q_vector
ixgbe: add structure for containing RX/TX rings to q_vector
ixgbe: inline the ixgbe_maybe_stop_tx function
ixgbe: Update ATR to use recorded TX queues instead of CPU for routing
igb: Fix for DH89xxCC near end loopback test
e1000: always call e1000_check_for_link() on e1000_ce4100 MACs.
netxen: add fw version compatibility check
be2net: request native mode each time the card is reset
ipv4: Constrain UFO fragment sizes to multiples of 8 bytes
virtio_net: Fix panic in virtnet_remove
ipv6: make fragment identifications less predictable
ipv6: unshare inetpeers
can: make function can_get_bittiming static
...
The Host used to create some page tables for the Guest to use at the
top of Guest memory; it would then tell the Guest where this was. In
particular, it created linear mappings for 0 and 0xC0000000 addresses
because lguest used to switch to its real page tables quite late in
boot.
However, since d50d8fe19 Linux initialized boot page tables in
head_32.S even before the "are we lguest?" boot jump. So, now we can
simplify things: the Host pagetable code assumes 1:1 linear mapping
until it first calls the LHCALL_NEW_PGTABLE hypercall, which we now do
before we reach C code.
This also means that the Host doesn't need to know anything about the
Guest's PAGE_OFFSET. (Non-Linux guests might not even have such a
thing).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Yet another variant of the Dell Latitude series which requires
reboot=pci.
From the E5420 bug report by Daniel J Blueman:
> The E6420 is affected also (same platform, different casing and
> features), which provides an external confirmation of the issue; I can
> submit a patch for that later or include it if you prefer:
> http://linux.koolsolutions.com/2009/08/04/howto-fix-linux-hangfreeze-during-reboots-and-restarts/
Reported-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
Rebooting on the Dell E5420 often hangs with the keyboard or ACPI
methods, but is reliable via the PCI method.
[ hpa: this was deferred because we believed for a long time that the
recent reshuffling of the boot priorities in commit
660e34cebf fixed this platform.
Unfortunately that turned out to be incorrect. ]
Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Link: http://lkml.kernel.org/r/1305248699-2347-1-git-send-email-daniel.blueman@gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
copy_from_user_nmi() is used in oprofile and perf. Moving it to other
library functions like copy_from_user(). As this is x86 code for 32
and 64 bits, create a new file usercopy.c for unified code.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110607172413.GJ20052@erda.amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch:
- fixes typos in comments and clarifies the text
- renames obscure p4_event_alias::original and ::alter members to
::original and ::alternative as appropriate
- drops parenthesis from the return of p4_get_alias_event()
No functional changes.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Link: http://lkml.kernel.org/r/20110721160625.GX7492@sun
Signed-off-by: Ingo Molnar <mingo@elte.hu>
All these are instances of
#define NAME value;
or
#define NAME(params_opt) value;
These of course fail to build when used in contexts like
if(foo $OP NAME)
while(bar $OP NAME)
and may silently generate the wrong code in contexts such as
foo = NAME + 1; /* foo = value; + 1; */
bar = NAME - 1; /* bar = value; - 1; */
baz = NAME & quux; /* baz = value; & quux; */
Reported on comp.lang.c,
Message-ID: <ab0d55fe-25e5-482b-811e-c475aa6065c3@c29g2000yqd.googlegroups.com>
Initial analysis of the dangers provided by Keith Thompson in that thread.
There are many more instances of more complicated macros having unnecessary
trailing semicolons, but this pile seems to be all of the cases of simple
values suffering from the problem. (Thus things that are likely to be found
in one of the contexts above, more complicated ones aren't.)
Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
In kexec jump support, jump back address passed to the kexeced
kernel via function calling ABI, that is, the function call
return address is the jump back entry.
Furthermore, jump back entry == 0 should be used to signal that
the jump back or preserve context is not enabled in the original
kernel.
But in the current implementation the stack position used for
function call return address is not cleared context
preservation is disabled. The patch fixes this bug.
Reported-and-tested-by: Yin Kangkai <kangkai.yin@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1310607277-25029-1-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This code uses PCI_CLASS_REVISION instead of PCI_REVISION_ID, so
it wasn't converted by commit 44c10138fd ("PCI: Change all
drivers to use pci_device->revision") before being moved to arch/x86/...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/201107111901.39281.sshtylyov@ru.mvista.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix the printk_once() so that it actually prints (didn't print before
due to a stray comma.)
[ hpa: changed to an incremental patch and adjusted the description
accordingly. ]
Signed-off-by: Len Brown <len.brown@intel.com>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1107151732480.18606@x980
Cc: <table@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
handle_signal()->set_fs() has a nice comment which explains what
set_fs() is, but it doesn't explain why it is needed and why it
depends on CONFIG_X86_64.
Afaics, the history of this confusion is:
1. I guess today nobody can explain why it was needed
in arch/i386/kernel/signal.c, perhaps it was always
wrong. This predates 2.4.0 kernel.
2. then it was copy-and-past'ed to the new x86_64 arch.
3. then it was removed from i386 (but not from x86_64)
by b93b6ca3 "i386: remove unnecessary code".
4. then it was reintroduced under CONFIG_X86_64 when x86
unified i386 and x86_64, because the patch above didn't
touch x86_64.
Remove it. ->addr_limit should be correct. Even if it was possible
that it is wrong, it is too late to fix it after setup_rt_frame().
Linus commented in:
http://lkml.kernel.org/r/alpine.LFD.0.999.0707170902570.19166@woody.linux-foundation.org
... about the equivalent bit from i386:
Heh. I think it's entirely historical.
Please realize that the whole reason that function is called "set_fs()" is
that it literally used to set the %fs segment register, not
"->addr_limit".
So I think the "set_fs(USER_DS)" is there _only_ to match the other
regs->xds = __USER_DS;
regs->xes = __USER_DS;
regs->xss = __USER_DS;
regs->xcs = __USER_CS;
things, and never mattered. And now it matters even less, and has been
copied to all other architectures where it is just totally insane.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20110710164424.GA20261@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
1. do_signal() looks at TS_RESTORE_SIGMASK and calculates the
mask which should be stored in the signal frame, then it
passes "oldset" to the callees, down to setup_rt_frame().
This is ugly, setup_rt_frame() can do this itself and nobody
else needs this sigset_t. Move this code into setup_rt_frame.
2. do_signal() also clears TS_RESTORE_SIGMASK if handle_signal()
succeeds.
We can move this to setup_rt_frame() as well, this avoids the
unnecessary checks and makes the logic more clear.
3. use set_current_blocked() instead of sigprocmask(SIG_SETMASK),
sigprocmask() should be avoided.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20110710182203.GA27979@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
sys_sigsuspend() and sys_sigreturn() change ->blocked directly.
This is not correct, see the changelog in e6fa16ab
"signal: sigprocmask() should do retarget_shared_pending()"
Change them to use set_current_blocked().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20110710192727.GA31759@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Instead of hw_nmi_watchdog_set_attr() weak function
and appropriate x86_pmu::hw_watchdog_set_attr() call
we introduce even alias mechanism which allow us
to drop this routines completely and isolate quirks
of Netburst architecture inside P4 PMU code only.
The main idea remains the same though -- to allow
nmi-watchdog and perf top run simultaneously.
Note the aliasing mechanism applies to generic
PERF_COUNT_HW_CPU_CYCLES event only because arbitrary
event (say passed as RAW initially) might have some
additional bits set inside ESCR register changing
the behaviour of event and we can't guarantee anymore
that alias event will give the same result.
P.S. Thanks a huge to Don and Steven for for testing
and early review.
Acked-by: Don Zickus <dzickus@redhat.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Stephane Eranian <eranian@google.com>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20110708201712.GS23657@sun
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Since 2.6.36 (23016bf0d2), Linux prints the existence of "epb" in /proc/cpuinfo,
Since 2.6.38 (d5532ee7b4), the x86_energy_perf_policy(8) utility has
been available in-tree to update MSR_IA32_ENERGY_PERF_BIAS.
However, the typical BIOS fails to initialize the MSR, presumably
because this is handled by high-volume shrink-wrap operating systems...
Linux distros, on the other hand, do not yet invoke x86_energy_perf_policy(8).
As a result, WSM-EP, SNB, and later hardware from Intel will run in its
default hardware power-on state (performance), which assumes that users
care for performance at all costs and not for energy efficiency.
While that is fine for performance benchmarks, the hardware's intended default
operating point is "normal" mode...
Initialize the MSR to the "normal" by default during kernel boot.
x86_energy_perf_policy(8) is available to change the default after boot,
should the user have a different preference.
Signed-off-by: Len Brown <len.brown@intel.com>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1107140051020.18606@x980
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org>
Other than sanity check and debug message, the x86 specific version of
memblock reserve/free functions are simple wrappers around the generic
versions - memblock_reserve/free().
This patch adds debug messages with caller identification to the
generic versions and replaces x86 specific ones and kills them.
arch/x86/include/asm/memblock.h and arch/x86/mm/memblock.c are empty
after this change and removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310462166-31469-14-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
memblock_find_dma_reserve() wants to find out how much memory is
reserved under MAX_DMA_PFN. memblock_x86_memory_[free_]in_range() are
used to find out the amounts of all available and free memory in the
area, which are then subtracted to find out the amount of reservation.
memblock_x86_memblock_[free_]in_range() are implemented using
__memblock_x86_memory_in_range() which builds ranges from memblock and
then count them, which is rather unnecessarily complex.
This patch open codes the counting logic directly in
memblock_find_dma_reserve() using memblock iterators and removes now
unused __memblock_x86_memory_in_range() and find_range_array().
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310462166-31469-11-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
setup_bios_corruption_check() and memtest do_one_pass() open code
memblock free area iteration using memblock_x86_find_in_range_size().
Convert them to use for_each_free_mem_range() instead.
This leaves memblock_x86_find_in_range_size() and
memblock_x86_check_reserved_size() unused. Kill them.
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310462166-31469-8-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
early_reserve_e820() implements its own ad-hoc early allocator using
memblock_x86_find_in_range_size(). Use __memblock_alloc_base()
instead and remove the unnecessary @startt parameter (it's top-down
allocation anyway).
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310462166-31469-6-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This patch adds a function pointer in one of the many paravirt_ops
structs, to allow guests to register a steal time function. Besides
a steal time function, we also declare two jump_labels. They will be
used to allow the steal time code to be easily bypassed when not
in use.
Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
25818f0f28 (memblock: Make MEMBLOCK_ERROR be 0) thankfully made
MEMBLOCK_ERROR 0 and there already are codes which expect error return
to be 0. There's no point in keeping MEMBLOCK_ERROR around. End its
misery.
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310457490-3356-6-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The vread field was bloating struct clocksource everywhere except
x86_64, and I want to change the way this works on x86_64, so let's
split it out into per-arch data.
Cc: x86@kernel.org
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: linux-ia64@vger.kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/3ae5ec76a168eaaae63f08a2a1060b91aa0b7759.1310563276.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Three fixes here:
- Send SIGSEGV if called from compat code or with a funny CS.
- Don't BUG on impossible addresses.
- Add a missing local_irq_disable.
This patch also removes an unused variable.
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/6fb2b13ab39b743d1e4f466eef13425854912f7f.1310563276.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
When "apic=debug" is used as a boot parameter, Linux prints the IOAPIC routing
entries in "dmesg". Below is output from IOAPIC whose apic_id is 8:
# dmesg | grep "routing entry"
IOAPIC[8]: Set routing entry (8-1 -> 0x31 -> IRQ 1 Mode:0 Active:0 Dest:0)
IOAPIC[8]: Set routing entry (8-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:0)
IOAPIC[8]: Set routing entry (8-3 -> 0x33 -> IRQ 3 Mode:0 Active:0 Dest:0)
...
Similarly, when IR (interrupt remapping) is enabled, and the IRTE
(interrupt remapping table entry) is set up we should display it.
After the fix:
# dmesg | grep IRTE
IOAPIC[8]: Set IRTE entry (P:1 FPD:0 Dst_Mode:0 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:31 Dest:00000000 SID:00F1 SQ:0 SVT:1)
IOAPIC[8]: Set IRTE entry (P:1 FPD:0 Dst_Mode:0 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:30 Dest:00000000 SID:00F1 SQ:0 SVT:1)
IOAPIC[8]: Set IRTE entry (P:1 FPD:0 Dst_Mode:0 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:33 Dest:00000000 SID:00F1 SQ:0 SVT:1)
...
The IRTE is defined in Sec 9.5 of the Intel VT-d Specification.
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110712211704.2939.71291.sendpatchset@nchumbalkar.americas.cpqcorp.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The code in setup_ioapic_irq() determines the Destination Field,
so why not also include it in the debug printk output that gets
displayed when the boot parameter "apic=debug" is used.
Before the change, "dmesg" will show:
IOAPIC[0]: Set routing entry (8-1 -> 0x31 -> IRQ 1 Mode:0 Active:0)
IOAPIC[0]: Set routing entry (8-2 -> 0x30 -> IRQ 0 Mode:0 Active:0)
IOAPIC[0]: Set routing entry (8-3 -> 0x33 -> IRQ 3 Mode:0 Active:0) ...
After the change, you will see:
IOAPIC[0]: Set routing entry (8-1 -> 0x31 -> IRQ 1 Mode:0 Active:0 Dest:0)
IOAPIC[0]: Set routing entry (8-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:0)
IOAPIC[0]: Set routing entry (8-3 -> 0x33 -> IRQ 3 Mode:0 Active:0 Dest:0) ...
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110708184603.2734.91071.sendpatchset@nchumbalkar.americas.cpqcorp.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When IOAPIC data is displayed in "dmesg" with the help of the
boot parameter "apic=debug" certain values are not formatted
correctly wrt their size.
In the "dmesg" snippet below, note that the output for "max
redirection entries", and "IO APIC version" which are each
defined to be just 8-bits long are displayed as 2 bytes in
length. Similarly, "Dst" under the "IRQ redirection table"
should only be 8-bits long.
IO APIC #0......
...
...
.... register #01: 00170020
....... : max redirection entries: 0017
....... : PRQ implemented: 0
....... : IO APIC version: 0020
...
...
.... IRQ redirection table:
NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
00 000 1 0 0 0 0 0 0 00
01 000 0 0 0 0 0 0 0 31
02 000 0 0 0 0 0 0 0 30
03 000 1 0 0 0 0 0 0 33
...
...
Do some formatting clean up, so you will see output like below:
IO APIC #0......
...
...
.... register #01: 00170020
....... : max redirection entries: 17
....... : PRQ implemented: 0
....... : IO APIC version: 20
...
...
.... IRQ redirection table:
NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
00 00 1 0 0 0 0 0 0 00
01 00 0 0 0 0 0 0 0 31
02 00 0 0 0 0 0 0 0 30
03 00 1 0 0 0 0 0 0 33
...
...
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110708184557.2734.61830.sendpatchset@nchumbalkar.americas.cpqcorp.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Detect Xen before HyperV because in Viridian compatibility mode Xen
presents itself as HyperV. Move Xen to the top since it seems more
likely that Xen would emulate VMware than vice versa.
Signed-off-by: Anupam Chanda <achanda@nicira.com>
Link: http://lkml.kernel.org/r/1310150570-26810-1-git-send-email-achanda@nicira.com
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
nr_cpus allows one to specify number of possible cpus in the system.
Current assumption seems to be that first cpu to show up is boot cpu
and this assumption will be broken in kdump scenario where we can be
booting on a non boot cpu with nr_cpus=1.
It might happen that first cpu we parse is not the cpu we boot on and
later we ignore boot cpu. Though code later seems to recognize this
anomaly and forcibly sets boot cpu in physical cpu map with following
warning.
if (!physid_isset(hard_smp_processor_id(), phys_cpu_present_map)) {
printk(KERN_WARNING
"weird, boot CPU (#%d) not listed by the BIOS.\n",
hard_smp_processor_id());
physid_set(hard_smp_processor_id(), phys_cpu_present_map);
}
This patch waits for boot cpu to show up and starts ignoring the cpus
once we have hit (nr_cpus - 1) number of cpus. So effectively we are
reserving one slot out of nr_cpus for boot cpu explicitly.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20110708171926.GF2930@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Some BIOSes will reset the Intel MISC_ENABLE MSR (specifically the
XD_DISABLE bit) when resuming from S3, which can interact poorly with
ebba638ae7. In 32bit PAE mode, this can
lead to a fault when EFER is restored by the kernel wakeup routines,
due to it setting the NX bit for a CPU that (thanks to the BIOS reset)
now incorrectly thinks it lacks the NX feature. (64bit is not affected
because it uses a common CPU bring-up that specifically handles the
XD_DISABLE bit.)
The need for MISC_ENABLE being restored so early is specific to the S3
resume path. Normally, MISC_ENABLE is saved in save_processor_state(),
but this happens after the resume header is created, so just reproduce
the logic here. (acpi_suspend_lowlevel() creates the header, calls
do_suspend_lowlevel, which calls save_processor_state(), so the saved
processor context isn't available during resume header creation.)
[ hpa: Consider for stable if OK in mainline ]
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Link: http://lkml.kernel.org/r/20110707011034.GA8523@outflux.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: <stable@kernel.org> 2.6.38+
Since git commit
660e34cebf x86: reorder reboot method
preferences,
my Acer Aspire One hangs on reboot. It appears that its ACPI method
for rebooting is broken. The attached patch adds a quirk so that the
machine will reboot via the BIOS.
[ hpa: verified that the ACPI control on this machine is just plain broken. ]
Signed-off-by: Peter Chubb <peter.chubb@nicta.com.au>
Link: http://lkml.kernel.org/r/w439iki5vl.wl%25peter@chubb.wattle.id.au
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
rbp is used in SAVE_ARGS_IRQ to save the old stack pointer
in order to restore it later in ret_from_intr.
It is convenient because we save its value in the irq regs
and it's easily restored using the leave instruction.
However this is a kind of abuse of the frame pointer which
role is to help unwinding the kernel by chaining frames
together, each node following the return address to the
previous frame.
But although we are breaking the frame by changing the stack
pointer, there is no preceding return address before the new
frame. Hence using the frame pointer to link the two stacks
breaks the stack unwinders that find a random value instead of
a return address here.
There is no workaround that can work in every case. We are using
the fixup_bp_irq_link() function to dereference that abused frame
pointer in the case of non nesting interrupt (which means stack
changed).
But that doesn't fix the case of interrupts that don't change the
stack (but we still have the unconditional frame link), which is
the case of hardirq interrupting softirq. We have no way to detect
this transition so the frame irq link is considered as a real frame
pointer and the return address is dereferenced but it is still a
spurious one.
There are two possible results of this: either the spurious return
address, a random stack value, luckily belongs to the kernel text
and then the unwinding can continue and we just have a weird entry
in the stack trace. Or it doesn't belong to the kernel text and
unwinding stops there.
This is the reason why stacktraces (including perf callchains) on
irqs that interrupted softirqs don't work very well.
To solve this, we don't save the old stack pointer on rbp anymore
but we save it to a scratch register that we push on the new
stack and that we pop back later on irq return.
This preserves the whole frame chain without spurious return addresses
in the middle and drops the need for the horrid fixup_bp_irq_link()
workaround.
And finally irqs that interrupt softirq are sanely unwinded.
Before:
99.81% perf [kernel.kallsyms] [k] perf_pending_event
|
--- perf_pending_event
irq_work_run
smp_irq_work_interrupt
irq_work_interrupt
|
|--41.60%-- __read
| |
| |--99.90%-- create_worker
| | bench_sched_messaging
| | cmd_bench
| | run_builtin
| | main
| | __libc_start_main
| --0.10%-- [...]
After:
1.64% swapper [kernel.kallsyms] [k] perf_pending_event
|
--- perf_pending_event
irq_work_run
smp_irq_work_interrupt
irq_work_interrupt
|
|--95.00%-- arch_irq_work_raise
| irq_work_queue
| __perf_event_overflow
| perf_swevent_overflow
| perf_swevent_event
| perf_tp_event
| perf_trace_softirq
| __do_softirq
| call_softirq
| do_softirq
| irq_exit
| |
| |--73.68%-- smp_apic_timer_interrupt
| | apic_timer_interrupt
| | |
| | |--96.43%-- amd_e400_idle
| | | cpu_idle
| | | start_secondary
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
The unwinder backlink in interrupt entry is very useless.
It's actually not part of the stack frame chain and thus is
never used.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
Just for clarity in the code. Have a first block that handles
the frame pointer and a separate one that handles pt_regs
pointer and its use.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
The save_regs function that saves the regs on low level
irq entry is complicated because of the fact it changes
its stack in the middle and also because it manipulates
data allocated in the caller frame and accesses there
are directly calculated from callee rsp value with the
return address in the middle of the way.
This complicates the static stack offsets calculation and
require more dynamic ones. It also needs a save/restore
of the function's return address.
To simplify and optimize this, turn save_regs() into a
macro.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
When regs are passed to dump_stack(), we fetch the frame
pointer from the regs but the stack pointer is taken from
the current frame.
Thus the frame and stack pointers may not come from the same
context. For example this can result in the unwinder to
think the context is in irq, due to the current value of
the stack, but the frame pointer coming from the regs points
to a frame from another place. It then tries to fix up
the irq link but ends up dereferencing a random frame
pointer that doesn't belong to the irq stack:
[ 9131.706906] ------------[ cut here ]------------
[ 9131.707003] WARNING: at arch/x86/kernel/dumpstack_64.c:129 dump_trace+0x2aa/0x330()
[ 9131.707003] Hardware name: AMD690VM-FMH
[ 9131.707003] Perf: bad frame pointer = 0000000000000005 in callchain
[ 9131.707003] Modules linked in:
[ 9131.707003] Pid: 1050, comm: perf Not tainted 3.0.0-rc3+ #181
[ 9131.707003] Call Trace:
[ 9131.707003] <IRQ> [<ffffffff8104bd4a>] warn_slowpath_common+0x7a/0xb0
[ 9131.707003] [<ffffffff8104be21>] warn_slowpath_fmt+0x41/0x50
[ 9131.707003] [<ffffffff8178b873>] ? bad_to_user+0x6d/0x10be
[ 9131.707003] [<ffffffff8100c2da>] dump_trace+0x2aa/0x330
[ 9131.707003] [<ffffffff810107d3>] ? native_sched_clock+0x13/0x50
[ 9131.707003] [<ffffffff8101b164>] perf_callchain_kernel+0x54/0x70
[ 9131.707003] [<ffffffff810d391f>] perf_prepare_sample+0x19f/0x2a0
[ 9131.707003] [<ffffffff810d546c>] __perf_event_overflow+0x16c/0x290
[ 9131.707003] [<ffffffff810d5430>] ? __perf_event_overflow+0x130/0x290
[ 9131.707003] [<ffffffff810107d3>] ? native_sched_clock+0x13/0x50
[ 9131.707003] [<ffffffff8100fbb9>] ? sched_clock+0x9/0x10
[ 9131.707003] [<ffffffff810752e5>] ? T.375+0x15/0x90
[ 9131.707003] [<ffffffff81084da4>] ? trace_hardirqs_on_caller+0x64/0x180
[ 9131.707003] [<ffffffff810817bd>] ? trace_hardirqs_off+0xd/0x10
[ 9131.707003] [<ffffffff810d5764>] perf_event_overflow+0x14/0x20
[ 9131.707003] [<ffffffff810d588c>] perf_swevent_hrtimer+0x11c/0x130
[ 9131.707003] [<ffffffff817821a1>] ? error_exit+0x51/0xb0
[ 9131.707003] [<ffffffff81072e93>] __run_hrtimer+0x83/0x1e0
[ 9131.707003] [<ffffffff810d5770>] ? perf_event_overflow+0x20/0x20
[ 9131.707003] [<ffffffff81073256>] hrtimer_interrupt+0x106/0x250
[ 9131.707003] [<ffffffff812a3bfd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 9131.707003] [<ffffffff81024833>] smp_apic_timer_interrupt+0x53/0x90
[ 9131.707003] [<ffffffff81789053>] apic_timer_interrupt+0x13/0x20
[ 9131.707003] <EOI> [<ffffffff817821a1>] ? error_exit+0x51/0xb0
[ 9131.707003] [<ffffffff8178219c>] ? error_exit+0x4c/0xb0
[ 9131.707003] ---[ end trace b2560d4876709347 ]---
Fix this by simply taking the stack pointer from regs->sp
when regs are provided.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
This code uses PCI_CLASS_REVISION instead of PCI_REVISION_ID, so
it wasn't converted by commit 44c10138fd ("PCI: Change all
drivers to use pci_device->revision") before being moved to
arch/x86/...
Do it now at last -- and save one level of indentation...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/201107012242.08347.sshtylyov@ru.mvista.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The v1 PMU does not have any fixed counters. Using the v2 constraints,
which do have fixed counters, causes an additional choice to be present
in the weight calculation, but not when actually scheduling the event,
leading to an event being not scheduled at all.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1309362157-6596-3-git-send-email-avi@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The perf_event overflow handler does not receive any caller-derived
argument, so many callers need to resort to looking up the perf_event
in their local data structure. This is ugly and doesn't scale if a
single callback services many perf_events.
Fix by adding a context parameter to perf_event_create_kernel_counter()
(and derived hardware breakpoints APIs) and storing it in the perf_event.
The field can be accessed from the callback as event->overflow_handler_context.
All callers are updated.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add a NODE level to the generic cache events which is used to measure
local vs remote memory accesses. Like all other cache events, an
ACCESS is HIT+MISS, if there is no way to distinguish between reads
and writes do reads only etc..
The below needs filling out for !x86 (which I filled out with
unsupported events).
I'm fairly sure ARM can leave it like that since it doesn't strike me as
an architecture that even has NUMA support. SH might have something since
it does appear to have some NUMA bits.
Sparc64, PowerPC and MIPS certainly want a good look there since they
clearly are NUMA capable.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Miller <davem@davemloft.net>
Cc: Anton Blanchard <anton@samba.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1303508226.4865.8.camel@laptop
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since the OFFCORE registers are fully symmetric, try the other one
when the specified one is already in use.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1306141897.18455.8.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds Intel Sandy Bridge offcore_response support by
providing the low-level constraint table for those events.
On Sandy Bridge, there are two offcore_response events. Each uses
its own dedictated extra register. But those registers are NOT shared
between sibling CPUs when HT is on unlike Nehalem/Westmere. They are
always private to each CPU. But they still need to be controlled within
an event group. All events within an event group must use the same
value for the extra MSR. That's not controlled by the second patch in
this series.
Furthermore on Sandy Bridge, the offcore_response events have NO
counter constraints contrary to what the official documentation
indicates, so drop the events from the contraint table.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110606145712.GA7304@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The validate_group() function needs to validate events with
extra shared regs. Within an event group, only events with
the same value for the extra reg can co-exist. This was not
checked by validate_group() because it was missing the
shared_regs logic.
This patch changes the allocation of the fake cpuc used for
validation to also point to a fake shared_regs structure such
that group events be properly testing.
It modifies __intel_shared_reg_get_constraints() to use
spin_lock_irqsave() to avoid lockdep issues.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110606145708.GA7279@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch improves the code managing the extra shared registers
used for offcore_response events on Intel Nehalem/Westmere. The
idea is to use static allocation instead of dynamic allocation.
This simplifies greatly the get and put constraint routines for
those events.
The patch also renames per_core to shared_regs because the same
data structure gets used whether or not HT is on. When HT is
off, those events still need to coordination because they use
a extra MSR that has to be shared within an event group.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110606145703.GA7258@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since only samples call perf_output_sample() its much saner (and more
correct) to put the sample logic in there than in the
perf_output_begin()/perf_output_end() pair.
Saves a useless argument, reduces conditionals and shrinks
struct perf_output_handle, win!
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-2crpvsx3cqu67q3zqjbnlpsc@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The nmi parameter indicated if we could do wakeups from the current
context, if not, we would set some state and self-IPI and let the
resulting interrupt do the wakeup.
For the various event classes:
- hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
the PMI-tail (ARM etc.)
- tracepoint: nmi=0; since tracepoint could be from NMI context.
- software: nmi=[0,1]; some, like the schedule thing cannot
perform wakeups, and hence need 0.
As one can see, there is very little nmi=1 usage, and the down-side of
not using it is that on some platforms some software events can have a
jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).
The up-side however is that we can remove the nmi parameter and save a
bunch of conditionals in fast paths.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Due to restriction and specifics of Netburst PMU we need a separated
event for NMI watchdog. In particular every Netburst event
consumes not just a counter and a config register, but also an
additional ESCR register.
Since ESCR registers are grouped upon counters (i.e. if ESCR is occupied
for some event there is no room for another event to enter until its
released) we need to pick up the "least" used ESCR (or the most available
one) for nmi-watchdog purposes -- so MSR_P4_CRU_ESCR2/3 was chosen.
With this patch nmi-watchdog and perf top should be able to run simultaneously.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
Tested-and-reviewed-by: Don Zickus <dzickus@redhat.com>
Tested-and-reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110623124918.GC13050@sun
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the hysterical outb/inb_pit defines and use outb_p/inb_p in the
code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20110609130622.348437125@linutronix.de
Before check_fpu() is called, we have cr0.TS bit set and hence the floating
point code to check the FDIV bug was generating a DNA exception.
Use kernel_fpu_begin()/kernel_fpu_end() around the floating point
code to avoid this unnecessary device not available exception during
boot.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1309479572.2665.1372.camel@sbsiddha-MOBL3.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
MTRR rendezvous sequence is not implemened using stop_machine() before, as this
gets called both from the process context aswell as the cpu online paths
(where the cpu has not come online and the interrupts are disabled etc).
Now that we have a new stop_machine_from_inactive_cpu() API, use it for
rendezvous during mtrr init of a logical processor that is coming online.
For the rest (runtime MTRR modification, system boot, resume paths), use
stop_machine() to implement the rendezvous sequence. This will consolidate and
cleanup the code.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/20110623182057.076997177@sbsiddha-MOBL3.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The APB timers are an IP block from Synopsys (DesignWare APB timers)
and are also found in other systems including ARM SoC's. This patch
adds functions for creating clock_event_devices and clocksources from
APB timers but does not do the resource allocation. This is handled
in a higher layer to allow the timers to be created from multiple
methods such as platform_devices.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
MTRR rendezvous sequence using stop_one_cpu_nowait() can potentially
happen in parallel with another system wide rendezvous using
stop_machine(). This can lead to deadlock (The order in which
works are queued can be different on different cpu's. Some cpu's
will be running the first rendezvous handler and others will be running
the second rendezvous handler. Each set waiting for the other set to join
for the system wide rendezvous, leading to a deadlock).
MTRR rendezvous sequence is not implemented using stop_machine() as this
gets called both from the process context aswell as the cpu online paths
(where the cpu has not come online and the interrupts are disabled etc).
stop_machine() works with only online cpus.
For now, take the stop_machine mutex in the MTRR rendezvous sequence that
gets called from an online cpu (here we are in the process context
and can potentially sleep while taking the mutex). And the MTRR rendezvous
that gets triggered during cpu online doesn't need to take this stop_machine
lock (as the stop_machine() already ensures that there is no cpu hotplug
going on in parallel by doing get_online_cpus())
TBD: Pursue a cleaner solution of extending the stop_machine()
infrastructure to handle the case where the calling cpu is
still not online and use this for MTRR rendezvous sequence.
fixes: https://bugzilla.novell.com/show_bug.cgi?id=672008
Reported-by: Vadim Kotelnikov <vadimuzzz@inbox.ru>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/20110623182056.807230326@sbsiddha-MOBL3.sc.intel.com
Cc: stable@kernel.org # 2.6.35+, backport a week or two after this gets more testing in mainline
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Remove linux/mm.h inclusion from netdevice.h -- it's unused (I've checked manually).
To prevent mm.h inclusion via other channels also extract "enum dma_data_direction"
definition into separate header. This tiny piece is what gluing netdevice.h with mm.h
via "netdevice.h => dmaengine.h => dma-mapping.h => scatterlist.h => mm.h".
Removal of mm.h from scatterlist.h was tried and was found not feasible
on most archs, so the link was cutoff earlier.
Hope people are OK with tiny include file.
Note, that mm_types.h is still dragged in, but it is a separate story.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A few parts of the driver were missing in drivers/iommu.
Move them there to have the complete driver in that
directory.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This should ease finding similarities with different platforms,
with the intention of solving problems once in a generic framework
which everyone can use.
Compile-tested on x86_64.
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The ucode size check has to take the section header size into account
too when sanity checking the section length. Shorten and clarify define
names, while at it.
Caught-by: Ben Hutchings <ben@decadent.org.uk>
Link: http://lkml.kernel.org/r/1302752223.5282.674.camel@localhost
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
There are many functions named mce_* so use a new prefix for the subset
of functions related to sysfs support.
And since f3c6ea1b06 introduces
syscore_ops, use the prefix mce_syscore for some functions related to
power management which were in sysdev_class before.
Before: After:
mce_device mce_sysdev
mce_sysclass mce_sysdev_class
mce_attrs mce_sysdev_attrs
mce_dev_initialized mce_sysdev_initialized
mce_create_device mce_sysdev_create
mce_remove_device mce_sysdev_remove
mce_suspend mce_syscore_suspend
mce_shutdown mce_syscore_shutdown
mce_resume mce_syscore_resume
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4DEED81B.8020506@jp.fujitsu.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
There are many functions named mce_* so use a new prefix for the subset
of functions dealing with the character device /dev/mcelog.
This change doesn't impact the mce-inject module because the exported
symbol mce_chrdev_ops already has the prefix, therefore it is left
unchanged.
Before: After:
mce_wait mce_chrdev_wait
mce_state_lock mce_chrdev_state_lock
open_count mce_chrdev_open_count
open_exclu mce_chrdev_open_exclu
mce_open mce_chrdev_open
mce_release mce_chrdev_release
mce_read_mutex mce_chrdev_read_mutex
mce_read mce_chrdev_read
mce_poll mce_chrdev_poll
mce_ioctl mce_chrdev_ioctl
mce_log_device mce_chrdev_device
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4DEED7CD.3040500@jp.fujitsu.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Use a temporary local variable m to simplify the code. No change in
logic.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4DEED7A8.8020307@jp.fujitsu.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Use temporary local variable sysdev to simplify the code. No change in
logic.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4DEED777.7080205@jp.fujitsu.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Because "ancient CPUs" like p5 and winchip don't have X86_FEATURE_MCA
(I suppose so), mcheck_cpu_init() on such CPUs will return at check of
mce_available() after __mcheck_cpu_ancient_init().
It is hard to know this implicit behavior without knowing the CPUs
well. So make it clear that we leave mcheck_cpu_init() when the CPU is
initialized in __mcheck_cpu_ancient_init().
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4DEED74B.20502@jp.fujitsu.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This patch introduces mce_gather_info() which is to be called at the
beginning of error handling and gathers minimum error information from
proper error registers (and saved registers).
As the result of mce_get_rip() is integrated, unnecessary zeroing
is removed. This also takes care of saving RIP which is required to
make some decision about error severity for SRAR errors, instead of
retrieving it later in the handler.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4DEED71A.1060906@jp.fujitsu.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The MCE handler uses a special vector for self IPI to invoke
post-emergency processing in an interrupt context, e.g. call an
NMI-unsafe function, wakeup loggers, schedule time-consuming work for
recovery, etc.
This mechanism is now generalized by the following commit:
> e360adbe29
> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Thu Oct 14 14:01:34 2010 +0800
>
> irq_work: Add generic hardirq context callbacks
>
> Provide a mechanism that allows running code in IRQ context. It is
> most useful for NMI code that needs to interact with the rest of the
> system -- like wakeup a task to drain buffers.
:
So change to use provided generic mechanism.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4DEED6B2.6080005@jp.fujitsu.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
More specifically:
- sort bits in the macros
- use BITCLR/BITSET
- coordinate message pattern
- use m for struct mce
- cleanup for severities_debugfs_init()
No functional change.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4DEED679.9090503@jp.fujitsu.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The current format of an item in this table is:
condition(param, ..., level, message [, condition2 ...])
So we have to check both an item's head and tail to find the conditions
which match the item.
Format them in a more straight forward manner:
item(level, message, condition [, condition2 ...])
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4DEED61F.5010502@jp.fujitsu.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The table looks very complicated and hard to read for people other than
skilled developers. So let's clean it up a bit. At first, change format
to ease reading elements in the table.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4DEED5EB.6050400@jp.fujitsu.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The "Spurious not enabled" entry is redundant: the "Not enabled" entry
earlier in the table will cover this case.
The "Action required; unknown MCACOD" entry shouldn't specify MCACOD in
the .mask field. Current code will only match for mcacod==0 rather than
all AR=1 entries.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/4DEED5BC.8030703@jp.fujitsu.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Both the equivalence table and the microcode patch types are u32. Access
them properly through the buf-ptr.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Swap the 1st and 2nd parameters of save_stack_trace_regs()
as same as the parameters of save_stack_trace_tsk().
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Namhyung Kim <namhyung@gmail.com>
Link: http://lkml.kernel.org/r/20110608070921.17777.31103.stgit@fedora15
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[ Also from Ben Hutchings <ben@decadent.org.uk> and Vitaliy Ivanov
<vitalivanov@gmail.com> ]
Commit 06ae40ce07 ("x86 idle: EXPORT_SYMBOL(default_idle, pm_idle)
only when APM demands it") removed the export for pm_idle/default_idle
unless the apm module was modularised and CONFIG_APM_CPU_IDLE was set.
But the apm module uses pm_idle/default_idle unconditionally,
CONFIG_APM_CPU_IDLE only affects the bios idle threshold. Adjust the
export accordingly.
[ Used #ifdef instead of #if defined() as it's shorter, and what both
Ben and Vitaliy used.. Andy, you're out-voted ;) - Linus ]
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm: Compare only lower 32 bits of framebuffer map offsets
drm/i915: Don't leak in i915_gem_shmem_pread_slow()
drm/radeon/kms: do bounds checking for 3D_LOAD_VBPNTR and bump array limit
drm/radeon/kms: fix mac g5 quirk
x86/uv/x2apic: update for change in pci bridge handling.
alpha, drm: Remove obsolete Alpha support in MGA DRM code
alpha/drm: Cleanup Alpha support in DRM generic code
savage: remove unnecessary if statement
drm/radeon: fix GUI idle IH debug statements
drm/radeon/kms: check modes against max pixel clock
drm: fix fbs in DRM_IOCTL_MODE_GETRESOURCES ioctl
This finally allows PCI-Device-IDs to be handled by the
IOMMU driver that have no corresponding struct device
present in the system.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Search for existing dev_data first will allow to switch
dev_data->alias to just store dev_data instead of struct
device.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
With this patch the low-level attach/detach functions only
work on dev_data structures. This allows to remove the
dev_data->dev pointer.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch make the functions flushing the DTE and IOTLBs
only take the dev_data structure instead of the struct
device directly.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This list keeps all allocated iommu_dev_data structs in a
list together. This is needed for instances that have no
associated device.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Remove these function calls from places where the function
has already been called by another function.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
When I added 3448a19da4
I forgot about the special uv handling code for this, so this
patch fixes it up.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Ingo Molnar
Signed-off-by: Dave Airlie <airlied@redhat.com>
Several fixes as well where the +1 was missing.
Done via coccinelle scripts like:
@@
struct resource *ptr;
@@
- ptr->end - ptr->start + 1
+ resource_size(ptr)
and some grep and typing.
Mostly uncompiled, no cross-compilers.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Unconditionally changing the address limit to USER_DS and not restoring
it to its old value in the error path is wrong because it prevents us
using kernel memory on repeated calls to this function. This, in fact,
breaks the fallback of hard coded paths to the init program from being
ever successful if the first candidate fails to load.
With this patch applied switching to USER_DS is delayed until the point
of no return is reached which makes it possible to have a multi-arch
rootfs with one arch specific init binary for each of the (hard coded)
probed paths.
Since the address limit is already set to USER_DS when start_thread()
will be invoked, this redundancy can be safely removed.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes the following build failure:
drivers/built-in.o: In function `early_init_dt_check_for_initrd':
/home/florian/dev/kernel/x86/linux-2.6-x86/drivers/of/fdt.c:571:
undefined reference to `early_init_dt_setup_initrd_arch'
make: *** [.tmp_vmlinux1] Error 1
which happens as soon as we enable initrd support on a x86 devicetree
platform such as Intel CE4100.
Signed-off-by: Florian Fainelli <ffainelli@freebox.fr>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Maxime Bizon <mbizon@freebox.fr>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: stable@kernel.org # 2.6.39
Link: http://lkml.kernel.org/r/201106061015.50039.ffainelli@freebox.fr
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There are multiple declarations of global_clock_event in header files
specific to particular clock event implementations. Consolidate them
in <asm/time.h> and make sure all users include that header.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Venkatesh Pallipadi (Venki) <venki@google.com>
Link: http://lkml.kernel.org/r/20110601180610.762763451@duck.linux-mips.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move them to drivers/clocksource/i8253.c and remove the
implementations in arch/
[ tglx: Avoid the extra file in lib - folded arch patches in. The
export will become conditional in a later step ]
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Link: http://lkml.kernel.org/r/20110601180610.221426078@duck.linux-mips.net
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
After a newly plugged CPU sets the cpu_online bit it enables
interrupts and goes idle. The cpu which brought up the new cpu waits
for the cpu_online bit and when it observes it, it sets the cpu_active
bit for this cpu. The cpu_active bit is the relevant one for the
scheduler to consider the cpu as a viable target.
With forced threaded interrupt handlers which imply forced threaded
softirqs we observed the following race:
cpu 0 cpu 1
bringup(cpu1);
set_cpu_online(smp_processor_id(), true);
local_irq_enable();
while (!cpu_online(cpu1));
timer_interrupt()
-> wake_up(softirq_thread_cpu1);
-> enqueue_on(softirq_thread_cpu1, cpu0);
^^^^
cpu_notify(CPU_ONLINE, cpu1);
-> sched_cpu_active(cpu1)
-> set_cpu_active((cpu1, true);
When an interrupt happens before the cpu_active bit is set by the cpu
which brought up the newly onlined cpu, then the scheduler refuses to
enqueue the woken thread which is bound to that newly onlined cpu on
that newly onlined cpu due to the not yet set cpu_active bit and
selects a fallback runqueue. Not really an expected and desirable
behaviour.
So far this has only been observed with forced hard/softirq threading,
but in theory this could happen without forced threaded hard/softirqs
as well. It's probably unobservable as it would take a massive
interrupt storm on the newly onlined cpu which causes the softirq loop
to wake up the softirq thread and an even longer delay of the cpu
which waits for the cpu_online bit.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@kernel.org # 2.6.39
Instead of walking the whole PCI tree to update the of_node's for
PCI busses and devices after the fact, enable the new generic core
code for doing so by providing the proper device nodes for the
PCI host bridges
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Some PCIe cards ship with a PCI-PCIe bridge which is not
visible as a PCI device in Linux. But the device-id of the
bridge is present in the IOMMU tables which causes a boot
crash in the IOMMU driver.
This patch fixes by removing these cards from the IOMMU
handling. This is a pure -stable fix, a real fix to handle
this situation appriatly will follow for the next merge
window.
Cc: stable@kernel.org # > 2.6.32
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
There's a fair amount of code in the vsyscall page. It contains
a syscall instruction (in the gettimeofday fallback) and who
knows what will happen if an exploit jumps into the middle of
some other code.
Reduce the risk by replacing the vsyscalls with short magic
incantations that cause the kernel to emulate the real
vsyscalls. These incantations are useless if entered in the
middle.
This causes vsyscalls to be a little more expensive than real
syscalls. Fortunately sensible programs don't use them.
The only exception is time() which is still called by glibc
through the vsyscall - but calling time() millions of times
per second is not sensible. glibc has this fixed in the
development tree.
This patch is not perfect: the vread_tsc and vread_hpet
functions are still at a fixed address. Fixing that might
involve making alternative patching work in the vDSO.
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/e64e1b3c64858820d12c48fa739efbd1485e79d5.1307292171.git.luto@mit.edu
[ Removed the CONFIG option - it's simpler to just do it unconditionally. Tidied up the code as well. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Unfortunatly there are systems where the AMD IOMMU does not
cover all devices. This breaks with the current driver as it
initializes the global dma_ops variable. This patch limits
the AMD IOMMU to the devices listed in the IVRS table fixing
DMA for devices not covered by the IOMMU.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The driver contains several loops counting on an u16 value
where the exit-condition is checked against variables that
can have values up to 0xffff. In this case the loops will
never exit. This patch fixed 3 such loops.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Jumping to 0x00 might do something depending on the following
bytes. Jumping to 0xcc is a trap. So fill the unused parts of
the vsyscall page with 0xcc to make it useless for exploits to
jump there.
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/ed54bfcfbe50a9070d20ec1edbe0d149e22a4568.1307292171.git.luto@mit.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It just segfaults since April 2008 (a4928cff), so I'm pretty
sure that nothing uses it. And having an empty section makes
the linker script a bit fragile.
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/4a4abcf47ecadc269f2391a313576fe6d06acef7.1307292171.git.luto@mit.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently the HPET mapping is a user-accessible syscall
instruction at a fixed address some of the time.
A sufficiently determined hacker might be able to guess when.
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/ab41b525a4ca346b1ca1145d16fb8d181861a8aa.1307292171.git.luto@mit.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It's unnecessary overhead in code that's supposed to be highly
optimized. Removing it allows us to remove one of the two
syscall instructions in the vsyscall page.
The only sensible use for it is for UML users, and it doesn't
fully address inconsistent vsyscall results on UML. The real
fix for UML is to stop using vsyscalls entirely.
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/973ae803fe76f712da4b2740e66dccf452d3b1e4.1307292171.git.luto@mit.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move vvars out of the vsyscall page into their own page and mark
it NX.
Without this patch, an attacker who can force a daemon to call
some fixed address could wait until the time contains, say,
0xCD80, and then execute the current time.
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Jesper Juhl <jj@chaosbits.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: pageexec@freemail.hu
Link: http://lkml.kernel.org/r/b1460f81dc4463d66ea3f2b5ce240f58d48effec.1307292171.git.luto@mit.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The blacklist was added in response to my bug report
(http://lkml.org/lkml/2006/1/19/362) and has never
contained more than the one entry describing my old
now dead ThinkPad 380XD laptop. As found out later
(http://lkml.org/lkml/2007/11/29/50), this special
treatment has been unnecessary for a long time, so
it can be removed.
Signed-off-by: Tero Roponen <tero.roponen@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix mwait_play_dead() faulting on mwait-incapable cpus
x86 idle: Fix mwait deprecation warning message
Evil merge to remove extra quote noticed by Joe Perches
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Put back -pg to tsc.o and add no GCOV to vread_tsc_64.o
A logic error in mwait_play_dead() causes the kernel to use
mwait even on cpus which don't support it, such as KVM virtual
cpus.
Introduced by:
349c004e3d31: x86: A fast way to check capabilities of the current cpu
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=36222
Reported-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1306758237-9327-1-git-send-email-avi@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param
x86 idle: deprecate "no-hlt" cmdline param
x86 idle APM: deprecate CONFIG_APM_CPU_IDLE
x86 idle floppy: deprecate disable_hlt()
x86 idle: EXPORT_SYMBOL(default_idle, pm_idle) only when APM demands it
x86 idle: clarify AMD erratum 400 workaround
idle governor: Avoid lock acquisition to read pm_qos before entering idle
cpuidle: menu: fixed wrapping timers at 4.294 seconds
mwait_idle() is a C1-only idle loop intended to be more efficient
than HLT on SMP hardware that supports it.
But mwait_idle() has been replaced by the more general
mwait_idle_with_hints(), which handles both C1 and deeper C-states.
ACPI uses only mwait_idle_with_hints(), and never uses mwait_idle().
Deprecate mwait_idle() and the "idle=mwait" cmdline param
to simplify the x86 idle code.
After this change, kernels configured with
(!CONFIG_ACPI=n && !CONFIG_INTEL_IDLE=n) when run on hardware
that support MWAIT will simply use HLT. If MWAIT is desired
on those systems, cpuidle and the cpuidle drivers above
can be used.
cc: x86@kernel.org
cc: stable@kernel.org # .39.x
Signed-off-by: Len Brown <len.brown@intel.com>
We'd rather that modern machines not check if HLT works on
every entry into idle, for the benefit of machines that had
marginal electricals 15-years ago. If those machines are still running
the upstream kernel, they can use "idle=poll". The only difference
will be that they'll now invoke HLT in machine_hlt().
cc: x86@kernel.org # .39.x
Signed-off-by: Len Brown <len.brown@intel.com>
We don't want to export the pm_idle function pointer to modules.
Currently CONFIG_APM_CPU_IDLE w/ CONFIG_APM_MODULE forces us to.
CONFIG_APM_CPU_IDLE is of dubious value, it runs only on 32-bit
uniprocessor laptops that are over 10 years old. It calls into
the BIOS during idle, and is known to cause a number of machines
to fail.
Removing CONFIG_APM_CPU_IDLE and will allow us to stop exporting
pm_idle. Any systems that were calling into the APM BIOS
at run-time will simply use HLT instead.
cc: x86@kernel.org
cc: Jiri Kosina <jkosina@suse.cz>
cc: stable@kernel.org # .39.x
Signed-off-by: Len Brown <len.brown@intel.com>
In the long run, we don't want default_idle() or (pm_idle)() to
be exported outside of process.c. Start by not exporting them
to modules, unless the APM build demands it.
cc: x86@kernel.org
cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
The workaround for AMD erratum 400 uses the term "c1e" falsely suggesting:
1. Intel C1E is somehow involved
2. All AMD processors with C1E are involved
Use the string "amd_c1e" instead of simply "c1e" to clarify that
this workaround is specific to AMD's version of C1E.
Use the string "e400" to clarify that the workaround is specific
to AMD processors with Erratum 400.
This patch is text-substitution only, with no functional change.
cc: x86@kernel.org
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, asm: Clean up desc.h a bit
x86, amd: Do not enable ARAT feature on AMD processors below family 0x12
x86: Move do_page_fault()'s error path under unlikely()
x86, efi: Retain boot service code until after switching to virtual mode
x86: Remove unnecessary check in detect_ht()
x86: Reorder mm_context_t to remove x86_64 alignment padding and thus shrink mm_struct
x86, UV: Clean up uv_tlb.c
x86, UV: Add support for SGI UV2 hub chip
x86, cpufeature: Update CPU feature RDRND to RDRAND
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits)
perf: Fix SIGIO handling
perf top: Don't stop if no kernel symtab is found
perf top: Handle kptr_restrict
perf top: Remove unused macro
perf events: initialize fd array to -1 instead of 0
perf tools: Make sure kptr_restrict warnings fit 80 col terms
perf tools: Fix build on older systems
perf symbols: Handle /proc/sys/kernel/kptr_restrict
perf: Remove duplicate headers
ftrace: Add internal recursive checks
tracing: Update btrfs's tracepoints to use u64 interface
tracing: Add __print_symbolic_u64 to avoid warnings on 32bit machine
ftrace: Set ops->flag to enabled even on static function tracing
tracing: Have event with function tracer check error return
ftrace: Have ftrace_startup() return failure code
jump_label: Check entries limit in __jump_label_update
ftrace/recordmcount: Avoid STT_FUNC symbols as base on ARM
scripts/tags.sh: Add magic for trace-events for etags too
scripts/tags.sh: Fix ctags for DEFINE_EVENT()
x86/ftrace: Fix compiler warning in ftrace.c
...