when building 32-bit, I see this ..
arch/x86/kernel/pvclock.c:63:7: warning: "__x86_64__" is not defined
Signed-off-by: Dave Jones <davej@redhat.com>
LKML-Reference: <20090713201437.GA12165@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The variable apic_numaq placed in noninit section references the
function wakeup_secondary_cpu_via_nmi(), which is in __cpuinit
section. Thus causes a section mismatch warning. To avoid such
mismatch we mark apic_numaq as __refdata.
We were warned by the following warning:
WARNING: arch/x86/kernel/built-in.o(.data+0x932c): Section mismatch in
reference from the variable apic_numaq to the function
.cpuinit.text:wakeup_secondary_cpu_via_nmi()
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
LKML-Reference: <b9df5fa10907120407p6b4f67dtf4d563155488188a@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The variable apic_es7000_cluster references the function __cpuinit
wakeup_secondary_cpu_via_mip() from a noninit section. So we've been
warned by the following warning. To avoid possible collision between
init/noninit, its best to mark the variable as __refdata.
We were warned by the following warning:
LD arch/x86/kernel/apic/built-in.o
WARNING: arch/x86/kernel/apic/built-in.o(.data+0x198c): Section
mismatch in reference from the variable apic_es7000_cluster to the
function .cpuinit.text:wakeup_secondary_cpu_via_mip()
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
LKML-Reference: <b9df5fa10907120404k6279a10ch5e9682432272706f@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I've attached a patch to remove the Pentium M special casing of
EMON and as noticed at least with my Pentium M the hardware PMU
now works:
Performance counter stats for '/bin/ls /var/tmp':
1.809988 task-clock-msecs # 0.125 CPUs
1 context-switches # 0.001 M/sec
0 CPU-migrations # 0.000 M/sec
224 page-faults # 0.124 M/sec
1425648 cycles # 787.656 M/sec
912755 instructions # 0.640 IPC
Vince suggested that this code was trying to address erratum
Y17 in Pentium-M's:
http://download.intel.com/support/processors/mobile/pm/sb/25266532.pdf
But that erratum (related to IA32_MISC_ENABLES.7) does not
affect perfcounters as we dont use this toggle to disable RDPMC
and WRMSR/RDMSR access to performance counters. We keep cr4's
bit 8 (X86_CR4_PCE) clear so unprivileged RDPMC access is not
allowed anyway.
Cc: Vince Weaver <vince@deater.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@googlemail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
perf report: Add "Fractal" mode output - support callchains with relative overhead rate
perf_counter tools: callchains: Manage the cumul hits on the fly
perf report: Change default callchain parameters
perf report: Use a modifiable string for default callchain options
perf report: Warn on callchain output request from non-callchain file
x86: atomic64: Inline atomic64_read() again
x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative()
x86: atomic64: Improve atomic64_xchg()
x86: atomic64: Export APIs to modules
x86: atomic64: Improve atomic64_read()
x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP
x86: atomic64: Fix unclean type use in atomic64_xchg()
x86: atomic64: Make atomic_read() type-safe
x86: atomic64: Reduce size of functions
x86: atomic64: Improve atomic64_add_return()
x86: atomic64: Improve cmpxchg8b()
x86: atomic64: Improve atomic64_read()
x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file
x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too
perf report: Annotate variable initialization
...
Stephen reported that his DL585 G2 needed noapic after 2.6.22 (?)
Dann bisected it down to:
commit 30a18d6c3f
Date: Tue Feb 19 03:21:20 2008 -0800
x86: multi pci root bus with different io resource range, on
64-bit
It turns out that:
1. that AMD-based systems have two HT chains.
2. BIOS doesn't allocate resources for BAR 6 of devices under 8132 etc
3. that multi-peer-root patch will try to split root resources to peer
root resources according to PCI conf of NB
4. PCI core assigns unassigned resources, but they overlap with BARs
that are used by ioapic addr of io4 and 8132.
The reason: at that point ioapic address are not inserted yet. Solution
is to insert ioapic resources into the tree a bit earlier.
Reported-by: Stephen Frost <sfrost@snowman.net>
Reported-and-Tested-by: dann frazier <dannf@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jesse Barnes <jbarnes@jbarnes-g45.(none)>
Ingo noticed that both AMD and P6 call
x86_pmu_disable_counter() on *_pmu_enable_counter(). This is
because we rely on the side effect of that call to program
the event config but not touch the EN bit.
We change that for AMD by having enable_all() simply write
the full config in, and for P6 by explicitly coding it.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The P6 doesn't seem to support cache ref/hit/miss counts, so
we extend the generic hardware event codes to have 0 and -1
mean the same thing as for the generic cache events.
Furthermore, it turns out the 0 event does not count
(that is, its reported that on PPro it actually does count
something), therefore use a event configuration that's
specified not to count to disable the counters.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add basic P6 PMU support. The P6 uses the EVNTSEL0 EN bit to
enable/disable both its counters. We use this for the
global enable/disable, and clear all config bits (except EN)
to disable individual counters.
Actual ia32 hardware doesn't support lfence, so use a locked
op without side-effect to implement a full barrier.
perf stat and perf record seem to function correctly.
[a.p.zijlstra@chello.nl: cleanups and complete the enable/disable code]
Signed-off-by: Vince Weaver <vince@deater.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <Pine.LNX.4.64.0907081718450.2715@pianoman.cluster.toy>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 5fd29d6ccb ("printk: clean up
handling of log-levels and newlines") changed printk semantics. printk
lines with multiple KERN_<level> prefixes are no longer emitted as
before the patch.
<level> is now included in the output on each additional use.
Remove all uses of multiple KERN_<level>s in formats.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide support for family 0xf processors with 2 P-states
below the elevator voltage. Remove the checks that prevent
this configuration from being supported and increase the
transition voltage to prevent errors during the transition.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix usage of bios intcall()
x86: Remove unused function lapic_watchdog_ok()
x86: Remove unused variable disable_x2apic
x86, kvm: Fix section mismatches in kvm.c
x86: Add missing annotation to arch/x86/lib/copy_user_64.S::copy_to_user
x86: Fix fixmap page order for FIX_TEXT_POKE0,1
amd-iommu: set evt_buf_size correctly
amd-iommu: handle alias entries correctly in init code
x86: Fix printk call in print_local_apic()
x86: Declare check_efer() before it gets used
x86: Mark device_nb as static and fix NULL noise
x86: Remove double declaration of MSR_P6_EVNTSEL0 and MSR_P6_EVNTSEL1
xen: Use kcalloc() in xen_init_IRQ()
x86: Fix fixmap ordering
x86: Fix symbol annotation for arch/x86/lib/clear_page_64.S::clear_page_c
lapic_watchdog_ok() is a global function but no one is using it.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1246554335.2242.29.camel@jaswinder.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
setup_nox2apic() is writing 1 to disable_x2apic but no one is reading it.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
LKML-Reference: <1246554239.2242.27.camel@jaswinder.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The function paravirt_ops_setup() has been refering the
variable no_timer_check, which is a __initdata. Thus generates
the following warning. paravirt_ops_setup() function is called
from kvm_guest_init() which is a __init function. So to fix
this we mark paravirt_ops_setup as __init.
The sections-check output that warned us about this was:
LD arch/x86/built-in.o
WARNING: arch/x86/built-in.o(.text+0x166ce): Section mismatch in
reference from the function paravirt_ops_setup() to the variable
.init.data:no_timer_check
The function paravirt_ops_setup() references
the variable __initdata no_timer_check.
This is often because paravirt_ops_setup lacks a __initdata
annotation or the annotation of no_timer_check is wrong.
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Acked-by: Avi Kivity <avi@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <b9df5fa10907012240y356427b8ta4bd07f0efc6a049@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.infradead.org/iommu-2.6: (38 commits)
intel-iommu: Don't keep freeing page zero in dma_pte_free_pagetable()
intel-iommu: Introduce first_pte_in_page() to simplify PTE-setting loops
intel-iommu: Use cmpxchg64_local() for setting PTEs
intel-iommu: Warn about unmatched unmap requests
intel-iommu: Kill superfluous mapping_lock
intel-iommu: Ensure that PTE writes are 64-bit atomic, even on i386
intel-iommu: Make iommu=pt work on i386 too
intel-iommu: Performance improvement for dma_pte_free_pagetable()
intel-iommu: Don't free too much in dma_pte_free_pagetable()
intel-iommu: dump mappings but don't die on pte already set
intel-iommu: Combine domain_pfn_mapping() and domain_sg_mapping()
intel-iommu: Introduce domain_sg_mapping() to speed up intel_map_sg()
intel-iommu: Simplify __intel_alloc_iova()
intel-iommu: Performance improvement for domain_pfn_mapping()
intel-iommu: Performance improvement for dma_pte_clear_range()
intel-iommu: Clean up iommu_domain_identity_map()
intel-iommu: Remove last use of PHYSICAL_PAGE_MASK, for reserving PCI BARs
intel-iommu: Make iommu_flush_iotlb_psi() take pfn as argument
intel-iommu: Change aligned_size() to aligned_nrpages()
intel-iommu: Clean up intel_map_sg(), remove domain_page_mapping()
...
fix hang with HIGHMEM_64G and 32bit resource. According to hpa and
Linus, use (resource_size_t)-1 to fend off big ranges.
Analyzed by hpa
Reported-and-tested-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The setting of this variable got lost during the suspend/resume
implementation. But keeping this variable zero causes a divide-by-zero
error in the interrupt handler. This patch fixes this.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
An alias entry in the ACPI table means that the device can send requests to the
IOMMU with both device ids, its own and the alias. This is not handled properly
in the ACPI init code. This patch fixes the issue.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
About every callchains recorded with perf record are filled up
including the internal perfcounter nmi frame:
perf_callchain
perf_counter_overflow
intel_pmu_handle_irq
perf_counter_nmi_handler
notifier_call_chain
atomic_notifier_call_chain
notify_die
do_nmi
nmi
We want ignore this frame as it's not interesting for
instrumentation. To solve this, we simply ignore every frames
from nmi context.
New example of "perf report -s sym -c" after this patch:
9.59% [k] search_by_key
4.88%
search_by_key
reiserfs_read_locked_inode
reiserfs_iget
reiserfs_lookup
do_lookup
__link_path_walk
path_walk
do_path_lookup
user_path_at
vfs_fstatat
vfs_lstat
sys_newlstat
system_call_fastpath
__lxstat
0x406fb1
3.19%
search_by_key
search_by_entry_key
reiserfs_find_entry
reiserfs_lookup
do_lookup
__link_path_walk
path_walk
do_path_lookup
user_path_at
vfs_fstatat
vfs_lstat
sys_newlstat
system_call_fastpath
__lxstat
0x406fb1
[...]
For now this patch only solves the problem in x86-64.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246474930-6088-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This sparse warning:
arch/x86/kernel/amd_iommu.c:1195:23: warning: symbol 'device_nb' was not declared. Should it be static?
triggers because device_nb is global but is only used in a
single .c file. change device_nb to static to fix that - this
also addresses the sparse warning.
This sparse warning:
arch/x86/kernel/amd_iommu.c:1766:10: warning: Using plain integer as NULL pointer
triggers because plain integer 0 is used in place of a NULL
pointer. change 0 to NULL to fix that - this also address the
sparse warning.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <1246458194.6940.20.camel@hpdv5.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (47 commits)
perf report: Add --symbols parameter
perf report: Add --comms parameter
perf report: Add --dsos parameter
perf_counter tools: Adjust only prelinked symbol's addresses
perf_counter: Provide a way to enable counters on exec
perf_counter tools: Reduce perf stat measurement overhead/skew
perf stat: Use percentages for scaling output
perf_counter, x86: Update x86_pmu after WARN()
perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run
perf stat: Improve output
perf stat: Fix multi-run stats
perf stat: Add -n/--null option to run without counters
perf_counter tools: Remove dead code
perf_counter: Complete counter swap
perf report: Print sorted callchains per histogram entries
perf_counter tools: Prepare a small callchain framework
perf record: Fix unhandled io return value
perf_counter tools: Add alias for 'l1d' and 'l1i'
perf-report: Add bare minimum PERF_EVENT_READ parsing
perf-report: Add modes for inherited stats and no-samples
...
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
Revert "x86: cap iomem_resource to addressable physical memory"
The print out should read the value before changing the value.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4A487017.4090007@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, delay: tsc based udelay should have rdtsc_barrier
x86, setup: correct include file in <asm/boot.h>
x86, setup: Fix typo "CONFIG_x86_64" in <asm/boot.h>
x86, mce: percpu mcheck_timer should be pinned
x86: Add sysctl to allow panic on IOCK NMI error
x86: Fix uv bau sending buffer initialization
x86, mce: Fix mce resume on 32bit
x86: Move init_gbpages() to setup_arch()
x86: ensure percpu lpage doesn't consume too much vmalloc space
x86: implement percpu_alloc kernel parameter
x86: fix pageattr handling for lpage percpu allocator and re-enable it
x86: reorganize cpa_process_alias()
x86: prepare setup_pcpu_lpage() for pageattr fix
x86: rename remap percpu first chunk allocator to lpage
x86: fix duplicate free in setup_pcpu_remap() failure path
percpu: fix too lazy vunmap cache flushing
x86: Set cpu_llc_id on AMD CPUs
This reverts commit 95ee14e437.
Mikael Petterson <mikepe@it.uu.se> reported that at least one of his
systems will not boot as a result. We have ruled out the detection
algorithm malfunctioning, so it is not a matter of producing the
incorrect bitmasks; rather, something in the application of them
fails.
Revert the commit until we can root cause and correct this problem.
-stable team: this means the underlying commit should be rejected.
Reported-and-isolated-by: Mikael Petterson <mikpe@it.uu.se>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <200906261559.n5QFxJH8027336@pilspetsen.it.uu.se>
Cc: stable@kernel.org
Cc: Grant Grundler <grundler@parisc-linux.org>
If CONFIG_NO_HZ + CONFIG_SMP, timer added via add_timer() might
be migrated on other cpu. Use add_timer_on() instead.
Avoids the following failure:
Maciej Rutecki wrote:
> > After normal boot I try:
> >
> > echo 1 > /sys/devices/system/machinecheck/machinecheck0/check_interval
> >
> > I found this in dmesg:
> >
> > [ 141.704025] ------------[ cut here ]------------
> > [ 141.704039] WARNING: at arch/x86/kernel/cpu/mcheck/mce.c:1102
> > mcheck_timer+0xf5/0x100()
Reported-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This patch introduces a new sysctl:
/proc/sys/kernel/panic_on_io_nmi
which defaults to 0 (off).
When enabled, the kernel panics when the kernel receives an NMI
caused by an IO error.
The IO error triggered NMI indicates a serious system
condition, which could result in IO data corruption. Rather
than contiuing, panicing and dumping might be a better choice,
so one can figure out what's causing the IO error.
This could be especially important to companies running IO
intensive applications where corruption must be avoided, e.g. a
bank's databases.
[ SuSE has been shipping it for a while, it was done at the
request of a large database vendor, for their users. ]
Signed-off-by: Kurt Garloff <garloff@suse.de>
Signed-off-by: Roberto Angelino <robertangelino@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
LKML-Reference: <20090624213211.GA11291@kroah.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Update the mmap control page with the needed information to
use the userspace RDPMC instruction for self monitoring.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (72 commits)
asus-laptop: remove EXPERIMENTAL dependency
asus-laptop: use pr_fmt and pr_<level>
eeepc-laptop: cpufv updates
eeepc-laptop: sync eeepc-laptop with asus_acpi
asus_acpi: Deprecate in favor of asus-laptop
acpi4asus: update MAINTAINER and KConfig links
asus-laptop: platform dev as parent for led and backlight
eeepc-laptop: enable camera by default
ACPI: Rename ACPI processor device bus ID
acerhdf: Acer Aspire One fan control
ACPI: video: DMI workaround broken Acer 7720 BIOS enabling display brightness
ACPI: run ACPI device hot removal in kacpi_hotplug_wq
ACPI: Add the reference count to avoid unloading ACPI video bus twice
ACPI: DMI to disable Vista compatibility on some Sony laptops
ACPI: fix a deadlock in hotplug case
Show the physical device node of backlight class device.
ACPI: pdc init related memory leak with physical CPU hotplug
ACPI: pci_root: remove unused dev/fn information
ACPI: pci_root: simplify list traversals
ACPI: pci_root: use driver data rather than list lookup
...
The initialization of the UV Broadcast Assist Unit's sending
buffers was making an invalid assumption about the
initialization of an MMR that defines its address.
The BIOS will not be providing that MMR. So
uv_activation_descriptor_init() should unconditionally set it.
Tested on UV simulator.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org> # for v2.6.30.x
LKML-Reference: <E1MJTfj-0005i1-W8@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Previous code made an assumption that the power on value of global
control MSR has enabled all fixed and general purpose counters properly.
However, this is not the case for certain Intel processors, such as
Atom - and it might also be firmware dependent.
Each enable bit in IA32_PERF_GLOBAL_CTRL is AND'ed with the
enable bits for all privilege levels in the respective IA32_PERFEVTSELx
or IA32_PERF_FIXED_CTR_CTRL MSRs to start/stop the counting of
respective counters. Counting is enabled if the AND'ed results is true;
counting is disabled when the result is false.
The end result is that all fixed counters are always disabled on Atom
processors because the assumption is just invalid.
Fix this by not initializing the ctrl-mask out of the global MSR,
but setting it to perf_counter_mask.
Reported-by: Stephane Eranian <eranian@googlemail.com>
Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090624021324.GA2788@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To support domain-isolation usages, the platform hardware must be
capable of uniquely identifying the requestor (source-id) for each
interrupt message. Without source-id checking for interrupt remapping
, a rouge guest/VM with assigned devices can launch interrupt attacks
to bring down anothe guest/VM or the VMM itself.
This patch adds source-id checking for interrupt remapping, and then
really isolates interrupts for guests/VMs with assigned devices.
Because PCI subsystem is not initialized yet when set up IOAPIC
entries, use read_pci_config_byte to access PCI config space directly.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The init_gbpages() function is conditionally called from
init_memory_mapping() function. There are two call-sites where
this 'after_bootmem' condition can be true: setup_arch() and
mem_init() via pci_iommu_alloc().
Therefore, it's safe to move the call to init_gbpages() to
setup_arch() as it's always called before mem_init().
This removes an after_bootmem use - paving the way to remove
all uses of that state variable.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <Pine.LNX.4.64.0906221731210.19474@melkki.cs.Helsinki.FI>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.infradead.org/~dwmw2/iommu-2.6.31:
intel-iommu: Fix one last ia64 build problem in Pass Through Support
VT-d: support the device IOTLB
VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps
VT-d: add device IOTLB invalidation support
VT-d: parse ATSR in DMA Remapping Reporting Structure
PCI: handle Virtual Function ATS enabling
PCI: support the ATS capability
intel-iommu: dmar_set_interrupt return error value
intel-iommu: Tidy up iommu->gcmd handling
intel-iommu: Fix tiny theoretical race in write-buffer flush.
intel-iommu: Clean up handling of "caching mode" vs. IOTLB flushing.
intel-iommu: Clean up handling of "caching mode" vs. context flushing.
VT-d: fix invalid domain id for KVM context flush
Fix !CONFIG_DMAR build failure introduced by Intel IOMMU Pass Through Support
Intel IOMMU Pass Through Support
Fix up trivial conflicts in drivers/pci/{intel-iommu.c,intr_remapping.c}
On extreme configuration (e.g. 32bit 32-way NUMA machine), lpage
percpu first chunk allocator can consume too much of vmalloc space.
Make it fall back to 4k allocator if the consumption goes over 20%.
[ Impact: add sanity check for lpage percpu first chunk allocator ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
According to Andi, it isn't clear whether lpage allocator is worth the
trouble as there are many processors where PMD TLB is far scarcer than
PTE TLB. The advantage or disadvantage probably depends on the actual
size of percpu area and specific processor. As performance
degradation due to TLB pressure tends to be highly workload specific
and subtle, it is difficult to decide which way to go without more
data.
This patch implements percpu_alloc kernel parameter to allow selecting
which first chunk allocator to use to ease debugging and testing.
While at it, make sure all the failure paths report why something
failed to help determining why certain allocator isn't working. Also,
kill the "Great future plan" comment which had already been realized
quite some time ago.
[ Impact: allow explicit percpu first chunk allocator selection ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
lpage allocator aliases a PMD page for each cpu and returns whatever
is unused to the page allocator. When the pageattr of the recycled
pages are changed, this makes the two aliases point to the overlapping
regions with different attributes which isn't allowed and known to
cause subtle data corruption in certain cases.
This can be handled in simliar manner to the x86_64 highmap alias.
pageattr code should detect if the target pages have PMD alias and
split the PMD alias and synchronize the attributes.
pcpur allocator is updated to keep the allocated PMD pages map sorted
in ascending address order and provide pcpu_lpage_remapped() function
which binary searches the array to determine whether the given address
is aliased and if so to which address. pageattr is updated to use
pcpu_lpage_remapped() to detect the PMD alias and split it up as
necessary from cpa_process_alias().
Jan Beulich spotted the original problem and incorrect usage of vaddr
instead of laddr for lookup.
With this, lpage percpu allocator should work correctly. Re-enable
it.
[ Impact: fix subtle lpage pageattr bug and re-enable lpage ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Make the following changes in preparation of coming pageattr updates.
* Define and use array of struct pcpul_ent instead of array of
pointers. The only difference is ->cpu field which is set but
unused yet.
* Rename variables according to the above change.
* Rename local variable vm to pcpul_vm and move it out of the
function.
[ Impact: no functional difference ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
The "remap" allocator remaps large pages to build the first chunk;
however, the name isn't very good because 4k allocator remaps too and
the whole point of the remap allocator is using large page mapping.
The allocator will be generalized and exported outside of x86, rename
it to lpage before that happens.
percpu_alloc kernel parameter is updated to accept both "remap" and
"lpage" for lpage allocator.
[ Impact: code cleanup, kernel parameter argument updated ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
In the failure path, setup_pcpu_remap() tries to free the area which
has already been freed to make holes in the large page. Fix it.
[ Impact: fix duplicate free in failure path ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
This counts when building sched domains in case NUMA information
is not available.
( See cpu_coregroup_mask() which uses llc_shared_map which in turn is
created based on cpu_llc_id. )
Currently Linux builds domains as follows:
(example from a dual socket quad-core system)
CPU0 attaching sched-domain:
domain 0: span 0-7 level CPU
groups: 0 1 2 3 4 5 6 7
...
CPU7 attaching sched-domain:
domain 0: span 0-7 level CPU
groups: 7 0 1 2 3 4 5 6
Ever since that is borked for multi-core AMD CPU systems.
This patch fixes that and now we get a proper:
CPU0 attaching sched-domain:
domain 0: span 0-3 level MC
groups: 0 1 2 3
domain 1: span 0-7 level CPU
groups: 0-3 4-7
...
CPU7 attaching sched-domain:
domain 0: span 4-7 level MC
groups: 7 4 5 6
domain 1: span 0-7 level CPU
groups: 4-7 0-3
This allows scheduler to assign tasks to cores on different sockets
(i.e. that don't share last level cache) for performance reasons.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20090619085909.GJ5218@alberich.amd.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
perfcounter: Handle some IO return values
perf_counter: Push perf_sample_data through the swcounter code
perf_counter tools: Define and use our own u64, s64 etc. definitions
perf_counter: Close race in perf_lock_task_context()
perf_counter, x86: Improve interactions with fast-gup
perf_counter: Simplify and fix task migration counting
perf_counter tools: Add a data file header
perf_counter: Update userspace callchain sampling uses
perf_counter: Make callchain samples extensible
perf report: Filter to parent set by default
perf_counter tools: Handle lost events
perf_counter: Add event overlow handling
fs: Provide empty .set_page_dirty() aop for anon inodes
perf_counter: tools: Makefile tweaks for 64-bit powerpc
perf_counter: powerpc: Add processor back-end for MPC7450 family
perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
perf_counter: powerpc: Change how processor-specific back-ends get selected
perf_counter: powerpc: Use unsigned long for register and constraint values
perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
perf_counter tools: Add and use isprint()
...
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix out of scope variable access in sched_slice()
sched: Hide runqueues from direct refer at source code level
sched: Remove unneeded __ref tag
sched, x86: Fix cpufreq + sched_clock() TSC scaling
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
tracing/urgent: warn in case of ftrace_start_up inbalance
tracing/urgent: fix unbalanced ftrace_start_up
function-graph: add stack frame test
function-graph: disable when both x86_32 and optimize for size are configured
ring-buffer: have benchmark test print to trace buffer
ring-buffer: do not grab locks in nmi
ring-buffer: add locks around rb_per_cpu_empty
ring-buffer: check for less than two in size allocation
ring-buffer: remove useless compile check for buffer_page size
ring-buffer: remove useless warn on check
ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index
tracing: update sample event documentation
tracing/filters: fix race between filter setting and module unload
tracing/filters: free filter_string in destroy_preds()
ring-buffer: use commit counters for commit pointer accounting
ring-buffer: remove unused variable
ring-buffer: have benchmark test handle discarded events
ring-buffer: prevent adding write in discarded area
tracing/filters: strloc should be unsigned short
tracing/filters: operand can be negative
...
Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (45 commits)
x86, mce: fix error path in mce_create_device()
x86: use zalloc_cpumask_var for mce_dev_initialized
x86: fix duplicated sysfs attribute
x86: de-assembler-ize asm/desc.h
i386: fix/simplify espfix stack switching, move it into assembly
i386: fix return to 16-bit stack from NMI handler
x86, ioapic: Don't call disconnect_bsp_APIC if no APIC present
x86: Remove duplicated #include's
x86: msr.h linux/types.h is only required for __KERNEL__
x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround
x86, mce: mce_intel.c needs <asm/apic.h>
x86: apic/io_apic.c: dmar_msi_type should be static
x86, io_apic.c: Work around compiler warning
x86: mce: Don't touch THERMAL_APIC_VECTOR if no active APIC present
x86: mce: Handle banks == 0 case in K7 quirk
x86, boot: use .code16gcc instead of .code16
x86: correct the conversion of EFI memory types
x86: cap iomem_resource to addressable physical memory
x86, mce: rename _64.c files which are no longer 64-bit-specific
x86, mce: mce.h cleanup
...
Manually fix up trivial conflict in arch/x86/mm/fault.c
arch_acpi_processor_cleanup_pdc() in x86 and ia64 results in memory allocated
for _PDC objects that is never freed and will cause memory leak in case of
physical CPU remove and add. Patch fixes the memory leak by freeing the
objects soon after _PDC is evaluated.
Reported-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Before exposing upstream tools to a callchain-samples ABI, tidy it
up to make it more extensible in the future:
Use markers in the IP chain to denote context, use (u64)-1..-4095 range
for these context markers because we use them for ERR_PTR(), so these
addresses are unlikely to be mapped.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In case gcc does something funny with the stack frames, or the return
from function code, we would like to detect that.
An arch may implement passing of a variable that is unique to the
function and can be saved on entering a function and can be tested
when exiting the function. Usually the frame pointer can be used for
this purpose.
This patch also implements this for x86. Where it passes in the stack
frame of the parent function, and will test that frame on exit.
There was a case in x86_32 with optimize for size (-Os) where, for a
few functions, gcc would align the stack frame and place a copy of the
return address into it. The function graph tracer modified the copy and
not the actual return address. On return from the funtion, it did not go
to the tracer hook, but returned to the parent. This broke the function
graph tracer, because the return of the parent (where gcc did not do
this funky manipulation) returned to the location that the child function
was suppose to. This caused strange kernel crashes.
This test detected the problem and pointed out where the issue was.
This modifies the parameters of one of the functions that the arch
specific code calls, so it includes changes to arch code to accommodate
the new prototype.
Note, I notice that the parsic arch implements its own push_return_trace.
This is now a generic function and the ftrace_push_return_trace should be
used instead. This patch does not touch that code.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Enable gcov profiling of the entire kernel on x86_64. Required changes
include disabling profiling for:
* arch/kernel/acpi/realmode and arch/kernel/boot/compressed:
not linked to main kernel
* arch/vdso, arch/kernel/vsyscall_64 and arch/kernel/hpet:
profiling causes segfaults during boot (incompatible context)
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Li Wei <W.Li@Sun.COM>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need a cleared cpu_mask to record if mce is initialized, especially
when MAXSMP is used.
used zalloc_... instead
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: stable@kernel.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The sysfs attribute cmci_disabled was accidentall turned into a
duplicate of ignore_ce, breaking all other attributes.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
asm/desc.h is included in three assembly files, but the only macro
it defines, GET_DESC_BASE, is never used. This patch removes the
includes, removes the macro GET_DESC_BASE and the ASSEMBLY guard
from asm/desc.h.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The espfix code triggers if we have a protected mode userspace
application with a 16-bit stack. On returning to userspace, with iret,
the CPU doesn't restore the high word of the stack pointer. This is an
"official" bug, and the work-around used in the kernel is to temporarily
switch to a 32-bit stack segment/pointer pair where the high word of the
pointer is equal to the high word of the userspace stackpointer.
The current implementation uses THREAD_SIZE to determine the cut-off,
but there is no good reason not to use the more natural 64kb... However,
implementing this by simply substituting THREAD_SIZE with 65536 in
patch_espfix_desc crashed the test application. patch_espfix_desc tries
to do what is described above, but gets it subtly wrong if the userspace
stack pointer is just below a multiple of THREAD_SIZE: an overflow
occurs to bit 13... With a bit of luck, when the kernelspace
stackpointer is just below a 64kb-boundary, the overflow then ripples
trough to bit 16 and userspace will see its stack pointer changed by
65536.
This patch moves all espfix code into entry_32.S. Selecting a 16-bit
cut-off simplifies the code. The game with changing the limit dynamically
is removed too. It complicates matters and I see no value in it. Changing
only the top 16-bit word of ESP is one instruction and it also implies
that only two bytes of the ESPFIX GDT entry need to be changed and this
can be implemented in just a handful simple to understand instructions.
As a side effect, the operation to compute the original ESP from the
ESPFIX ESP and the GDT entry simplifies a bit too, and the remaining
three instructions have been expanded inline in entry_32.S.
impact: can now reliably run userspace with ESP=xxxxfffc on 16-bit
stack segment
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Acked-by: Stas Sergeev <stsp@aknet.ru>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Returning to a task with a 16-bit stack requires special care: the iret
instruction does not restore the high word of esp in that case. The
espfix code fixes this, but currently is not invoked on NMIs. This means
that a running task gets the upper word of esp clobbered due intervening
NMIs. To reproduce, compile and run the following program with the nmi
watchdog enabled (nmi_watchdog=2 on the command line). Using gdb you can
see that the high bits of esp contain garbage, while the low bits are
still correct.
This patch puts the espfix code back into the NMI code path.
The patch is slightly complicated due to the irqtrace infrastructure not
being NMI-safe. The NMI return path cannot call TRACE_IRQS_IRET.
Otherwise, the tail of the normal iret-code is correct for the nmi code
path too. To be able to share this code-path, the TRACE_IRQS_IRET was
move up a bit. The espfix code exists after the TRACE_IRQS_IRET, but
this code explicitly disables interrupts. This short interrupts-off
section is now not traced anymore. The return-to-kernel path now always
includes the preliminary test to decide if the espfix code should be
called. This is never the case, but doing it this way keeps the patch as
simple as possible and the few extra instructions should not affect
timing in any significant way.
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <asm/ldt.h>
int modify_ldt(int func, void *ptr, unsigned long bytecount)
{
return syscall(SYS_modify_ldt, func, ptr, bytecount);
}
/* this is assumed to be usable */
#define SEGBASEADDR 0x10000
#define SEGLIMIT 0x20000
/* 16-bit segment */
struct user_desc desc = {
.entry_number = 0,
.base_addr = SEGBASEADDR,
.limit = SEGLIMIT,
.seg_32bit = 0,
.contents = 0, /* ??? */
.read_exec_only = 0,
.limit_in_pages = 0,
.seg_not_present = 0,
.useable = 1
};
int main(void)
{
setvbuf(stdout, NULL, _IONBF, 0);
/* map a 64 kb segment */
char *pointer = mmap((void *)SEGBASEADDR, SEGLIMIT+1,
PROT_EXEC|PROT_READ|PROT_WRITE,
MAP_SHARED|MAP_ANONYMOUS, -1, 0);
if (pointer == NULL) {
printf("could not map space\n");
return 0;
}
/* write ldt, new mode */
int err = modify_ldt(0x11, &desc, sizeof(desc));
if (err) {
printf("error modifying ldt: %i\n", err);
return 0;
}
for (int i=0; i<1000; i++) {
asm volatile (
"pusha\n\t"
"mov %ss, %eax\n\t" /* preserve ss:esp */
"mov %esp, %ebp\n\t"
"push $7\n\t" /* index 0, ldt, user mode */
"push $65536-4096\n\t" /* esp */
"lss (%esp), %esp\n\t" /* switch to new stack */
"push %eax\n\t" /* save old ss:esp on new stack */
"push %ebp\n\t"
"add $17*65536, %esp\n\t" /* set high bits */
"mov %esp, %edx\n\t"
"mov $10000000, %ecx\n\t" /* wait... */
"1: loop 1b\n\t" /* ... a bit */
"cmp %esp, %edx\n\t"
"je 1f\n\t"
"ud2\n\t" /* esp changed inexplicably! */
"1:\n\t"
"sub $17*65536, %esp\n\t" /* restore high bits */
"lss (%esp), %esp\n\t" /* restore old ss:esp */
"popa\n\t");
printf("\rx%ix", i);
}
return 0;
}
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Acked-by: Stas Sergeev <stsp@aknet.ru>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Commit 9e350de37a ("perf_counter: Accurate period data")
missed a spot, which caused all Intel-PMU samples to have a
period of 0.
This broke auto-freq sampling.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] cpumask: new cpumask operators for arch/x86/kernel/cpu/cpufreq/powernow-k8.c
[CPUFREQ] cpumask: avoid playing with cpus_allowed in powernow-k8.c
[CPUFREQ] cpumask: avoid cpumask games in arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
[CPUFREQ] cpumask: avoid playing with cpus_allowed in speedstep-ich.c
[CPUFREQ] powernow-k8: get drv data for correct CPU
[CPUFREQ] powernow-k8: read P-state from HW
[CPUFREQ] reduce scope of ACPI_PSS_BIOS_BUG_MSG[]
[CPUFREQ] Clean up convoluted code in arch/x86/kernel/tsc.c:time_cpufreq_notifier()
[CPUFREQ] minor correction to cpu-freq documentation
[CPUFREQ] powernow-k8.c: mess cleanup
[CPUFREQ] Only set sampling_rate_max deprecated, sampling_rate_min is useful
[CPUFREQ] powernow-k8: Set transition latency to 1 if ACPI tables export 0
[CPUFREQ] ondemand: Uncouple minimal sampling rate from HZ in NO_HZ case
Expand Intel NMI perfctr1 workaround to include a Core2 processor stepping
(cpuid family-6, model-f, stepping-4). Resolves a situation where the NMI
would not enable on these processors.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: prarit@redhat.com
Cc: suresh.b.siddha@intel.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
mce_intel.c uses apic_write() and lapic_get_maxlvt(), and so it needs
<asm/apic.h>.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
This compiler warning:
arch/x86/kernel/apic/io_apic.c: In function ‘ioapic_write_entry’:
arch/x86/kernel/apic/io_apic.c:466: warning: ‘eu’ is used uninitialized in this function
arch/x86/kernel/apic/io_apic.c:465: note: ‘eu’ was declared here
Is bogus as 'eu' is always initialized. But annotate it away by
initializing the variable, to make it easier for people to notice
real warnings. A compiler that sees through this logic will
optimize away the initialization.
Signed-off-by: Figo.zhang <figo1802@gmail.com>
LKML-Reference: <1245248720.3312.27.camel@myhost>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If APIC was disabled (for some reason) and as result
it's not even mapped we should not try to enable thermal
interrupts at all.
Reported-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Tested-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <20090615182633.GA7606@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For freqency dependent TSCs we only scale the cycles, we do not account
for the discrepancy in absolute value.
Our current formula is: time = cycles * mult
(where mult is a function of the cpu-speed on variable tsc machines)
Suppose our current cycle count is 10, and we have a multiplier of 5,
then our time value would end up being 50.
Now cpufreq comes along and changes the multiplier to say 3 or 7,
which would result in our time being resp. 30 or 70.
That means that we can observe random jumps in the time value due to
frequency changes in both fwd and bwd direction.
So what this patch does is change the formula to:
time = cycles * frequency + offset
And we calculate offset so that time_before == time_after, thereby
ridding us of these jumps in time.
[ Impact: fix/reduce sched_clock() jumps across frequency changing events ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Chucked-on-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* akpm: (182 commits)
fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset
fbdev: *bfin*: fix __dev{init,exit} markings
fbdev: *bfin*: drop unnecessary calls to memset
fbdev: bfin-t350mcqb-fb: drop unused local variables
fbdev: blackfin has __raw I/O accessors, so use them in fb.h
fbdev: s1d13xxxfb: add accelerated bitblt functions
tcx: use standard fields for framebuffer physical address and length
fbdev: add support for handoff from firmware to hw framebuffers
intelfb: fix a bug when changing video timing
fbdev: use framebuffer_release() for freeing fb_info structures
radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb?
s3c-fb: CPUFREQ frequency scaling support
s3c-fb: fix resource releasing on error during probing
carminefb: fix possible access beyond end of carmine_modedb[]
acornfb: remove fb_mmap function
mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF
mb862xxfb: restrict compliation of platform driver to PPC
Samsung SoC Framebuffer driver: add Alpha Channel support
atmel-lcdc: fix pixclock upper bound detection
offb: use framebuffer_alloc() to allocate fb_info struct
...
Manually fix up conflicts due to kmemcheck in mm/slab.c
There are some places to be able to use printk_once instead of hard coding.
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* create mm/init-mm.c, move init_mm there
* remove INIT_MM, initialize init_mm with C99 initializer
* unexport init_mm on all arches:
init_mm is already unexported on x86.
One strange place is some OMAP driver (drivers/video/omap/) which
won't build modular, but it's already wants get_vm_area() export.
Somebody should look there.
[akpm@linux-foundation.org: add missing #includes]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PIT_TICK_RATE is currently defined in four architectures, but in three
different places. While linux/timex.h is not the perfect place for it, it
is still a reasonable replacement for those drivers that traditionally use
asm/timex.h to get CLOCK_TICK_RATE and expect it to be the PIT frequency.
Note that for Alpha, the actual value changed from 1193182UL to 1193180UL.
This is unlikely to make a difference, and probably can only improve
accuracy. There was a discussion on the correct value of CLOCK_TICK_RATE
a few years ago, after which every existing instance was getting changed
to 1193182. According to the specification, it should be
1193181.818181...
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Len Brown <lenb@kernel.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch causes all the EFI_RESERVED_TYPE memory reservations to be recorded
in the e820 table as type E820_RESERVED.
(This patch replaces one called 'x86: vendor reserved memory type'.
This version has been discussed a bit with Peter and Yinghai but not given
a final opinion.)
Without this patch EFI_RESERVED_TYPE memory reservations may be
marked usable in the e820 table. There may be a collision between
kernel use and some reserver's use of this memory.
(An example use of this functionality is the UV system, which
will access extremely large areas of memory with a memory engine
that allows a user to address beyond the processor's range. Such
areas are reserved in the EFI table by the BIOS.
Some loaders have a restricted number of entries possible in the e820 table,
hence the need to record the reservations in the unrestricted EFI table.)
The call to do_add_efi_memmap() is only made if "add_efi_memmap" is specified
on the kernel command line.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
iomem_resource is by default initialized to -1, which means 64 bits of
physical address space if 64-bit resources are enabled. However, x86
CPUs cannot address 64 bits of physical address space. Thus, we want
to cap the physical address space to what the union of all CPU can
actually address.
Without this patch, we may end up assigning inaccessible values to
uninitialized 64-bit PCI memory resources.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Martin Mares <mj@ucw.cz>
Cc: stable@kernel.org
Rename files that are no longer 64bit specific:
mce_amd_64.c => mce_amd.c
mce_intel_64.c => mce_intel.c
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Now all symbols in the header are static. Remove the header.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
move intel_init_thermal() into therm_throt.c
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Put common functions into therm_throt.c, modify Makefile.
unexpected_thermal_interrupt
intel_thermal_interrupt
smp_thermal_interrupt
intel_set_thermal_handler
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Break smp_thermal_interrupt() into two functions.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Remove unused argument regs from handlers, and use inc_irq_stat.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The mce_disabled on 32bit is a tristate variable [1,0,-1],
while 64bit version is boolean [0,1].
This patch makes mce_disabled always boolean, and use mce_p5_enabled
to indicate the third state instead.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
There are 2 headers:
arch/x86/include/asm/mce.h
arch/x86/kernel/cpu/mcheck/mce.h
and in the latter small header:
#include <asm/mce.h>
This patch move all contents in the latter header into the former,
and fix all files using the latter to include the former instead.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Add sysfs interface for admins who want to tweak these options without
rebooting the system.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
"trigger" is not straight forward name for valiable that holds name
of user mode helper program which triggered by machine check events.
This patch renames this valiable and kins to more recognizable names.
trigger => mce_helper
trigger_argv => mce_helper_argv
notify_user => mce_need_notify
No functional changes.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Add __read_mostly to data written during setup.
Suggested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Simplify interface of mce_start():
- no_way_out = mce_start(no_way_out, &order);
+ order = mce_start(&no_way_out);
Now Monarch and Subjects share same exit(return) in usual path.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
In mce_cpu_restart, mce_init_timer is called unconditionally.
If !mce_available (e.g. mce is disabled), there are no useful work
for timer. Stop running it.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
If one CPU has no_way_out == 1, all other CPUs should have no_way_out
== 1. But despite global_nwo is read after mce_callin, global_nwo is
updated after mce_callin too. So it is possible that some CPU read
global_nwo before some other CPU update global_nwo, so that no_way_out
== 1 for some CPU, while no_way_out == 0 for some other CPU.
This patch fixes this race condition via moving mce_callin updating
after global_nwo updating, with a smp_wmb in between. A smp_rmb is
added between their reading too.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>