Commit Graph

13147 Commits

Author SHA1 Message Date
Fenghua Yu e365c9df2f x86, mem: clear_page_64.S: Support clear_page() with enhanced REP MOVSB/STOSB
Intel processors are adding enhancements to REP MOVSB/STOSB and the use of
REP MOVSB/STOSB for optimal memcpy/memset or similar functions is recommended.
Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/
STOSB).

Support clear_page() with rep stosb for processors supporting enhanced REP MOVSB
/STOSB. On processors supporting enhanced REP MOVSB/STOSB, the alternative
clear_page_c_e function using enhanced REP STOSB overrides the original function
and the fast string function.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-6-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:27 -07:00
Fenghua Yu 9072d11da1 x86, alternative: Add altinstruction_entry macro
Add altinstruction_entry macro to generate .altinstructions section
entries from assembly code.  This should be less failure-prone than
open-coding.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-5-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:25 -07:00
Fenghua Yu 5097313363 x86, alternative, doc: Add comment for applying alternatives order
Some string operation functions may be patched twice, e.g. on enhanced REP MOVSB
/STOSB processors, memcpy is patched first by fast string alternative function,
then it is patched by enhanced REP MOVSB/STOSB alternative function.

Add a comment on the order in which alternatives are applied, to warn anyone
who might change that order for any reason.

[ Documentation-only patch ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:25 -07:00
Fenghua Yu 161ec53c70 x86, mem, intel: Initialize Enhanced REP MOVSB/STOSB
If the kernel intends to use enhanced REP MOVSB/STOSB, it must ensure that
IA32_MISC_ENABLE.Fast_String_Enable (bit 0) is set and that
CPUID.(EAX=07H, ECX=0H):EBX[bit 9] also reports 1.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-3-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:23 -07:00
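
The two conditions named in the commit message above can be sketched as a single predicate. This is only an illustration, not the kernel's code: the helper name `erms_usable` and the macro names are hypothetical, and the caller is assumed to have already read the MSR value and the CPUID leaf 7 EBX value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as stated in the commit message; macro names are made up here. */
#define MISC_ENABLE_FAST_STRING (1ULL << 0) /* IA32_MISC_ENABLE.Fast_String_Enable */
#define CPUID7_EBX_ERMS         (1U << 9)   /* CPUID.(EAX=07H, ECX=0H):EBX[9] */

/*
 * Hypothetical helper: takes the raw register values as arguments, so no
 * privileged instruction is executed here. Both bits must be set.
 */
static bool erms_usable(uint64_t misc_enable, uint32_t cpuid7_ebx)
{
    return (misc_enable & MISC_ENABLE_FAST_STRING) &&
           (cpuid7_ebx & CPUID7_EBX_ERMS);
}
```

Passing the values in, rather than reading them inside the helper, keeps the sketch testable in user space.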
Fenghua Yu 724a92ee45 x86, cpufeature: Add CPU feature bit for enhanced REP MOVSB/STOSB
Intel processors are adding enhancements to REP MOVSB/STOSB and the use of
REP MOVSB/STOSB for optimal memcpy/memset or similar functions is recommended.
Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/
STOSB).

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-2-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 14:56:36 -07:00
Rafael J. Wysocki 2d2a9163bd Merge branch 'syscore' into for-linus
* syscore:
  PM: Remove sysdev suspend, resume and shutdown operations
  PM / PowerPC: Use struct syscore_ops instead of sysdevs for PM
  PM / UNICORE32: Use struct syscore_ops instead of sysdevs for PM
  PM / AVR32: Use struct syscore_ops instead of sysdevs for PM
  PM / Blackfin: Use struct syscore_ops instead of sysdevs for PM
  ARM / Samsung: Use struct syscore_ops for "core" power management
  ARM / PXA: Use struct syscore_ops for "core" power management
  ARM / SA1100: Use struct syscore_ops for "core" power management
  ARM / Integrator: Use struct syscore_ops for core PM
  ARM / OMAP: Use struct syscore_ops for "core" power management
  ARM: Use struct syscore_ops instead of sysdevs for PM in common code
2011-05-17 23:23:40 +02:00
Amerigo Wang c3b0795c98 PM / ACPI: Remove acpi_sleep=s4_nonvs
acpi_sleep=s4_nonvs is superseded by acpi_sleep=nonvs, so remove it.

Signed-off-by: WANG Cong <amwang@redhat.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Len Brown <lenb@kernel.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-05-17 23:19:18 +02:00
Fenghua Yu 2494b030ba x86, cpufeature: Fix cpuid leaf 7 feature detection
CPUID leaf 7, subleaf 0 returns the maximum subleaf in EAX, not the
number of subleaves.  Since so far only subleaf 0 is defined (and only
the EBX bitfield) we do not need to qualify the test.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305660806-17519-1-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org> 2.6.36..39
2011-05-17 13:36:29 -07:00
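
The difference between "maximum subleaf" and "number of subleaves" described in the commit message above can be sketched with two small predicates. Both function names are hypothetical and exist only for this illustration; `eax` stands for the value returned in EAX by CPUID leaf 7, subleaf 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Correct reading: EAX holds the highest valid subleaf index. */
static bool leaf7_subleaf_valid(uint32_t eax, uint32_t subleaf)
{
    return subleaf <= eax;
}

/* Mistaken reading: EAX treated as a count, which rejects subleaf 0 when EAX == 0. */
static bool leaf7_subleaf_valid_as_count(uint32_t eax, uint32_t subleaf)
{
    return subleaf < eax;
}
```

With EAX equal to 0, the first predicate accepts subleaf 0 and the second rejects it, which is the misdetection the commit describes.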
Borislav Petkov 14fb57dccb x86, AMD: Fix ARAT feature setting again
Trying to enable the local APIC timer on early K8 revisions
uncovers a number of other issues with it, in conjunction with
the C1E enter path on AMD. Fixing those causes much more churn
and troubles than the benefit of using that timer brings so
don't enable it on K8 at all, falling back to the original
functionality the kernel had with respect to that.

Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com>
Cc: Boris Ostrovsky <Boris.Ostrovsky@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Nick Bowler <nbowler@elliptictech.com>
Cc: Joerg-Volker-Peetz <jvpeetz@web.de>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1305636919-31165-3-git-send-email-bp@amd64.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-17 15:28:34 +02:00
Borislav Petkov 328935e634 Revert "x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors"
This reverts commit e20a2d205c, as it crashes
certain boxes with specific AMD CPU models.

Moving the lower endpoint of the Erratum 400 check to accommodate
earlier K8 revisions (A-E) opens a can of worms which is simply
not worth fixing properly by tweaking the errata checking
framework:

* missing IntPending MSR on revisions < CG causes #GP:

http://marc.info/?l=linux-kernel&m=130541471818831

* makes earlier revisions use the LAPIC timer instead of the C1E
idle routine which switches to HPET, thus not waking up in
deeper C-states:

http://lkml.org/lkml/2011/4/24/20

Therefore, leave the original boundary starting with K8-revF.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-17 15:28:33 +02:00
Ingo Molnar 86b9523ab1 Merge branch 'gart/rename' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu
2011-05-17 14:39:00 +02:00
David Rientjes dc382fd5bc x86, mm: Allow ZONE_DMA to be configurable
ZONE_DMA is unnecessary for a large number of machines that do not
require less than 32-bit DMA addressing, e.g. ISA legacy DMA or PCI
cards with a restricted DMA address mask.

This patch allows users to disable ZONE_DMA for x86 if they know they
will not be using such devices with their kernel.

This prevents the VM from unnecessarily reserving a ratio of memory
(defaulting to 1/256th of system capacity) with lowmem_reserve_ratio
for such allocations when it will never be used.

Signed-off-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1105161353560.4353@chino.kir.corp.google.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-16 14:03:28 -07:00
Ondrej Zary 865be7a810 x86, cpu: Fix detection of Celeron Covington stepping A1 and B0
Steppings A1 and B0 of Celeron Covington are currently misdetected as
Pentium II (Dixon). Fix it by removing the stepping check.

[ hpa: this fixes this specific bug... the CPUID documentation
  specifies that the L2 cache size can disambiguate additional CPUs;
  this patch does not fix that. ]

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Link: http://lkml.kernel.org/r/201105162138.15416.linux@rainbow-software.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-16 13:24:21 -07:00
Martin Schwidefsky 521ccb5c4a ftrace/x86: mcount offset calculation
Do the mcount offset adjustment in the recordmcount.pl/recordmcount.[ch]
at compile time and not in ftrace_call_adjust at run time.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-05-16 14:55:57 -04:00
Steven Rostedt 2895cd2ab8 ftrace/x86: Do not trace .discard.text section
The section called .discard.text has tracing attached to it and is
currently ignored by ftrace. But it does include a call to the mcount
stub. Adding a notrace to the code keeps gcc from adding the useless
mcount caller to it.

Link: http://lkml.kernel.org/r/20110421023739.243651696@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-05-16 14:47:13 -04:00
Frank Arnold 42be450565 x86, AMD, cacheinfo: Fix L3 cache index disable checks
We provide two slots to disable cache indices, and have a check to
prevent both slots from being used for the same index.

If the user disables the same index on different subcaches, both slots
will hold the same index, e.g.

  $ echo 2047 > /sys/devices/system/cpu/cpu0/cache/index3/cache_disable_0
  $ cat /sys/devices/system/cpu/cpu0/cache/index3/cache_disable_0
  2047
  $ echo 1050623 > /sys/devices/system/cpu/cpu0/cache/index3/cache_disable_1
  $ cat /sys/devices/system/cpu/cpu0/cache/index3/cache_disable_1
  2047

due to the fact that the check was looking only at index bits [11:0]
and was ignoring writes to bits outside that range. The more correct
fix is to simply check whether the index is within the bounds of
[0..l3->indices].

While at it, cleanup comments and drop now-unused local macros.

Signed-off-by: Frank Arnold <frank.arnold@amd.com>
Link: http://lkml.kernel.org/r/1305553188-21061-3-git-send-email-bp@amd64.org
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-16 11:24:27 -07:00
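
The arithmetic behind the example in the commit message above can be sketched as follows. This is an illustration only: both function names are hypothetical, and `max_index` merely stands in for `l3->indices`. The value 1050623 is 0x1007ff, so masking it to bits [11:0] yields 2047, the same index already written to the other slot.

```c
#include <stdbool.h>

/* Comparison limited to bits [11:0], as the old check is described. */
static bool same_index_masked(unsigned long a, unsigned long b)
{
    return (a & 0xfffUL) == (b & 0xfffUL);
}

/* Bounds check in the spirit of the fix; max_index stands in for l3->indices. */
static bool index_in_bounds(unsigned long index, unsigned long max_index)
{
    return index <= max_index;
}
```

Assuming a hypothetical `max_index` of 2047, the bounds check accepts 2047 and rejects 1050623 outright instead of silently truncating it.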
Borislav Petkov 50e7534427 x86, AMD, cacheinfo: Fix fallout caused by max3 conversion
732eacc054 converted code around the
kernel using nested max() macros to use the new max3 macro but forgot to
remove the old line in intel_cacheinfo.c. Fix it.

Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Frank Arnold <farnold@amd64.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1305553188-21061-2-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-16 11:24:23 -07:00
Konrad Rzeszutek Wilk 7c1bfd685b xen/pci: Fix compiler error when CONFIG_XEN_PRIVILEGED_GUEST is not set.
If we have CONFIG_XEN and the other parameters to build a
Linux kernel that is non-privileged, the xen_[find|register|unregister]_
device_domain_owner functions should not be compiled. They should
use the nops defined in arch/x86/include/asm/xen/pci.h instead.

This fixes:

arch/x86/pci/xen.c:496: error: redefinition of ‘xen_find_device_domain_owner’
arch/x86/include/asm/xen/pci.h:25: note: previous definition of ‘xen_find_device_domain_owner’ was here
arch/x86/pci/xen.c:510: error: redefinition of ‘xen_register_device_domain_owner’
arch/x86/include/asm/xen/pci.h:29: note: previous definition of ‘xen_register_device_domain_owner’ was here
arch/x86/pci/xen.c:532: error: redefinition of ‘xen_unregister_device_domain_owner’
arch/x86/include/asm/xen/pci.h:34: note: previous definition of ‘xen_unregister_device_domain_owner’ was here

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
2011-05-16 13:47:30 -04:00
Youquan Song e503f9e4b0 x86, apic: Fix spurious error interrupts triggering on all non-boot APs
This patch fixes a bug reported by a customer, who found
that many spurious error interrupts were reported on all
non-boot CPUs (APs) during the system boot stage.

According to Chapter 10 of the Intel Software Developer's Manual
Volume 3A, the local APIC may signal an illegal vector error when
an LVT entry is set to an illegal vector value (0~15) under
FIXED delivery mode (bits 8-11 are 0), regardless of whether
the mask bit is set or an interrupt actually happens. These
errors are seen as error interrupts.

The initial value of thermal LVT entries on all APs always reads
0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
sequence to them and LVT registers are reset to 0s except for
the mask bits which are set to 1s when APs receive INIT IPI.

When the BIOS takes over the thermal throttling interrupt,
the LVT thermal delivery mode should be SMI, and the kernel
is required to keep each AP's LVT thermal monitoring register
programmed the same way.

This issue happens when the BIOS does not take over the thermal
throttling interrupt: the AP's LVT thermal monitor register is then
restored to 0x10000, which means vector 0 and fixed delivery mode, so
all APs signal illegal vector error interrupts.

This patch checks that the interrupt delivery mode is not fixed mode
before restoring the AP's LVT thermal monitor register.

Signed-off-by: Youquan Song <youquan.song@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yong Wang <yong.y.wang@intel.com>
Cc: hpa@linux.intel.com
Cc: joe@perches.com
Cc: jbaron@redhat.com
Cc: trenn@suse.de
Cc: kent.liu@intel.com
Cc: chaohong.guo@intel.com
Cc: <stable@kernel.org> # As far back as possible
Link: http://lkml.kernel.org/r/1303402963-17738-1-git-send-email-youquan.song@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-16 13:48:25 +02:00
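
To make the 0x10000 value in the commit message above concrete, here is a minimal sketch that decodes an LVT register value. It is not the kernel's code: the macro and function names are hypothetical, and it assumes the delivery-mode field is the three bits 8-10 with 0 meaning fixed (the commit text says bits 8-11), and that bit 16 is the mask bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define LVT_VECTOR_MASK 0xffu       /* bits 0-7: vector */
#define LVT_DM_MASK     (0x7u << 8) /* assumed bits 8-10: delivery mode, 0 = fixed */
#define LVT_MASKED      (1u << 16)  /* bit 16: mask bit (not consulted below) */

/*
 * Returns true when writing this value would pair fixed delivery mode with
 * an illegal vector (0-15). The mask bit is deliberately ignored, because
 * the error is signalled regardless of it.
 */
static bool lvt_would_raise_illegal_vector(uint32_t lvt)
{
    bool fixed = (lvt & LVT_DM_MASK) == 0;
    bool illegal = (lvt & LVT_VECTOR_MASK) < 16;
    return fixed && illegal;
}
```

Under these assumptions 0x10000 decodes to vector 0, fixed mode, masked, so the predicate is true, while a value with SMI delivery mode (0x200 in the mode field) is not flagged.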
Thomas Gleixner a18f22a968 Merge branch 'consolidate-clksrc-i8253' of master.kernel.org:~rmk/linux-2.6-arm into timers/clocksource
Conflicts:
	arch/ia64/kernel/cyclone.c
	arch/mips/kernel/i8253.c
	arch/x86/kernel/i8253.c

Reason: Resolve conflicts so further cleanups do not conflict further

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-14 12:06:36 +02:00
Russell King 82491451dd clocksource: convert x86 to generic i8253 clocksource
Convert x86 i8253 clocksource code to use generic i8253 clocksource.

Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-05-14 10:29:48 +01:00
Greg Kroah-Hartman 82a3242e11 sysfs: remove "last sysfs file:" line from the oops messages
On some arches (x86, sh, arm, unicore, powerpc) the oops message would
print out the last sysfs file accessed.

This was very useful in finding a number of sysfs and driver core bugs
in the 2.5 and early 2.6 development days, but it has been a number of
years since this file has actually helped in debugging anything that
couldn't also be trivially determined from the stack traceback.

So it's time to delete the line.  This is good as we need all the space
we can get for oops messages at times on consoles.

Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-13 16:05:51 -07:00
Julia Lawall d9a5ac9ef3 x86, mce, AMD: Fix leaving freed data in a list
b may be added to a list, but is not removed before being freed
in the case of an error.  This is done in the corresponding
deallocation function, so the code here has been changed to
follow that.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression E,E1,E2;
identifier l;
@@

*list_add(&E->l,E1);
... when != E1
    when != list_del(&E->l)
    when != list_del_init(&E->l)
    when != E = E2
*kfree(E);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1305294731-12127-1-git-send-email-julia@diku.dk
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-13 17:11:02 +02:00
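
The pattern flagged by the semantic match above, and the shape of the fix, can be sketched in user space. This is a minimal stand-in, not the kernel's list implementation or the patched driver code: `struct node`, the list helpers and `add_then_fail` are all hypothetical names written for this illustration.

```c
#include <stdlib.h>

/* Minimal circular doubly linked list, loosely modelled on list_head. */
struct node {
    struct node *prev, *next;
};

static void list_init(struct node *head)
{
    head->prev = head->next = head;
}

static void list_add(struct node *n, struct node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

static void list_del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Error path: unlink first, then free, so the list never points at freed memory. */
static int add_then_fail(struct node *head)
{
    struct node *b = malloc(sizeof(*b));

    if (!b)
        return -1;
    list_add(b, head);
    /* ... a later setup step is assumed to fail here ... */
    list_del(b);
    free(b);
    return -1;
}
```

After the error path runs, the head points back at itself; omitting `list_del()` would leave it pointing at freed memory.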
Cliff Wickman 77ed23f8d9 x86: Fix UV BAU for non-consecutive nasids
This is a fix for the SGI Altix-UV Broadcast Assist Unit code,
which is used for TLB flushing.

Certain hardware configurations (that customers are ordering)
cause nasids (numa address space id's) to be non-consecutive.
Specifically, once you have more than 4 blades in an IRU
(Individual Rack Unit - or 1/2 rack) but less than the maximum
of 16, the nasid numbering becomes non-consecutive.  This
currently results in a 'catastrophic error' (CATERR) detected by
the firmware during OS boot.  The BAU is generating an 'INTD'
request that is targeting a non-existent nasid value. Such
configurations may also occur when a blade is configured off
because of hardware errors. (There is one UV hub per blade.)

This patch is required to support such configurations.

The problem with the tlb_uv.c code is that it is using the
consecutive hub numbers as indices to the BAU distribution bit
map. These are simply the ordinal position of the hub or blade
within its partition.  It should be using physical node numbers
(pnodes), which correspond to the physical nasid values. Use of
the hub number only works as long as the nasids in the partition
are consecutive and increase with a stride of 1.

This patch changes the index to be the pnode number, thus
allowing nasids to be non-consecutive.
It also provides a table in local memory for each cpu to
translate target cpu number to target pnode and nasid.
And it improves naming to properly reflect 'node' and 'uvhub'
versus 'nasid'.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/E1QJmxX-0002Mz-Fk@eag09.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-12 23:45:42 +02:00
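
The indexing problem described in the commit message above can be sketched with a toy bitmap. Everything here is hypothetical and much simpler than the BAU code: the table `hub_to_pnode` invents a partition of three hubs with non-consecutive pnodes, and `set_target_bit` is an illustrative helper, not a kernel function.

```c
#include <stdint.h>

/* Hypothetical partition: hub ordinals 0, 1, 2 map to pnodes 0, 1, 4. */
static const int hub_to_pnode[] = { 0, 1, 4 };

/* Set the bit for a target in a distribution map indexed relative to base_pnode. */
static uint32_t set_target_bit(uint32_t map, int pnode, int base_pnode)
{
    return map | (1u << (pnode - base_pnode));
}
```

Indexing by the hub ordinal 2 would set bit 2, which targets a pnode that does not exist in this made-up partition; indexing by `hub_to_pnode[2]` sets bit 4 instead.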
Daniel Kiper ae15a3b4d1 arch/x86/xen/setup: Cleanup code/data sections definitions
Clean up code/data section definitions
according to include/linux/init.h.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:19:34 -04:00
Daniel Kiper ad3062a0f4 arch/x86/xen/enlighten: Cleanup code/data sections definitions
Clean up code/data section definitions
according to include/linux/init.h.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:19:34 -04:00
Daniel Kiper 251511a18d arch/x86/xen/irq: Cleanup code/data sections definitions
Clean up code/data section definitions
according to include/linux/init.h.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:19:33 -04:00
Linus Torvalds 0c5e1577f1 Merge branch 'stable/bug-fixes-for-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug-fixes-for-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  x86/mm: Fix section mismatch derived from native_pagetable_reserve()
  x86,xen: introduce x86_init.mapping.pagetable_reserve
  Revert "xen/mmu: Add workaround "x86-64, mm: Put early page table high""
2011-05-12 12:21:51 -07:00
Konrad Rzeszutek Wilk 8c5950881c xen/p2m: Create entries in the P2M_MFN trees to track 1-1 mappings
.. when applicable. We need to track in the p2m_mfn and
p2m_mfn_p the MFNs and pointers, respectively, for the P2M entries
that are allocated for the identity mappings. Without this,
a PV domain with an E820 that triggers the 1-1 mapping to kick in,
won't be able to be restored as the P2M won't have the identity
mappings.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 14:38:53 -04:00
Daniel Kiper 0f16d0dfcd xen/setup: Fix for incorrect xen_extra_mem_start initialization under 32-bit
git commit 24bdb0b62c (xen: do not create
the extra e820 region at an addr lower than 4G) does not take into
account that under CONFIG_X86_32 find_low_pfn_range() is called instead
of e820_end_of_low_ram_pfn() (both calls are from arch/x86/kernel/setup.c).
find_low_pfn_range() behaves correctly and does not require a change in
xen_extra_mem_start initialization. Additionally, if xen_extra_mem_start
is initialized in the same way as under CONFIG_X86_64, memory hotplug
support for the Xen balloon driver (under development) is broken.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 14:37:06 -04:00
Konrad Rzeszutek Wilk 15bfc09451 xen/setup: Ignore E820_UNUSABLE when setting 1-1 mappings.
When we parse the raw E820, the Xen hypervisor can set "E820_RAM"
to "E820_UNUSABLE" if the mem=X argument is used. As such we
should _not_ consider the E820_UNUSABLE as a 1-1 identity
mapping, but instead use the same case as for E820_RAM.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 14:32:13 -04:00
Tian, Kevin 7899891c7d xen mmu: fix a race window causing leave_mm BUG()
There's a race window in xen_drop_mm_ref, where a remote cpu may exit the
dirty bitmap between the check on this cpu and the point where the remote
cpu handles the drop request. So in drop_other_mm_ref we need to check
whether the TLB state is still lazy before calling into leave_mm. This
bug was rarely observed in earlier kernels, but is exacerbated by
commit 831d52bc15
("x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after switching mm")
which clears the bitmap after changing the TLB state. The call trace is as below:

---------------------------------
kernel BUG at arch/x86/mm/tlb.c:61!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/xen_memory/xen_memory0/info/current_kb
CPU 1
Modules linked in: 8021q garp xen_netback xen_blkback blktap blkback_pagemap nbd bridge stp llc autofs4 ipmi_devintf ipmi_si ipmi_msghandler lockd sunrpc bonding ipv6 xenfs dm_multipath video output sbs sbshc parport_pc lp parport ses enclosure snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device serio_raw bnx2 snd_pcm_oss snd_mixer_oss snd_pcm snd_timer iTCO_wdt snd soundcore snd_page_alloc i2c_i801 iTCO_vendor_support i2c_core pcs pkr pata_acpi ata_generic ata_piix shpchp mptsas mptscsih mptbase [last unloaded: freq_table]
Pid: 25581, comm: khelper Not tainted 2.6.32.36fixxen #1 Tecal RH2285
RIP: e030:[<ffffffff8103a3cb>]  [<ffffffff8103a3cb>] leave_mm+0x15/0x46
RSP: e02b:ffff88002805be48  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88015f8e2da0
RDX: ffff88002805be78 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff88002805be48 R08: ffff88009d662000 R09: dead000000200200
R10: dead000000100100 R11: ffffffff814472b2 R12: ffff88009bfc1880
R13: ffff880028063020 R14: 00000000000004f6 R15: 0000000000000000
FS:  00007f62362d66e0(0000) GS:ffff880028058000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003aabc11909 CR3: 000000009b8ca000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process khelper (pid: 25581, threadinfo ffff88007691e000, task ffff88009b92db40)
Stack:
 ffff88002805be68 ffffffff8100e4ae 0000000000000001 ffff88009d733b88
<0> ffff88002805be98 ffffffff81087224 ffff88002805be78 ffff88002805be78
<0> ffff88015f808360 00000000000004f6 ffff88002805bea8 ffffffff81010108
Call Trace:
 <IRQ>
 [<ffffffff8100e4ae>] drop_other_mm_ref+0x2a/0x53
 [<ffffffff81087224>] generic_smp_call_function_single_interrupt+0xd8/0xfc
 [<ffffffff81010108>] xen_call_function_single_interrupt+0x13/0x28
 [<ffffffff810a936a>] handle_IRQ_event+0x66/0x120
 [<ffffffff810aac5b>] handle_percpu_irq+0x41/0x6e
 [<ffffffff8128c1c0>] __xen_evtchn_do_upcall+0x1ab/0x27d
 [<ffffffff8128dd11>] xen_evtchn_do_upcall+0x33/0x46
 [<ffffffff81013efe>] xen_do_hypervisor_callback+0x1e/0x30
 <EOI>
 [<ffffffff814472b2>] ? _spin_unlock_irqrestore+0x15/0x17
 [<ffffffff8100f8cf>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff81113f71>] ? flush_old_exec+0x3ac/0x500
 [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
 [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
 [<ffffffff8115115d>] ? load_elf_binary+0x398/0x17ef
 [<ffffffff81042fcf>] ? need_resched+0x23/0x2d
 [<ffffffff811f4648>] ? process_measurement+0xc0/0xd7
 [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
 [<ffffffff81113094>] ? search_binary_handler+0xc8/0x255
 [<ffffffff81114362>] ? do_execve+0x1c3/0x29e
 [<ffffffff8101155d>] ? sys_execve+0x43/0x5d
 [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
 [<ffffffff81013e28>] ? kernel_execve+0x68/0xd0
 [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
 [<ffffffff8100f8cf>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8106fb64>] ? ____call_usermodehelper+0x113/0x11e
 [<ffffffff81013daa>] ? child_rip+0xa/0x20
 [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
 [<ffffffff81012f91>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff8101371d>] ? retint_restore_args+0x5/0x6
 [<ffffffff81013da0>] ? child_rip+0x0/0x20
Code: 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 e8 17 ff ff ff c9 c3 55 48 89 e5 0f 1f 44 00 00 65 8b 04 25 c8 55 01 00 ff c8 75 04 <0f> 0b eb fe 65 48 8b 34 25 c0 55 01 00 48 81 c6 b8 02 00 00 e8
RIP  [<ffffffff8103a3cb>] leave_mm+0x15/0x46
 RSP <ffff88002805be48>
---[ end trace ce9cee6832a9c503 ]---

Tested-by: Maoxiaoyun <tinnycloud@hotmail.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
[v1: Fleshed out the git description a bit]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 14:27:43 -04:00
Sedat Dilek 53f8023feb x86/mm: Fix section mismatch derived from native_pagetable_reserve()
With CONFIG_DEBUG_SECTION_MISMATCH=y I see these warnings in next-20110415:

  LD      vmlinux.o
  MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x1ba48): Section mismatch in reference from the function native_pagetable_reserve() to the function .init.text:memblock_x86_reserve_range()
The function native_pagetable_reserve() references
the function __init memblock_x86_reserve_range().
This is often because native_pagetable_reserve lacks a __init
annotation or the annotation of memblock_x86_reserve_range is wrong.

This patch fixes the issue.
Thanks to pipacs from PaX project for help on IRC.

Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 13:05:05 -04:00
Stefano Stabellini 279b706bf8 x86,xen: introduce x86_init.mapping.pagetable_reserve
Introduce a new x86_init hook called pagetable_reserve that at the end
of init_memory_mapping is used to reserve a range of memory addresses for
the kernel pagetable pages we used and free the other ones.

On native it just calls memblock_x86_reserve_range while on xen it also
takes care of setting the spare memory previously allocated
for kernel pagetable pages from RO to RW, so that it can be used for
other purposes.

A detailed explanation of the reason why this hook is needed follows.

As a consequence of the commit:

commit 4b239f458c
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Fri Dec 17 16:58:28 2010 -0800

    x86-64, mm: Put early page table high

at some point init_memory_mapping is going to reach the pagetable pages
area and map those pages too (mapping them as normal memory that falls
in the range of addresses passed to init_memory_mapping as argument).
Some of those pages are already pagetable pages (they are in the range
pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and
everything is fine.
Some of these pages are not pagetable pages yet (they fall in the range
pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
are going to be mapped RW.  When these pages become pagetable pages and
are hooked into the pagetable, Xen will find that the guest already has
an RW mapping of them somewhere and fail the operation.
The reason Xen requires pagetables to be RO is that the hypervisor needs
to verify that the pagetables are valid before using them. The validation
operations are called "pinning" (more details in arch/x86/xen/mmu.c).

In order to fix the issue we mark all the pages in the entire range
pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
is completed only the range pgt_buf_start-pgt_buf_end is reserved by
init_memory_mapping. Hence the kernel is going to crash as soon as one
of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
ranges are RO).

For this reason we need a hook to reserve the kernel pagetable pages we
used and free the other ones so that they can be reused for other
purposes.
On native it just means calling memblock_x86_reserve_range, on Xen it
also means marking RW the pagetable pages that we allocated before but
that haven't been used before.

Another way to fix this without using the hook is to add an 'if
(xen_pv_domain)' in the 'init_memory_mapping' code and call the Xen
counterpart, but that is just nasty.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 13:05:04 -04:00
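
The hook arrangement described in the commit message above can be sketched as a function pointer with two implementations. This is a simplified user-space model, not the kernel's x86_init code: `struct mapping_ops`, `native_reserve`, `xen_reserve` and the recording variables are hypothetical, and `buf_top` merely stands in for pgt_buf_top.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the x86_init.mapping hook table. */
struct mapping_ops {
    void (*pagetable_reserve)(uint64_t start, uint64_t end);
};

static uint64_t reserved_start, reserved_end; /* range reserved for used pagetable pages */
static uint64_t rw_start, rw_end;             /* range flipped back from RO to RW */
static uint64_t buf_top;                      /* stands in for pgt_buf_top */

/* Native case: only reserve the pages actually used. */
static void native_reserve(uint64_t start, uint64_t end)
{
    reserved_start = start;
    reserved_end = end;
}

/* Xen case: reserve as on native, then mark the unused tail RW again. */
static void xen_reserve(uint64_t start, uint64_t end)
{
    native_reserve(start, end);
    rw_start = end;
    rw_end = buf_top;
}
```

A caller would invoke `ops.pagetable_reserve(pgt_buf_start, pgt_buf_end)` without knowing which variant is installed, which is what avoids the 'if (xen_pv_domain)' test in the common code.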
Konrad Rzeszutek Wilk 92bdaef7b2 Revert "xen/mmu: Add workaround "x86-64, mm: Put early page table high""
This reverts commit a38647837a.

It does not work with certain AMD machines.

last_pfn = 0x100000 max_arch_pfn = 0x400000000
initial memory mapped : 0 - 02c3a000
Base memory trampoline at [ffff88000009b000] 9b000 size 20480
init_memory_mapping: 0000000000000000-0000000100000000
 0000000000 - 0100000000 page 4k
kernel direct mapping tables up to 100000000 @ ff7fb000-100000000
init_memory_mapping: 0000000100000000-00000001e0800000
 0100000000 - 01e0800000 page 4k
kernel direct mapping tables up to 1e0800000 @ 1df0f3000-1e0000000
xen: setting RW the range fffdc000 - 100000000
RAMDISK: 0203b000 - 02c3a000
No NUMA configuration found
Faking a node at 0000000000000000-00000001e0800000
NUMA: Using 63 for the hash shift.
Initmem setup node 0 0000000000000000-00000001e0800000
  NODE_DATA [00000001dfffb000 - 00000001dfffffff]
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
PGD 0
Oops: 0003 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 0, comm: swapper Not tainted 2.6.39-0-virtual #6~smb1
RIP: e030:[<ffffffff81cf6a75>]  [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
RSP: e02b:ffffffff81c01e38  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 00000001e0800000 RCX: 0000000000001040
RDX: 0000000000004100 RSI: 0000000000000000 RDI: ffff8801dfffb000
RBP: ffffffff81c01e58 R08: 0000000000000020 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000bfe400
FS:  0000000000000000(0000) GS:ffffffff81cca000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000001c03000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0b020)
Stack:
 0000000000000040 0000000000000001 0000000000000000 ffffffffffffffff
 ffffffff81c01e88 ffffffff81cf6c25 0000000000000000 0000000000000000
 ffffffff81cf687f 0000000000000000 ffffffff81c01ea8 ffffffff81cf6e45
Call Trace:
 [<ffffffff81cf6c25>] numa_register_memblks.constprop.3+0x150/0x181
 [<ffffffff81cf687f>] ? numa_add_memblk+0x7c/0x7c
 [<ffffffff81cf6e45>] numa_init.part.2+0x1c/0x7c
 [<ffffffff81cf687f>] ? numa_add_memblk+0x7c/0x7c
 [<ffffffff81cf6f67>] numa_init+0x6c/0x70
 [<ffffffff81cf7057>] initmem_init+0x39/0x3b
 [<ffffffff81ce5865>] setup_arch+0x64e/0x769
 [<ffffffff815e43c1>] ? printk+0x51/0x53
 [<ffffffff81cdf92b>] start_kernel+0xd4/0x3f3
 [<ffffffff81cdf388>] x86_64_start_reservations+0x132/0x136
 [<ffffffff81ce2ed4>] xen_start_kernel+0x588/0x58f
Code: 41 00 00 48 8b 3c c5 a0 24 cc 81 31 c0 40 f6 c7 01 74 05 aa 66 ba ff 40 40 f6 c7 02 74 05 66 ab 83 ea 02 89 d1 c1 e9 02 f6 c2 02 <f3> ab 74 02 66 ab 80 e2 01 74 01 aa 49 63 c4 48 c1 eb 0c 44 89
RIP  [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
 RSP <ffffffff81c01e38>
CR2: 0000000000000000
---[ end trace a7919e7f17c0a725 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
Pid: 0, comm: swapper Tainted: G      D     2.6.39-0-virtual #6~smb1

Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 13:04:29 -04:00
Richard Weinberger 449a66fd1f x86: Remove warning and warning_symbol from struct stacktrace_ops
Both warning and warning_symbol are nowhere used.
Let's get rid of them.

Signed-off-by: Richard Weinberger <richard@nod.at>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Soeren Sandmann Pedersen <ssp@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: x86 <x86@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Link: http://lkml.kernel.org/r/1305205872-10321-2-git-send-email-richard@nod.at
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2011-05-12 15:31:28 +02:00
Ingo Molnar 9cb5baba5e Merge commit 'v2.6.39-rc7' into sched/core 2011-05-12 09:36:18 +02:00
Rafael J. Wysocki 2e711c04db PM: Remove sysdev suspend, resume and shutdown operations
Since suspend, resume and shutdown operations in struct sysdev_class
and struct sysdev_driver are not used any more, remove them.  Also
drop sysdev_suspend(), sysdev_resume() and sysdev_shutdown() used
for executing those operations and modify all of their users
accordingly.  This reduces kernel code size quite a bit and reduces
its complexity.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-11 21:37:15 +02:00
Jiri Olsa 9bbeacf52f kprobes, x86: Disable irqs during optimized callback
Disable irqs during optimized callback, so we don't miss any in-irq kprobes.

The following commands:

 # cd /debug/tracing/
 # echo "p mutex_unlock" >> kprobe_events
 # echo "p _raw_spin_lock" >> kprobe_events
 # echo "p smp_apic_timer_interrupt" >> ./kprobe_events
 # echo 1 > events/enable

Cause the optimized kprobes to be missed. None is missed
with the fix applied.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20110511110613.GB2390@jolsa.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-11 13:21:23 +02:00
Joerg Roedel fffcda1183 x86, gart: Rename pci-gart_64.c to amd_gart_64.c
This file only contains code relevant for the northbridge
gart in AMD processors. This patch renames the file to
represent this fact in the filename.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-05-10 17:22:06 +02:00
Ingo Molnar 932fed4e2e Merge commit 'v2.6.39-rc7' into perf/core
Merge reason: pull in the latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-10 17:05:45 +02:00
Joerg Roedel 72fe00f01f x86/amd-iommu: Use threaded interrupt handler
Move the interrupt handling for the iommu into the interrupt
thread to reduce latencies and prepare interrupt handling for
pri handling.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-05-10 11:07:58 +02:00
Joerg Roedel 604c307bf4 Merge branches 'dma-debug/next', 'amd-iommu/command-cleanups', 'amd-iommu/ats' and 'amd-iommu/extended-features' into iommu/2.6.40
Conflicts:
	arch/x86/include/asm/amd_iommu_types.h
	arch/x86/kernel/amd_iommu.c
	arch/x86/kernel/amd_iommu_init.c
2011-05-10 10:25:23 +02:00
Joe Perches e969687595 arch/x86/kernel/pci-iommu_table.c: Convert sprintf_symbol to %pS
Coalesce format as well.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-05-10 10:21:35 +02:00
Jack Steiner 1d44e8288a x86, UV: Fix NMI handler for UV platforms
This fixes problems seen on UV systems handling NMIs from the
node controller.

I isolated the "dazed..." messages that I saw earlier to a bug in
the BMC on our platform. It was sending NMIs w/o properly setting
a register that indicated the source of NMI.

So rather than _assuming_ any unhandled NMI came from the UV system
maintenance console (SMC), add a check to verify that the SMC actually
sent the NMI.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: gorcunov@gmail.com
Cc: dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-10 09:26:55 +02:00
Matthew Garrett 935a638241 x86, efi: Ensure that the entirety of a region is mapped
It's possible for init_memory_mapping() to fail to map the entire region
if it crosses a boundary, so ensure that we complete the mapping.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-5-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-09 12:14:45 -07:00
Matthew Garrett 7cb00b7287 x86, efi: Pass a minimal map to SetVirtualAddressMap()
Experimentation with various EFI implementations has shown that functions
outside runtime services will still update their pointers if
SetVirtualAddressMap() is called with memory descriptors outside the
runtime area. This is obviously insane, and therefore is unsurprising.
Evidence from instrumenting another EFI implementation suggests that it
only passes the set of descriptors covering runtime regions, so let's
avoid any problems by doing the same. Runtime descriptors are copied to
a separate memory map, and only that map is passed back to the firmware.
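
The filtering step described above can be sketched as follows. This is a
minimal model, not the kernel code: descriptors are plain dicts with
hypothetical keys, and the only value taken from the UEFI specification is
the EFI_MEMORY_RUNTIME attribute bit.

```python
# Attribute bit from the UEFI specification marking a region that the
# firmware needs at runtime.
EFI_MEMORY_RUNTIME = 0x8000000000000000


def runtime_only_map(descriptors):
    """Return a new list holding copies of the runtime descriptors only.

    Each descriptor is modelled as a dict with the hypothetical keys
    'phys_addr', 'num_pages' and 'attribute'.
    """
    return [dict(d) for d in descriptors if d["attribute"] & EFI_MEMORY_RUNTIME]


# Made-up memory map: only the middle entry is a runtime region.
full_map = [
    {"phys_addr": 0x00000000, "num_pages": 0xA0, "attribute": 0xF},
    {"phys_addr": 0x7F000000, "num_pages": 0x10,
     "attribute": EFI_MEMORY_RUNTIME | 0xF},
    {"phys_addr": 0x7F010000, "num_pages": 0x20, "attribute": 0xF},
]
runtime_map = runtime_only_map(full_map)
```

Copying the descriptors keeps the original map untouched, which mirrors
the idea of handing the firmware a separate map.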

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-4-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-09 12:14:39 -07:00
Matthew Garrett 202f9d0a41 x86, efi: Merge contiguous memory regions of the same type and attribute
Some firmware implementations assume that physically contiguous regions
will be contiguous in virtual address space. This assumption is, obviously,
entirely unjustifiable. Said firmware implementations lack the good grace
to handle their failings in a measured and reasonable manner, instead
tending to shit all over address space and oopsing the kernel.

In an ideal universe these firmware implementations would simultaneously
catch fire and cease to be a problem, but since some of them are present
in attractively thin and shiny metal devices vanity wins out and some
poor developer spends an extended period of time surrounded by a
growing array of empty bottles until the underlying reason becomes
apparent. Said developer presents this patch, which simply merges
adjacent regions if they happen to be contiguous and have the same EFI
memory type and caching attributes.
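
A minimal sketch of such a merge, assuming the regions are already sorted
by physical address and modelled as (phys_addr, num_pages, mem_type,
attribute) tuples. The tuple layout and the function name are
illustrative only, not the kernel's data structures.

```python
PAGE_SHIFT = 12  # EFI pages are 4 KiB


def merge_regions(regions):
    """Merge physically adjacent regions with equal type and attributes.

    Each region is a (phys_addr, num_pages, mem_type, attribute) tuple and
    the input is assumed to be sorted by physical address.
    """
    merged = []
    for phys, pages, mem_type, attr in regions:
        if merged:
            p_phys, p_pages, p_type, p_attr = merged[-1]
            prev_end = p_phys + (p_pages << PAGE_SHIFT)
            # Merge only when the previous region ends exactly where this
            # one starts and both type and attributes match.
            if prev_end == phys and p_type == mem_type and p_attr == attr:
                merged[-1] = (p_phys, p_pages + pages, p_type, p_attr)
                continue
        merged.append((phys, pages, mem_type, attr))
    return merged


# Made-up regions: the first two are contiguous and identical in type and
# attributes, the third differs in type, the fourth leaves a gap.
regions = [
    (0x1000, 1, 5, 0xF),
    (0x2000, 2, 5, 0xF),
    (0x4000, 1, 6, 0xF),
    (0x6000, 1, 6, 0xF),
]
merged = merge_regions(regions)
```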

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-3-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-09 12:14:34 -07:00
Matthew Garrett 9cd2b07c19 x86, efi: Consolidate EFI nx control
The core EFI code and 64-bit EFI code currently have independent
implementations of code for setting memory regions as executable or not.
Let's consolidate them.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-2-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-09 12:14:29 -07:00
Matthew Garrett 2b5e8ef35b x86, efi: Remove virtual-mode SetVirtualAddressMap call
The spec says that SetVirtualAddressMap doesn't work once you're in
virtual mode, so there's no point in having infrastructure for calling
it from there.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-1-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-09 12:14:25 -07:00
Luis R. Rodriguez 8fab6af215 x86: Fix mrst sparse complaints
Fix these Sparse complaints:

  CHECK   arch/x86/platform/mrst/mrst.c
  arch/x86/platform/mrst/mrst.c:197:13: warning: symbol 'mrst_time_init' was not declared. Should it be static?
  arch/x86/platform/mrst/mrst.c:219:16: warning: symbol 'mrst_arch_setup' was not declared. Should it be static?

Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Roman Gezikov <roman.gezikov@atheros.com>
Cc: Joonas Viskari <joonas.viskari@atheros.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Allen Kao <allen.kao@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Link: http://lkml.kernel.org/r/1304719209-26913-1-git-send-email-lrodriguez@atheros.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-07 10:52:30 +02:00
Ingo Molnar 4cb1f43ce8 Merge commit 'v2.6.39-rc6' into x86/cleanups
Merge reason: move to a (much) newer upstream base.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-07 10:51:48 +02:00
Ingo Molnar 57d524154f Merge branch 'perf/stat' into perf/core
Merge reason: the perf stat improvements are tested and ready now.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-06 21:07:38 +02:00
Rob Landley 6151658751 Correct occurrences of
- Documentation/kvm/ to Documentation/virtual/kvm
- Documentation/uml/ to Documentation/virtual/uml
- Documentation/lguest/ to Documentation/virtual/lguest
throughout the kernel source tree.

Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
2011-05-06 09:27:55 -07:00
Peter Zijlstra 63b6a6758e perf events, x86: Fix Intel Nehalem and Westmere last level cache event definitions
The Intel Nehalem offcore bits implemented in:

  e994d7d23a0b: perf: Fix LLC-* events on Intel Nehalem/Westmere

... are wrong: they implemented _ACCESS as _HIT and counted OTHER_CORE_HIT* as
MISS even though it's clearly documented as an L3 hit ...

Fix them and the Westmere definitions as well.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1299119690-13991-3-git-send-email-ming.m.lin@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-06 11:24:48 +02:00
Lin Ming e04d1b23f9 perf events, x86: Add SandyBridge stalled-cycles-frontend/backend events
Extend the Intel SandyBridge PMU driver with definitions
for generic front-end and back-end stall events.

( As commit 3011203 "perf events, x86: Add Westmere stalled-cycles-frontend/backend
  events" says, these are only approximations. )

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1304666042-17577-1-git-send-email-ming.m.lin@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-06 09:37:03 +02:00
Ingo Molnar 4d70230bb4 Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into perf/urgent 2011-05-06 08:11:28 +02:00
Ingo Molnar 98bb318864 Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent 2011-05-04 20:33:42 +02:00
Dominik Brodowski 2d06d8c49a [CPUFREQ] use dynamic debug instead of custom infrastructure
With dynamic debug having gained the capability to report debug messages
also during the boot process, it offers a far superior interface for
debug messages than the custom cpufreq infrastructure. As a first step,
remove the old cpufreq_debug_printk() function and replace it with a call
to the generic pr_debug() function.

How can dynamic debug be used on cpufreq? You need a kernel which has
CONFIG_DYNAMIC_DEBUG enabled.

To enable debugging at runtime, mount debugfs and

$ echo -n 'module cpufreq +p' > /sys/kernel/debug/dynamic_debug/control

for debugging the complete "cpufreq" module. To achieve the same goal during
boot, append

	ddebug_query="module cpufreq +p"

as a boot parameter to the kernel of your choice.

For more detailed instructions, please see
Documentation/dynamic-debug-howto.txt

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
2011-05-04 11:50:57 -04:00
Naga Chumbalkar 904cc1e637 [CPUFREQ] Fix _OSC UUID in pcc-cpufreq
UUID needs to be written out the way it is described in
Sec 18.5.124 of ACPI 4.0a Specification.
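
As an illustration, assuming the referenced section describes the usual
ACPI buffer encoding of a UUID (first three fields stored little-endian,
remaining eight bytes in written order), the byte order can be shown with
Python's standard uuid module. The UUID below is made up to make the
swapping visible; it is not the pcc-cpufreq _OSC UUID.

```python
import uuid


def acpi_uuid_buffer(text):
    """Return the 16-byte buffer form of a UUID string.

    The first three fields are stored little-endian and the remaining
    eight bytes are kept in the order written.
    """
    return uuid.UUID(text).bytes_le


# Made-up UUID used only to make the byte order visible.
buf = acpi_uuid_buffer("00112233-4455-6677-8899-aabbccddeeff")
```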

Platform firmware's use of this UUID/_OSC is optional, which is
why we didn't notice this bug earlier.

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: stable@kernel.org
2011-05-04 11:50:56 -04:00
Linus Torvalds 609cfda586 Merge branch 'stable/bug-fixes-for-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug-fixes-for-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top
  xen/mmu: Add workaround "x86-64, mm: Put early page table high"
2011-05-03 09:25:42 -07:00
H. Peter Anvin 7806a49ab6 x86, reboot: Fix relocations in reboot_32.S
The use of base for %ebx in this file is arbitrary, *except* that we
also use it to compute the real-mode segment.  Therefore, make it so
that r_base really is the true address to which %ebx points.

This resolves kernel bugzilla 33302.

Reported-and-tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/n/tip-08os5wi3yq1no0y4i5m4z7he@git.kernel.org
2011-05-02 14:44:46 -07:00
Stefano Stabellini b9269dc7bf xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top
mask_rw_pte is currently checking if a pfn is a pagetable page if it
falls in the range pgt_buf_start - pgt_buf_end but that is incorrect
because pgt_buf_end is a moving target: pgt_buf_top is the real
boundary.
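
The difference between the two checks can be sketched like this. It is a
simplified model that assumes half-open PFN ranges; the function names
are hypothetical and only the pgt_buf_* names come from the description
above.

```python
def in_range_old(pfn, pgt_buf_start, pgt_buf_end):
    """Old check: stops at pgt_buf_end, which moves as pages are used."""
    return pgt_buf_start <= pfn < pgt_buf_end


def in_range_new(pfn, pgt_buf_start, pgt_buf_top):
    """New check: covers the whole buffer up to pgt_buf_top."""
    return pgt_buf_start <= pfn < pgt_buf_top


# Made-up values: pfn 115 is not a pagetable page yet but will become one.
start, end, top = 100, 110, 120
missed_by_old = not in_range_old(115, start, end)
caught_by_new = in_range_new(115, start, top)
```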

Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-02 16:33:52 -04:00
Konrad Rzeszutek Wilk a38647837a xen/mmu: Add workaround "x86-64, mm: Put early page table high"
As a consequence of the commit:

commit 4b239f458c
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Fri Dec 17 16:58:28 2010 -0800

    x86-64, mm: Put early page table high

the Linux kernel crashes under Xen:

mapping kernel into physical memory
Xen: setup ISA identity maps
about to get started...
(XEN) mm.c:2466:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn b1d89 (pfn bacf7)
(XEN) mm.c:3027:d0 Error while pinning mfn b1d89
(XEN) traps.c:481:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
...

The reason is that at some point init_memory_mapping is going to reach
the pagetable pages area and map those pages too (mapping them as normal
memory that falls in the range of addresses passed to init_memory_mapping
as argument). Some of those pages are already pagetable pages (they are
in the range pgt_buf_start-pgt_buf_end) therefore they are going to be
mapped RO and everything is fine.
Some of these pages are not pagetable pages yet (they fall in the range
pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
are going to be mapped RW.  When these pages become pagetable pages and
are hooked into the pagetable, Xen will find that the guest already has
a RW mapping of them somewhere and fail the operation.
The reason Xen requires pagetables to be RO is that the hypervisor needs
to verify that the pagetables are valid before using them. The validation
operations are called "pinning" (more details in arch/x86/xen/mmu.c).

In order to fix the issue we mark all the pages in the entire range
pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
is completed only the range pgt_buf_start-pgt_buf_end is reserved by
init_memory_mapping. Hence the kernel is going to crash as soon as one
of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
ranges are RO).

For this reason, this function is introduced which is called _after_
the init_memory_mapping has completed (in a perfect world we would
call this function from init_memory_mapping, but let's ignore that).

Because we are called _after_ init_memory_mapping the pgt_buf_[start,
end,top] have all changed to new values (b/c another init_memory_mapping
is called). Hence, the first time we enter this function, we save
away the pgt_buf_start value and update the pgt_buf_[end,top].

When we detect that the "old" pgt_buf_start through pgt_buf_end
PFNs have been reserved (so memblock_x86_reserve_range has been called),
we immediately set out to RW the "old" pgt_buf_end through pgt_buf_top.

And then we update those "old" pgt_buf_[end|top] with the new ones
so that we can redo this on the next pagetable.

Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
[v1: Updated with Jeremy's comments]
[v2: Added the crash output]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-02 16:33:34 -04:00
Yinghai Lu e5a10c1bd1 x86, NUMA: Trim numa meminfo with max_pfn in a separate loop
While testing the 32bit NUMA unification code from tj, one system with
more than 64GiB of memory was found to fail to use NUMA.  It turns out
we do not trim numa meminfo correctly against max_pfn when the start
address of a node is higher than 64GiB.  The bug fix made it to the tip
tree.

This patch moves the checking and trimming to a separate loop.  So we
don't need to compare low/high in following merge loops.  It makes the
code more readable.

Also it makes the node merge printouts less strange.  On a 512GiB numa
system with 32bit,

before:
> NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000)
> NUMA: Node 0 [0,80000000) + [100000000,1080000000) -> [0,1000000000)

after:
> NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000)
> NUMA: Node 0 [0,80000000) + [100000000,1000000000) -> [0,1000000000)
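
The separate trimming pass can be sketched as below. This is a model
under stated assumptions, not the code in numa.c: blocks are (nid, start,
end) tuples with byte addresses, and the function name is hypothetical.
With a 64GiB limit it reproduces the trimmed range shown in the "after"
output.

```python
PAGE_SHIFT = 12


def trim_to_max_pfn(blocks, max_pfn):
    """Clamp (nid, start, end) blocks to [0, max_pfn << PAGE_SHIFT).

    Blocks that end up empty are dropped.
    """
    high = max_pfn << PAGE_SHIFT
    trimmed = []
    for nid, start, end in blocks:
        start = min(start, high)
        end = min(end, high)
        if start < end:
            trimmed.append((nid, start, end))
    return trimmed


# 64GiB limit expressed in 4KiB pages.
max_pfn = 0x1000000
blocks = [
    (0, 0x100000000, 0x1080000000),   # crosses the limit, gets clamped
    (1, 0x1080000000, 0x2000000000),  # entirely above the limit, dropped
]
trimmed = trim_to_max_pfn(blocks, max_pfn)
```

Because every block is already clamped when the merge loop runs, the
merge loop itself no longer has to compare against the limit.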

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[Updated patch description and comment slightly.]
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-05-02 17:24:49 +02:00
Yinghai Lu a56bca80db x86, NUMA: Rename setup_node_bootmem() to setup_node_data()
After using memblock to replace bootmem, that function only sets up
node_data now.

Change the name to reflect what it actually does.

tj: Minor adjustment to the patch description.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-05-02 17:24:49 +02:00
Tejun Heo 1b7e03ef75 x86, NUMA: Enable emulation on 32bit too
Now that NUMA init path is unified, NUMA emulation can be enabled on
32bit.  Make numa_emulation.c safe on 32bit by doing the following.

* Define MAX_DMA32_PFN on 32bit too.

* Include bootmem.h for max_pfn declaration.

* Use u64 explicitly and always use PFN_PHYS() when converting page
  number to address.

* Avoid __udivdi3() generation on 32bit by calculating in number of
  pages instead in split_nodes_interleave().

And drop X86_64 dependency from Kconfig.
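
Why the explicit u64 matters for page number to address conversion can
be shown with a small model. Python integers do not overflow, so the
32bit case is emulated with a mask; both helper names are hypothetical
and only stand in for a shift done in a 32bit unsigned long and for a
shift done after widening to 64 bits.

```python
PAGE_SHIFT = 12
U32_MASK = 0xFFFFFFFF
U64_MASK = 0xFFFFFFFFFFFFFFFF


def pfn_to_addr_32bit(pfn):
    """Shift performed in a 32bit unsigned long: high bits are lost."""
    return (pfn << PAGE_SHIFT) & U32_MASK


def pfn_to_addr_64bit(pfn):
    """Shift performed after widening to 64 bits: the address survives."""
    return (pfn << PAGE_SHIFT) & U64_MASK


# The first page at 4GiB: the 32bit shift wraps around to 0.
pfn_at_4g = 0x100000
wrapped = pfn_to_addr_32bit(pfn_at_4g)
correct = pfn_to_addr_64bit(pfn_at_4g)
```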

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 17:24:48 +02:00
Tejun Heo 2706a0bf7b x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too
Now that NUMA init path is unified, amdtopology can be enabled on
32bit.  Make amdtopology.c safe on 32bit by explicitly using u64 and
drop X86_64 dependency from Kconfig.

Inclusion of bootmem.h is added for max_pfn declaration.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 17:24:48 +02:00
Tejun Heo c6f5887820 x86, NUMA: Rename amdtopology_64.c to amdtopology.c
amdtopology is going to be used by 32bit too, so drop the _64 suffix.
This is a pure rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 17:24:48 +02:00
Tejun Heo 752d4f372f x86, NUMA: Make numa_init_array() static
numa_init_array() no longer has users outside of numa.c.  Make it
static.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 17:24:48 +02:00
Tejun Heo bd6709a91a x86, NUMA: Make 32bit use common NUMA init path
With both _numa_init() methods converted and the rest of init code
adjusted, numa_32.c now can switch from the 32bit only init code to
the common one in numa.c.

* Shim get_memcfg_*()'s are dropped and initmem_init() calls
  x86_numa_init(), which is updated to handle NUMAQ.

* All boilerplate operations including node range limiting, pgdat
  alloc/init are handled by numa_init().  32bit only implementation is
  removed.

* 32bit numa_add_memblk(), numa_set_distance() and
  memory_add_physaddr_to_nid() removed and common versions in
  numa.c enabled for 32bit.

This change causes the following behavior changes.

* NODE_DATA()->node_start_pfn/node_spanned_pages properly initialized
  for 32bit too.

* Many more sanity checks and configuration cleanups.

* Proper handling of node distances.

* The same NUMA init messages as 64bit.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 17:24:48 +02:00
Tejun Heo 7888e96b26 x86, NUMA: Initialize and use remap allocator from setup_node_bootmem()
setup_node_bootmem() is taken from 64bit and doesn't use remap
allocator.  It's about to be shared with 32bit so add support for it.
If NODE_DATA is remapped, it's noted in the debug message and node
locality check is skipped as the __pa() of the remapped address
doesn't reflect the actual physical address.

On 64bit, remap allocator becomes noop and doesn't affect the
behavior.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:54 +02:00
Tejun Heo 99cca492ea x86-32, NUMA: Add @start and @end to init_alloc_remap()
Instead of dereferencing node_start/end_pfn[] directly, make
init_alloc_remap() take @start and @end and let the caller be
responsible for making sure the range is sane.  This is to prepare for
use from unified NUMA init code.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:54 +02:00
Tejun Heo 38f3e1ca24 x86, NUMA: Remove long 64bit assumption from numa.c
Code moved from numa_64.c assumes that long is 64bit in several
places.  This patch removes the assumption by using {s|u}64_t
explicitly, using PFN_PHYS() for page number -> addr conversions and
adjusting printf formats.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo 744baba0c4 x86, NUMA: Enable build of generic NUMA init code on 32bit
Generic NUMA init code was moved to numa.c from numa_64.c but is still
guarded by CONFIG_X86_64.  This patch removes the compile guard and
enables compiling on 32bit.

* numa_add_memblk() and numa_set_distance() clash with the shim
  implementation in numa_32.c and are left out.

* memory_add_physaddr_to_nid() clashes with 32bit implementation and
  is left out.

* MAX_DMA_PFN definition in dma.h moved out of !CONFIG_X86_32.

* node_data definition in numa_32.c removed in favor of the one in
  numa.c.

There are places where ulong is assumed to be 64bit.  The next patch
will fix them up.  Note that although the code is compiled it isn't
used yet and this patch doesn't cause any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo a4106eae65 x86, NUMA: Move NUMA init logic from numa_64.c to numa.c
Move the generic 64bit NUMA init machinery from numa_64.c to numa.c.

* node_data[], numa_mem_info and numa_distance
* numa_add_memblk[_to](), numa_remove_memblk[_from]()
* numa_set_distance() and friends
* numa_init() and all the numa_meminfo handling helpers called from it
* dummy_numa_init()
* memory_add_physaddr_to_nid()

A new function x86_numa_init() is added and the content of
numa_64.c::initmem_init() is moved into it.  initmem_init() now simply
calls x86_numa_init().

Constants and numa_off declaration are moved from numa_{32|64}.h to
numa.h.

This is code reorganization and doesn't involve any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo 299a180aec x86-32, NUMA: Update numaq to use new NUMA init protocol
Update numaq such that it calls numa_add_memblk() and sets
numa_nodes_parsed instead of directly diddling with NUMA states.  The
original get_memcfg_numaq() is renamed to numaq_numa_init() and new
get_memcfg_numaq() is created in numa_32.c.

The shim numa_add_memblk() implementation handles node_start/end_pfn[]
and node_set_online() for nodes with memory.  The new
get_memcfg_numaq() is exactly the same as get_memcfg_from_srat() other
than calling the numaq init function.  The things get_memcfg_numaq()
does are not strictly necessary for numaq but are added for consistency
and to help unify NUMA init handling.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo 5acd91ab83 x86-32, NUMA: Replace srat_32.c with srat.c
SRAT support implementation in srat_32.c and srat.c are generally
similar; however, there are some differences.

First of all, 64bit implementation supports more types of SRAT
entries.  64bit supports x2apic, affinity, memory and SLIT.  32bit
only supports processor and memory.

Most other differences stem from different initialization protocols
employed by 64bit and 32bit NUMA init paths.

On 64bit,

* Mappings among PXM, node and apicid are directly done in each SRAT
  entry callback.

* Memory affinity information is passed to numa_add_memblk() which
  takes care of all interfacing with NUMA init.

* Doesn't directly initialize NUMA configurations.  All the
  information is recorded in numa_nodes_parsed and memblks.

On 32bit,

* Checks numa_off.

* Things go through one more level of indirection via private tables
  but eventually end up initializing the same mappings.

* node_start/end_pfn[] are initialized and
  memblock_x86_register_active_regions() is called for each memory
  chunk.

* node_set_online() is called for each online node.

* sort_node_map() is called.

There are also other minor differences in sanity checking and messages
but taking 64bit version should be good enough.

This patch drops the 32bit specific implementation and makes the 64bit
implementation common for both 32 and 64bit.

The init protocol differences are dealt with in two places - the
numa_add_memblk() shim added in the previous patch and new temporary
numa_32.c:get_memcfg_from_srat() which wraps invocation of
x86_acpi_numa_init().

The shim numa_add_memblk() handles the following.

* node_start/end_pfn[] initialization.

* node_set_online() for memory nodes.

* Invocation of memblock_x86_register_active_regions().

The shim get_memcfg_from_srat() handles the following.

* numa_off check.

* node_set_online() for CPU nodes.

* sort_node_map() invocation.

* Clearing of numa_nodes_parsed and active_ranges on failure.

The shims are temporary and will be removed as the generic NUMA init
path in 32bit is replaced with 64bit one.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo b0d310801a x86-32, NUMA: implement temporary NUMA init shims
To help transition to common NUMA init, implement temporary 32bit
shims for numa_add_memblk() and numa_set_distance().
numa_add_memblk() registers the memblk and adjusts
node_start/end_pfn[].  numa_set_distance() is noop.
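
A rough model of what such shims could look like is given below. It is a
sketch built only from the description above: the dicts stand in for
node_start/end_pfn[], the list stands in for memblk registration, and
widening the per-node span with min/max is an assumption about how the
adjustment works.

```python
PAGE_SHIFT = 12

node_start_pfn = {}  # stand-in for node_start_pfn[]
node_end_pfn = {}    # stand-in for node_end_pfn[]
registered = []      # stand-in for the memblk registration


def numa_add_memblk(nid, start, end):
    """Register [start, end) for node nid and widen the node's PFN span."""
    start_pfn = start >> PAGE_SHIFT
    end_pfn = end >> PAGE_SHIFT
    registered.append((nid, start_pfn, end_pfn))
    # Assumed behaviour: the span grows to cover every block of the node.
    node_start_pfn[nid] = min(node_start_pfn.get(nid, start_pfn), start_pfn)
    node_end_pfn[nid] = max(node_end_pfn.get(nid, end_pfn), end_pfn)
    return 0


def numa_set_distance(from_nid, to_nid, distance):
    """No-op in the shim: distance information is ignored."""
    return None


# Two made-up blocks on node 0 with a hole between them.
numa_add_memblk(0, 0x0, 0xA0000)
numa_add_memblk(0, 0x100000, 0x80000000)
```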

These shims will allow using 64bit NUMA init functions on 32bit and
gradual transition to common NUMA init path.

For detailed description, please read description of commits which
make use of the shim functions.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo e6df595b37 x86, NUMA: Move numa_nodes_parsed to numa.[hc]
Move numa_nodes_parsed from numa_64.[hc] to numa.[hc] to prepare for
NUMA init path unification.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo daf4f480ae x86-32, NUMA: Move get_memcfg_numa() into numa_32.c
There's no reason for get_memcfg_numa() to be implemented inline in
mmzone_32.h.  Move it to numa_32.c and also make
get_memcfg_numa_flag() static.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:53 +02:00
Tejun Heo eca9ad3132 x86, NUMA: make srat.c 32bit safe
Make srat.c 32bit safe by removing the assumption that unsigned long
is 64bit.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:52 +02:00
Tejun Heo 7b2600f8ee x86, NUMA: rename srat_64.c to srat.c
Rename srat_64.c to srat.c.  This is to prepare for unification of
NUMA init paths between 32 and 64bit.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:52 +02:00
Tejun Heo 1201e10a09 x86, NUMA: trivial cleanups
* Kill no longer used struct bootnode.

* Kill dangling declaration of pxm_to_nid() in numa_32.h.

* Make setup_node_bootmem() static.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:52 +02:00
Tejun Heo 797390d855 x86-32, NUMA: use sparse_memory_present_with_active_regions()
Instead of calling memory_present() for each region from NUMA init,
call sparse_memory_present_with_active_regions() from paging_init()
similarly to x86-64.

For flat and numaq, this results in exactly the same memory_present()
calls.  For srat, if there are multiple memory chunks for a node,
after this change, memory_present() will be called separately for each
chunk instead of being called once to encompass the whole range, which
doesn't cause any harm and actually is the better behavior.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:52 +02:00
Tejun Heo 84914ed0ec x86-32, NUMA: Make apic->x86_32_numa_cpu_node() optional
NUMAQ is the only meaningful user of this callback and
setup_local_APIC() the only callsite.  Stop torturing everyone else by
making the callback optional and removing all the boilerplate
implementations and assignments.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:52 +02:00
Tejun Heo 6bd262731b x86, NUMA: Unify 32/64bit numa_cpu_node() implementation
Currently, the only meaningful user of apic->x86_32_numa_cpu_node() is
NUMAQ, which returns a valid mapping only after the CPU is initialized during
SMP bringup; thus, the previous patch to set apicid -> node in
setup_local_APIC() makes __apicid_to_node[] always contain the correct
mapping whether custom apic->x86_32_numa_cpu_node() is used or not.

So, there is no reason to keep separate 32bit implementation.  We can
always consult __apicid_to_node[].  Move 64bit implementation from
numa_64.c to numa.c and remove 32bit implementation from numa_32.c.
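
The unified lookup described above can be sketched roughly as follows. This
is a standalone illustration, not the kernel source: the table sizes, the
cpu_to_apicid[] array and the name numa_cpu_node_sketch() are hypothetical
stand-ins, and only the idea of always consulting an apicid -> node table
comes from the commit message.

```c
#include <stddef.h>

#define SKETCH_MAX_APICS  256      /* hypothetical table size */
#define SKETCH_NR_CPUS    8        /* hypothetical CPU count */
#define SKETCH_BAD_APICID 0xFFFFu  /* hypothetical "no APIC ID" marker */
#define SKETCH_NO_NODE    (-1)     /* hypothetical "no node" result */

/* Stand-in for __apicid_to_node[]: filled in during NUMA/APIC setup. */
static int apicid_to_node[SKETCH_MAX_APICS];

/* Stand-in for the per-CPU APIC ID lookup. */
static unsigned int cpu_to_apicid[SKETCH_NR_CPUS];

/* Map a CPU to its node by consulting the apicid -> node table only. */
static int numa_cpu_node_sketch(size_t cpu)
{
    unsigned int apicid;

    if (cpu >= SKETCH_NR_CPUS)
        return SKETCH_NO_NODE;

    apicid = cpu_to_apicid[cpu];
    if (apicid == SKETCH_BAD_APICID || apicid >= SKETCH_MAX_APICS)
        return SKETCH_NO_NODE;

    return apicid_to_node[apicid];
}
```

With a single table as the source of truth, a per-platform callback only
has to populate the table once instead of being queried on every lookup.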

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:52 +02:00
Tejun Heo c4b90c1199 x86-32, NUMA: Automatically set apicid -> node in setup_local_APIC()
Some x86-32 NUMA implementations (NUMAQ) don't initialize apicid ->
node mapping using set_apicid_to_node() during NUMA init but implement
custom apic->x86_32_numa_cpu_node() instead.

This patch automatically initializes the default apic -> node mapping
table from apic->x86_32_numa_cpu_node() from setup_local_APIC() such
that the mapping table is in sync with the actual mapping.

As the table isn't used by custom implementations, this doesn't make
any difference at this point.  This is in preparation of unifying
numa_cpu_node() between x86-32 and 64.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:52 +02:00
Tejun Heo acd26d611e x86-64, NUMA: simplify nodedata allocation
With top-down memblock allocation, the allocation range limits in
early_node_mem() can be simplified - try node-local first, then any
node but in any case don't allocate below DMA limit.

Remove early_node_mem() and implement simplified allocation directly
in setup_node_bootmem().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:51 +02:00
Tejun Heo ebe685f24e x86-64, NUMA: trivial cleanups for setup_node_bootmem()
Make the following trivial changes in preparation for further updates.

* nodeid -> nid, nid -> tnid
* use nd_ prefix for nodedata related variables
* remove start/end_pfn and use start/end directly

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:51 +02:00
Tejun Heo 9688678a66 x86-64, NUMA: Simplify hotadd memory handling
The only special handling NUMA needs to do for hotadd memory is
determining the node for the hotadd memory given its address, and
nothing about that depends on the specific config method used.

srat_64.c does somewhat elaborate error checking on
ACPI_SRAT_MEM_HOT_PLUGGABLE regions, remembers them and implements
memory_add_physaddr_to_nid() which determines the node for given
hotadd address.

This is almost completely redundant.  All the information is already
available to the generic NUMA code which already performs all the
sanity checking and merging.  All that's necessary is not using
__initdata from numa_meminfo and providing a function which uses it to
map address to node.

Drop the specific implementation from srat_64.c and add generic
memory_add_physaddr_to_nid() in numa_64.c, which is enabled if
CONFIG_MEMORY_HOTPLUG is set.  Other than dropping the code, srat_64.c
doesn't need any change as it already calls numa_add_memblk() for hot
pluggable regions which is enough.

While at it, change CONFIG_MEMORY_HOTPLUG_SPARSE in srat_64.c to
CONFIG_MEMORY_HOTPLUG; for NUMA on x86-64, the two are always the
same.
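
A generic address-to-node lookup of the kind described above might look
like the sketch below. The struct layout, the array bound and the name
physaddr_to_nid_sketch() are assumptions made for illustration. Falling
back to the first block's node when no block matches is also an assumption
of this sketch, not something the commit message states.

```c
#include <stdint.h>

#define SKETCH_MAX_BLKS 16  /* hypothetical upper bound on memblks */

/* Hypothetical stand-in for one entry of numa_meminfo. */
struct sketch_memblk {
    uint64_t start;  /* inclusive physical start address */
    uint64_t end;    /* exclusive physical end address */
    int nid;         /* node that owns this range */
};

/* Hypothetical stand-in for numa_meminfo itself. */
struct sketch_meminfo {
    int nr_blks;
    struct sketch_memblk blk[SKETCH_MAX_BLKS];
};

/* Return the node whose memblk contains addr, else the first block's node. */
static int physaddr_to_nid_sketch(const struct sketch_meminfo *mi, uint64_t addr)
{
    int nid = mi->nr_blks > 0 ? mi->blk[0].nid : -1;
    int i;

    for (i = 0; i < mi->nr_blks; i++) {
        if (mi->blk[i].start <= addr && addr < mi->blk[i].end)
            nid = mi->blk[i].nid;
    }
    return nid;
}
```

Because hot-pluggable ranges are already registered as ordinary memblks,
the lookup needs no knowledge of where the ranges came from.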

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-05-02 14:18:51 +02:00
Tejun Heo ba67cf5cf2 Merge branch 'x86/urgent' into x86-mm
Merge reason: Pick up the following two fix commits.

  2be19102b7: x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo()
  765af22da8: x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change

Scheduled NUMA init 32/64bit unification changes depend on these.

Signed-off-by: Tejun Heo <tj@kernel.org>
2011-05-02 14:16:47 +02:00
Tejun Heo aff364860a Merge branch 'x86/numa' into x86-mm
Merge reason: Pick up x86-32 remap allocator cleanup changes - 14
commits, 3fe14ab541^..993ba1585c.

  3fe14ab541: x86-32, numa: Fix failure condition check in alloc_remap()
  993ba1585c: x86-32, numa: Update remap allocator comments

Scheduled NUMA init 32/64bit unification changes depend on them.

Signed-off-by: Tejun Heo <tj@kernel.org>
2011-05-02 14:08:47 +02:00
Bart Van Assche 9de4966a4d x86: Fix spelling error in the memcpy() source code comment
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/201105011409.21629.bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-01 19:16:18 +02:00
Yinghai Lu 2be19102b7 x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo()
numa_cleanup_meminfo() trims each memblk between low (0) and
high (max_pfn) limits and discards empty ones.  However, the
emptiness detection incorrectly used an equality test.  If the
start of a memblk is higher than max_pfn, it is empty but fails
the equality test and doesn't get discarded.

The condition triggers when max_pfn is lower than start of a
NUMA node and results in memory misconfiguration - leading to
WARN_ON()s and other funnies.  The bug was discovered in devel
branch where 32bit too uses this code path for NUMA init.  If a
node is above the addressing limit, max_pfn ends up lower than
the node triggering this problem.

The failure hasn't been observed on x86-64 but is still possible
with broken hardware e820/NUMA info.  As the fix is very low
risk, it would be better to apply it even for 64bit.

Fix it by using >= instead of ==.
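
The difference between the two tests can be sketched in a few lines. The
struct and function names below are hypothetical. Only the trimming to
[low, high] and the >= comparison are taken from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a NUMA memblk descriptor. */
struct sketch_blk {
    uint64_t start;
    uint64_t end;
};

/* Trim blk to [low, high] and report whether nothing is left of it. */
static bool trim_and_check_empty(struct sketch_blk *blk,
                                 uint64_t low, uint64_t high)
{
    if (blk->start < low)
        blk->start = low;
    if (blk->end > high)
        blk->end = high;

    /*
     * A block lying entirely above high ends up with start > end.
     * An equality test (start == end) would miss that case.
     */
    return blk->start >= blk->end;
}
```

For a block spanning 0x2000..0x3000 with high set to 0x1000, trimming
leaves start at 0x2000 and end at 0x1000, so only the >= form reports it
as empty.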

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Extracted the actual fix from the original patch and rewrote patch description. ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110501171204.GO29280@htj.dyndns.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-01 19:15:11 +02:00
Ingo Molnar 809435ff4f Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
2011-05-01 19:09:39 +02:00
Boris Ostrovsky e20a2d205c x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors
Older AMD K8 processors (Revisions A-E) are affected by erratum
400 (APIC timer interrupts don't occur in C states greater than
C1). This, for example, means that X86_FEATURE_ARAT flag should
not be set for these parts.

This addresses a regression introduced by commit
b87cf80af3 ("x86, AMD: Set ARAT
feature on AMD processors") where the system may become
unresponsive until an external interrupt (such as keyboard input)
occurs. This results, for example, in time not being reported
correctly, lack of progress on the system and other lockups.

Reported-by: Joerg-Volker Peetz <jvpeetz@web.de>
Tested-by: Joerg-Volker Peetz <jvpeetz@web.de>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Boris Ostrovsky <Boris.Ostrovsky@amd.com>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1304113663-6586-1-git-send-email-ostr@amd64.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-01 18:55:51 +02:00
Linus Torvalds 40a963502c Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86, nmi: Move LVT un-masking into irq handlers
  perf events, x86: Work around the Nehalem AAJ80 erratum
  perf, x86: Fix BTS condition
  ftrace: Build without frame pointers on Microblaze
2011-04-29 15:08:53 -07:00
Mike Waychison f548ccd47d x86: Better comments for get_bios_ebda()
Make the comments a bit clearer for get_bios_ebda so that it actually
tells us what it is returning.

Signed-off-by: Mike Waychison <mikew@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-29 14:13:15 -07:00
Mike Waychison 57d5f9f808 x86: get_bios_ebda_length()
Add a wrapper routine that tells us the length of the EBDA if it is
present.  It also ensures that the returned length doesn't let the
EBDA run past the 640KiB mark.
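
The clamping described above can be sketched as follows. This is a
standalone illustration under stated assumptions: the EBDA address and its
size in KiB are passed in as plain arguments rather than read from
firmware memory, the address is assumed to lie below 640KiB, and the name
ebda_length_sketch() is hypothetical.

```c
#define SKETCH_EBDA_LIMIT (640u * 1024u)  /* end of conventional memory */

/*
 * address:    physical address of the EBDA, or 0 if none is present
 *             (assumed to be below SKETCH_EBDA_LIMIT)
 * length_kib: EBDA size in KiB as reported by the firmware
 *
 * Returns the EBDA length in bytes, trimmed so that it does not
 * extend past the 640KiB mark.
 */
static unsigned int ebda_length_sketch(unsigned int address,
                                       unsigned char length_kib)
{
    unsigned int length;
    unsigned int room;

    if (!address)
        return 0;

    length = (unsigned int)length_kib << 10;  /* KiB -> bytes */
    room = SKETCH_EBDA_LIMIT - address;       /* bytes left below 640KiB */

    return length < room ? length : room;
}
```

For an EBDA at 0x9fc00 only 1024 bytes remain below 640KiB, so a reported
size of 4 KiB would be trimmed to 1024 bytes.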

Signed-off-by: Mike Waychison <mikew@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-29 14:13:15 -07:00