Commit d518573de6 ("x86, amd: Normalize compute unit IDs on
multi-node processors") introduced compute unit normalization
but causes a compiler warning:
arch/x86/kernel/cpu/amd.c: In function 'amd_detect_cmp':
arch/x86/kernel/cpu/amd.c:268: warning: 'cores_per_cu' may be used uninitialized in this function
arch/x86/kernel/cpu/amd.c:268: note: 'cores_per_cu' was declared here
The compiler is right - initialize it with a proper value.
Also, fix up a comment while at it.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20110214171451.GB10076@kryptos.osrc.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
CONFIG_DEBUG_PER_CPU_MAPS may return NUMA_NO_NODE when an
early_cpu_to_node() mapping hasn't been initialized. In such a
case, it emits a warning and continues without an issue but
callers may try to use the return value to index into an array.
We can catch those errors and fail silently since a warning has
already been emitted. No current user of numa_add_cpu()
requires this error checking to avoid a crash, but it's better
to be proactive in case a future user happens to have a bug and
a user tries to diagnose it with CONFIG_DEBUG_PER_CPU_MAPS.
Reported-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
LKML-Reference: <alpine.DEB.2.00.1102071407250.7812@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Emit warning when "mem=nopentium" is specified on any arch other
than x86_32 (the only that arch supports it).
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/553464
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
LKML-Reference: <1296783486-23033-2-git-send-email-kamal@canonical.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Avoid removing all of memory and panicing when "mem={invalid}"
is specified, e.g. mem=blahblah, mem=0, or mem=nopentium (on
platforms other than x86_32).
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/553464
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: <stable@kernel.org> # .3x: as far back as it applies
LKML-Reference: <1296783486-23033-1-git-send-email-kamal@canonical.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This one isn't related to previous patch. If online cpus are
below NUM_INVALIDATE_TLB_VECTORS, we don't need the lock. The
comments in the code declares we don't need the check, but a hot
lock still needs an atomic operation and expensive, so add the
check here.
Uses nr_cpu_ids here as suggested by Eric Dumazet.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <1295232730.1949.710.camel@sli10-conroe>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make the maxium TLB invalidate vectors depend on NR_CPUS linearly,
with a maximum of 32 vectors.
We currently only have 8 vectors for TLB invalidation and that is clearly
inadequate. If we have a lot of CPUs, the CPUs need share the 8 vectors and
tlbstate_lock is used to protect them. flush_tlb_page() is
heavily used in page reclaim, which will cause a lot of lock
contention for tlbstate_lock.
Andi Kleen suggested increasing the vectors number to 32, which should be
good for current typical systems to reduce the tlbstate_lock contention.
My test system has 4 sockets and 64G memory, and 64 CPUs. My
workload creates 64 processes. Each process mmap reads a big
empty sparse file. The total size of the files are 2*total_mem,
so this will cause a lot of page reclaim.
Below is the result I get from perf call-graph profiling:
without the patch:
------------------
24.25% usemem [kernel] [k] _raw_spin_lock
|
--- _raw_spin_lock
|
|--42.15%-- native_flush_tlb_others
with the patch:
------------------
14.96% usemem [kernel] [k] _raw_spin_lock
|
--- _raw_spin_lock
|--13.89%-- native_flush_tlb_others
So this heavily reduces the tlbstate_lock contention.
Suggested-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1295232727.1949.709.camel@sli10-conroe>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add up to 32 invalidate_interrupt handlers. How many handlers are
added depends on NUM_INVALIDATE_TLB_VECTORS. So if
NUM_INVALIDATE_TLB_VECTORS is smaller than 32, we reduce code
size.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
LKML-Reference: <1295232725.1949.708.camel@sli10-conroe>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cleanup the vector usage and make them continuous if possible.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
LKML-Reference: <1295232722.1949.707.camel@sli10-conroe>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/x86/mm/numa_64.c
Merge reason: fix the conflict, update to latest -rc and pick up this
dependent fix from Yinghai:
e6d2e2b2b1e1: memblock: don't adjust size in memblock_find_base()
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 's5p-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
ARM: SAMSUNG: Ensure struct sys_device is declared in plat/pm.h
ARM: S5PV310: Cleanup System MMU
ARM: S5PV310: Add support System MMU on SMDKV310
Previously we were relying on it being pulled in by other headers for
the prototype of s3c24xx_irq_suspend() and s3c24xx_irq_resume().
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch cleans following up.
- Moved definition of System MMU IPNUM into mach/sysmmu.h
- Removed useless SYSMMU_DEBUG configuration
- Removed useless header file plat/sysmmu.h
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
The 's5pv310_device_sysmmu' is used on SMDKV310. But since it is not
compiled now, there is a build error. To fix this compilation error,
S5PV310_DEV_SYSMMU needs to be selected for SMDKV310 board.
This patch enables System MMU support on SMDKV310.
Signed-off-by: Thomas Abraham <thomas.ab@samsung.com>
[kgene.kim@samsung.com: Adding description]
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
* 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
serial: bfin_5xx: split uart RX lock from uart port lock to avoid deadlock
68360serial: Plumb in rs_360_get_icount()
n_gsm: copy mtu over when configuring via ioctl interface
virtio: console: Move file back to drivers/char/
amd_nb_misc_ids[] can live in .rodata, and enable_pci_io_ecs()
can be moved into .cpuinit.text.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <4D525DDD0200007800030F07@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The gs_index loading code uses the swapgs instruction to
switch to the user gs_base temporarily. This is unsave in an
lightweight exit-path in KVM on AMD because the
KERNEL_GS_BASE MSR is switches lazily. An NMI happening in
the critical path of load_gs_index may use the wrong GS_BASE
value then leading to unpredictable behavior, e.g. a
triple-fault.
This patch fixes the issue by making sure that load_gs_index
is called only with a valid KERNEL_GS_BASE value loaded in
KVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
* master.kernel.org:/home/rmk/linux-2.6-arm:
ALSA: AACI: allow writes to MAINCR to take effect
ARM: Update mach-types
ARM: 6652/1: ep93xx: correct the end address of the AC97 memory resource
ARM: mxs/imx28: remove now unused clock lookup "fec.0"
ARM: mxs: fix clock base address missing
ARM: mxs: acknowledge gpio irq
ARM: mach-imx/mach-mx25_3ds: Fix section type
ARM: imx: Add VPR200 and MX51_3DS entries to uncompress.h
ARM i.MX23: use correct register for setting the rate
ARM i.MX23/28: remove secondary field from struct clk. It's unused
ARM i.MX28: use correct register for setting the rate
ARM i.MX28: fix bit operation
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Fix hcall tracepoint recursion
powerpc/numa: Fix bug in unmap_cpu_from_node
powerpc/numa: Disable VPHN on dedicated processor partitions
powerpc/numa: Add length when creating OF properties via VPHN
powerpc/numa: Check for all VPHN changes
powerpc/numa: Only use active VPHN count fields
powerpc/pseries: Remove unnecessary variable initializations in numa.c
powerpc/pseries: Fix brace placement in numa.c
powerpc/pseries: Fix typo in VPHN comments
powerpc: Fix some 6xx/7xxx CPU setup functions
powerpc: Pass the right cpu_spec to ->setup_cpu() on 64-bit
powerpc/book3e: Protect complex macro args in mmu-book3e.h
powerpc: Fix pfn_valid() when memory starts at a non-zero address
L3 Cache Partitioning allows selecting which of the 4 L3 subcaches can be used
for evictions by the L2 cache of each compute unit. By writing a 4-bit
hexadecimal mask into the the sysfs file
/sys/devices/system/cpu/cpuX/cache/index3/subcaches, the user can set the
enabled subcaches for a CPU.
The settings are directly read from and written to the hardware, so there is no
way to have contradicting settings for two CPUs belonging to the same compute
unit. Writing will always overwrite any previous setting for a compute unit.
Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: <Andreas.Herrmann3@amd.com>
LKML-Reference: <1297098639-431383-1-git-send-email-hans.rosenfeld@amd.com>
[ -v3: minor style fixes ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix msr instructions detection. The current code
just use msrclr for loading msr content and compare
it with proper MSR content. If msrclr is not implemented
r8 contains pc address.
Previous code wanted to use MSR carry bit but if msrclr
wasn't implemented carry wasn't cleared.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Do not disable irq in asm but use irq macros.
Systems with MSR=0 couldn't use pte_update function
because msrclr was hardcoded.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Patch: Fix IRQ flag handling naming
(sha1: f9ee29270c11dba7d0fe0b83ce47a4d8e8d2101)
introduced problem on system with MSR=0.
Signed-off-by: Michal Simek <monstr@monstr.eu>
We reserve lowmem for the things that need it, like the ACPI
wakeup code, way early to guarantee availability. This happens
before we set up the proper pagetables, so set_memory_x() has no
effect.
Until we have a better solution, use an initcall to mark the
wakeup code executable.
Originally-by: Matthieu Castet <castet.matthieu@free.fr>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Matthias Hopf <mhopf@suse.de>
Cc: rjw@sisk.pl
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <4D4F8019.2090104@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Spinlocks on shared processor partitions use H_YIELD to notify the
hypervisor we are waiting on another virtual CPU. Unfortunately this means
the hcall tracepoints can recurse.
The patch below adds a percpu depth and checks it on both the entry and
exit hcall tracepoints.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@kernel.org
When converting to the new cpumask code I screwed up:
- if (cpu_isset(cpu, numa_cpumask_lookup_table[node])) {
- cpu_clear(cpu, numa_cpumask_lookup_table[node]);
+ if (cpumask_test_cpu(cpu, node_to_cpumask_map[node])) {
+ cpumask_set_cpu(cpu, node_to_cpumask_map[node]);
This was introduced in commit 25863de07a (powerpc/cpumask: Convert NUMA code
to new cpumask API)
Fix it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There is no need to start up the timer and monitor topology changes on a
dedicated processor partition, so disable it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The rest of the NUMA code expects an OF associativity property with
the first cell containing the length. Without this fix all topology changes
cause us to misparse the property and put the cpu into node 0.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The hypervisor uses unsigned 1 byte counters to signal topology changes to
the OS. Since they can wrap we need to check for any difference, not just if
the hypervisor count is greater than the previous count.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
VPHN supports up to 8 distance fields but the number of entries in
ibm,associativity-reference-points signifies how many are in use.
Don't look at all the VPHN counts, only distance_ref_points_depth
worth.
Since we already cap our distance metrics at MAX_DISTANCE_REF_POINTS,
use that to size the VPHN arrays and add a BUILD_BUG_ON to avoid it growing
larger than the VPHN maximum of 8.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Correct a spelling error in VPHN comments in numa.c.
Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some of those functions try to adjust the CPU features, for example
to remove NAP support on some revisions. However, they seem to use
r5 as an index into the CPU table entry, which might have been right
a long time ago but no longer is. r4 is the right register to use.
This probably caused some off behaviours on some PowerMac variants
using 750cx or 7455 processor revisions.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@kernel.org
When calling setup_cpu() on 64-bit, we pass a pointer to the
cputable entry we have found. This used to be fine when cur_cpu_spec
was a pointer to that entry, but nowadays, we copy the entry into
a separate variable, and we do so before we call the setup_cpu()
callback. That means that any attempt by that callback at patching
the CPU table entry (to adjust CPU features for example) will patch
the wrong table.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
max_mapnr is a pfn, not an index innto mem_map[]. So don't add
ARCH_PFN_OFFSET a second time.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
m32r: Fixup last __do_IRQ leftover
genirq: Add missing status flags to modification mask
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-32: Make sure the stack is set up before we use it
x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms
x86, nx: Don't force pages RW when setting NX bits