linux

Commit Graph

Author	SHA1	Message	Date
Benjamin Herrenschmidt	a63fdc5156	mm: Move definition of MIN_MEMORY_BLOCK_SIZE to a header The macro MIN_MEMORY_BLOCK_SIZE is currently defined twice in two .c files, and I need it in a third one to fix a powerpc bug, so let's first move it into a header Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Ingo Molnar <mingo@elte.hu>	2011-07-12 11:08:01 +10:00
David Rientjes	dc382fd5bc	x86, mm: Allow ZONE_DMA to be configurable ZONE_DMA is unnecessary for a large number of machines that do not require less than 32-bit DMA addressing, e.g. ISA legacy DMA or PCI cards with a restricted DMA address mask. This patch allows users to disable ZONE_DMA for x86 if they know they will not be using such devices with their kernel. This prevents the VM from unnecessarily reserving a ratio of memory (defaulting to 1/256th of system capacity) with lowmem_reserve_ratio for such allocations when it will never be used. Signed-off-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1105161353560.4353@chino.kir.corp.google.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-05-16 14:03:28 -07:00
Tejun Heo	9688678a66	x86-64, NUMA: Simplify hotadd memory handling The only special handling NUMA needs to do for hotadd memory is determining the node for the hotadd memory given the address of it and there's nothing specific to specific config method used. srat_64.c does somewhat elaborate error checking on ACPI_SRAT_MEM_HOT_PLUGGABLE regions, remembers them and implements memory_add_physaddr_to_nid() which determines the node for given hotadd address. This is almost completely redundant. All the information is already available to the generic NUMA code which already performs all the sanity checking and merging. All that's necessary is not using __initdata from numa_meminfo and providing a function which uses it to map address to node. Drop the specific implementation from srat_64.c and add generic memory_add_physaddr_to_nid() in numa_64.c, which is enabled if CONFIG_MEMORY_HOTPLUG is set. Other than dropping the code, srat_64.c doesn't need any change as it already calls numa_add_memblk() for hot pluggable regions which is enough. While at it, change CONFIG_MEMORY_HOTPLUG_SPARSE in srat_64.c to CONFIG_MEMORY_HOTPLUG, for NUMA on x86-64, the two are always the same. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:51 +02:00
Linus Torvalds	b81a618dcd	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: deal with races in /proc//{syscall,stack,personality} proc: enable writing to /proc/pid/mem proc: make check_mem_permission() return an mm_struct on success proc: hold cred_guard_mutex in check_mem_permission() proc: disable mem_write after exec mm: implement access_remote_vm mm: factor out main logic of access_process_vm mm: use mm_struct to resolve gate vma's in __get_user_pages mm: arch: rename in_gate_area_no_task to in_gate_area_no_mm mm: arch: make in_gate_area take an mm_struct instead of a task_struct mm: arch: make get_gate_vma take an mm_struct instead of a task_struct x86: mark associated mm when running a task in 32 bit compatibility mode x86: add context tag to mark mm when running a task in 32-bit compatibility mode auxv: require the target to be tracable (or yourself) close race in /proc//environ report errors in /proc//map* sanely pagemap: close races with suid execve make sessionid permissions in /proc//task/ match those in /proc/* fix leaks in path_lookupat() Fix up trivial conflicts in fs/proc/base.c	2011-03-23 20:51:42 -07:00
Stephen Wilson	cae5d39032	mm: arch: rename in_gate_area_no_task to in_gate_area_no_mm Now that gate vma's are referenced with respect to a particular mm and not a particular task it only makes sense to propagate the change to this predicate as well. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-23 16:36:55 -04:00
Stephen Wilson	83b964bbf8	mm: arch: make in_gate_area take an mm_struct instead of a task_struct Morally, the question of whether an address lies in a gate vma should be asked with respect to an mm, not a particular task. Moreover, dropping the dependency on task_struct will help make existing and future operations on mm's more flexible and convenient. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-23 16:36:54 -04:00
Stephen Wilson	31db58b3ab	mm: arch: make get_gate_vma take an mm_struct instead of a task_struct Morally, the presence of a gate vma is more an attribute of a particular mm than a particular task. Moreover, dropping the dependency on task_struct will help make both existing and future operations on mm's more flexible and convenient. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-23 16:36:54 -04:00
Linus Torvalds	73d5a8675f	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: xen: update mask_rw_pte after kernel page tables init changes xen: set max_pfn_mapped to the last pfn mapped x86: Cleanup highmap after brk is concluded Fix up trivial onflict (added header file includes) in arch/x86/mm/init_64.c	2011-03-22 10:41:36 -07:00
Yinghai Lu	e5f15b45dd	x86: Cleanup highmap after brk is concluded Now cleanup_highmap actually is in two steps: one is early in head64.c and only clears above _end; a second one is in init_memory_mapping() and tries to clean from _brk_end to _end. It should check if those boundaries are PMD_SIZE aligned but currently does not. Also init_memory_mapping() is called several times for numa or memory hotplug, so we really should not handle initial kernel mappings there. This patch moves cleanup_highmap() down after _brk_end is settled so we can do everything in one step. Also we honor max_pfn_mapped in the implementation of cleanup_highmap. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-03-19 11:58:19 -07:00
Linus Torvalds	a5e6b135bd	Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (50 commits) printk: do not mangle valid userspace syslog prefixes efivars: Add Documentation efivars: Expose efivars functionality to external drivers. efivars: Parameterize operations. efivars: Split out variable registration efivars: parameterize efivars efivars: Make efivars bin_attributes dynamic efivars: move efivars globals into struct efivars drivers:misc: ti-st: fix debugging code kref: Fix typo in kref documentation UIO: add PRUSS UIO driver support Fix spelling mistakes in Documentation/zh_CN/SubmittingPatches firmware: Fix unaligned memory accesses in dmi-sysfs firmware: Add documentation for /sys/firmware/dmi firmware: Expose DMI type 15 System Event Log firmware: Break out system_event_log in dmi-sysfs firmware: Basic dmi-sysfs support firmware: Add DMI entry types to the headers Driver core: convert platform_{get,set}_drvdata to static inline functions Translate linux-2.6/Documentation/magic-number.txt into Chinese ...	2011-03-16 15:05:40 -07:00
Ingo Molnar	8460b3e5bc	Merge commit 'v2.6.38' into x86/mm Conflicts: arch/x86/mm/numa_64.c Merge reason: Resolve the conflict, update the branch to .38. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-15 08:29:44 +01:00
Andrea Arcangeli	a79e53d856	x86/mm: Fix pgd_lock deadlock It's forbidden to take the page_table_lock with the irq disabled or if there's contention the IPIs (for tlb flushes) sent with the page_table_lock held will never run leading to a deadlock. Nobody takes the pgd_lock from irq context so the _irqsave can be removed. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@kernel.org> LKML-Reference: <201102162345.p1GNjMjm021738@imap1.linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-10 09:41:57 +01:00
Tejun Heo	f891125028	x86-64, NUMA: Revert NUMA affine page table allocation This patch reverts NUMA affine page table allocation added by commit `1411e0ec31` (x86-64, numa: Put pgtable to local node memory). The commit made an undocumented change where the kernel linear mapping strictly follows intersection of e820 memory map and NUMA configuration. If the physical memory configuration has holes or NUMA nodes are not properly aligned, this leads to using unnecessarily smaller mapping size which leads to increased TLB pressure. For details, http://thread.gmane.org/gmane.linux.kernel/1104672 Patches to fix the problem have been proposed but the underlying code needs more cleanup and the approach itself seems a bit heavy handed and it has been determined to revert the feature for now and come back to it in the next developement cycle. http://thread.gmane.org/gmane.linux.kernel/1105959 As init_memory_mapping_high() callsites have been consolidated since the commit, reverting is done manually. Also, the RED-PEN comment in arch/x86/mm/init.c is not restored as the problem no longer exists with memblock based top-down early memory allocation. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de>	2011-03-04 10:26:36 +01:00
Yinghai Lu	d1b19426b0	x86: Rename e820_table_* to pgt_buf_* e820_table_{start\|end\|top}, which are used to buffer page table allocation during early boot, are now derived from memblock and don't have much to do with e820. Change the names so that they reflect what they're used for. This patch doesn't introduce any behavior change. -v2: Ingo found that earlier patch "x86: Use early pre-allocated page table buffer top-down" caused crash on 32bit and needed to be dropped. This patch was updated to reflect the change. -tj: Updated commit description. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-02-24 14:52:18 +01:00
Tejun Heo	d8fc3afc49	x86, NUMA: Move *_numa_init() invocations into initmem_init() There's no reason for these to live in setup_arch(). Move them inside initmem_init(). - v2: x86-32 initmem_init() weren't updated breaking 32bit builds. Fixed. Found by Ankita. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ankita Garg <ankita@in.ibm.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 12:13:06 +01:00
Tejun Heo	86ef4dbf1f	x86, NUMA: Drop @start/last_pfn from initmem_init() initmem_init() extensively accesses and modifies global data structures and the parameters aren't even followed depending on which path is being used. Drop @start/last_pfn and let it deal with @max_pfn directly. This is in preparation for further NUMA init cleanups. - v2: x86-32 initmem_init() weren't updated breaking 32bit builds. Fixed. Found by Yinghai. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 12:13:06 +01:00
Nathan Fontenot	1dc41aa6d6	memory hotplug: Define memory_block_size_bytes for x86_64 with CONFIG_X86_UV Define a version of memory_block_size_bytes for x86_64 when CONFIG_X86_UV is set. Signed-off-by: Robin Holt <holt@sgi.com> Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-03 16:08:58 -08:00
Yinghai Lu	1411e0ec31	x86-64, numa: Put pgtable to local node memory Introduce init_memory_mapping_high(), and use it with 64bit. It will go with every memory segment above 4g to create page table to the memory range itself. before this patch all page tables was on one node. with this patch, one RED-PEN is killed debug out for 8 sockets system after patch [ 0.000000] initial memory mapped : 0 - 20000000 [ 0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff] [ 0.000000] 0000000000 - 007f600000 page 2M [ 0.000000] 007f600000 - 007f750000 page 4k [ 0.000000] kernel direct mapping tables up to 7f750000 @ [0x7f74c000-0x7f74ffff] [ 0.000000] RAMDISK: 7bc84000 - 7f745000 .... [ 0.000000] Adding active range (0, 0x10, 0x95) 0 entries of 3200 used [ 0.000000] Adding active range (0, 0x100, 0x7f750) 1 entries of 3200 used [ 0.000000] Adding active range (0, 0x100000, 0x1080000) 2 entries of 3200 used [ 0.000000] Adding active range (1, 0x1080000, 0x2080000) 3 entries of 3200 used [ 0.000000] Adding active range (2, 0x2080000, 0x3080000) 4 entries of 3200 used [ 0.000000] Adding active range (3, 0x3080000, 0x4080000) 5 entries of 3200 used [ 0.000000] Adding active range (4, 0x4080000, 0x5080000) 6 entries of 3200 used [ 0.000000] Adding active range (5, 0x5080000, 0x6080000) 7 entries of 3200 used [ 0.000000] Adding active range (6, 0x6080000, 0x7080000) 8 entries of 3200 used [ 0.000000] Adding active range (7, 0x7080000, 0x8080000) 9 entries of 3200 used [ 0.000000] init_memory_mapping: [0x00000100000000-0x0000107fffffff] [ 0.000000] 0100000000 - 1080000000 page 2M [ 0.000000] kernel direct mapping tables up to 1080000000 @ [0x107ffbd000-0x107fffffff] [ 0.000000] memblock_x86_reserve_range: [0x107ffc2000-0x107fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00001080000000-0x0000207fffffff] [ 0.000000] 1080000000 - 2080000000 page 2M [ 0.000000] kernel direct mapping tables up to 2080000000 @ [0x207ff7d000-0x207fffffff] [ 0.000000] memblock_x86_reserve_range: [0x207ffc0000-0x207fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00002080000000-0x0000307fffffff] [ 0.000000] 2080000000 - 3080000000 page 2M [ 0.000000] kernel direct mapping tables up to 3080000000 @ [0x307ff3d000-0x307fffffff] [ 0.000000] memblock_x86_reserve_range: [0x307ffc0000-0x307fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00003080000000-0x0000407fffffff] [ 0.000000] 3080000000 - 4080000000 page 2M [ 0.000000] kernel direct mapping tables up to 4080000000 @ [0x407fefd000-0x407fffffff] [ 0.000000] memblock_x86_reserve_range: [0x407ffc0000-0x407fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00004080000000-0x0000507fffffff] [ 0.000000] 4080000000 - 5080000000 page 2M [ 0.000000] kernel direct mapping tables up to 5080000000 @ [0x507febd000-0x507fffffff] [ 0.000000] memblock_x86_reserve_range: [0x507ffc0000-0x507fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00005080000000-0x0000607fffffff] [ 0.000000] 5080000000 - 6080000000 page 2M [ 0.000000] kernel direct mapping tables up to 6080000000 @ [0x607fe7d000-0x607fffffff] [ 0.000000] memblock_x86_reserve_range: [0x607ffc0000-0x607fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00006080000000-0x0000707fffffff] [ 0.000000] 6080000000 - 7080000000 page 2M [ 0.000000] kernel direct mapping tables up to 7080000000 @ [0x707fe3d000-0x707fffffff] [ 0.000000] memblock_x86_reserve_range: [0x707ffc0000-0x707fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00007080000000-0x0000807fffffff] [ 0.000000] 7080000000 - 8080000000 page 2M [ 0.000000] kernel direct mapping tables up to 8080000000 @ [0x807fdfc000-0x807fffffff] [ 0.000000] memblock_x86_reserve_range: [0x807ffbf000-0x807fffffff] PGTABLE [ 0.000000] Initmem setup node 0 [0000000000000000-000000107fffffff] [ 0.000000] NODE_DATA [0x0000107ffbd000-0x0000107ffc1fff] [ 0.000000] Initmem setup node 1 [0000001080000000-000000207fffffff] [ 0.000000] NODE_DATA [0x0000207ffbb000-0x0000207ffbffff] [ 0.000000] Initmem setup node 2 [0000002080000000-000000307fffffff] [ 0.000000] NODE_DATA [0x0000307ffbb000-0x0000307ffbffff] [ 0.000000] Initmem setup node 3 [0000003080000000-000000407fffffff] [ 0.000000] NODE_DATA [0x0000407ffbb000-0x0000407ffbffff] [ 0.000000] Initmem setup node 4 [0000004080000000-000000507fffffff] [ 0.000000] NODE_DATA [0x0000507ffbb000-0x0000507ffbffff] [ 0.000000] Initmem setup node 5 [0000005080000000-000000607fffffff] [ 0.000000] NODE_DATA [0x0000607ffbb000-0x0000607ffbffff] [ 0.000000] Initmem setup node 6 [0000006080000000-000000707fffffff] [ 0.000000] NODE_DATA [0x0000707ffbb000-0x0000707ffbffff] [ 0.000000] Initmem setup node 7 [0000007080000000-000000807fffffff] [ 0.000000] NODE_DATA [0x0000807ffba000-0x0000807ffbefff] Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D1933D1.9020609@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-29 15:48:08 -08:00
Yinghai Lu	4b239f458c	x86-64, mm: Put early page table high While dubug kdump, found current kernel will have problem with crashkernel=512M. It turns out that initial mapping is to 512M, and later initial mapping to 4G (acutally is 2040M in my platform), will put page table near 512M. then initial mapping to 128g will be near 2g. before this patch: [ 0.000000] initial memory mapped : 0 - 20000000 [ 0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff] [ 0.000000] 0000000000 - 007f600000 page 2M [ 0.000000] 007f600000 - 007f750000 page 4k [ 0.000000] kernel direct mapping tables up to 7f750000 @ [0x1fffc000-0x1fffffff] [ 0.000000] memblock_x86_reserve_range: [0x1fffc000-0x1fffdfff] PGTABLE [ 0.000000] init_memory_mapping: [0x00000100000000-0x0000207fffffff] [ 0.000000] 0100000000 - 2080000000 page 2M [ 0.000000] kernel direct mapping tables up to 2080000000 @ [0x7bc01000-0x7bc83fff] [ 0.000000] memblock_x86_reserve_range: [0x7bc01000-0x7bc7efff] PGTABLE [ 0.000000] RAMDISK: 7bc84000 - 7f745000 [ 0.000000] crashkernel reservation failed - No suitable area found. after patch: [ 0.000000] initial memory mapped : 0 - 20000000 [ 0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff] [ 0.000000] 0000000000 - 007f600000 page 2M [ 0.000000] 007f600000 - 007f750000 page 4k [ 0.000000] kernel direct mapping tables up to 7f750000 @ [0x7f74c000-0x7f74ffff] [ 0.000000] memblock_x86_reserve_range: [0x7f74c000-0x7f74dfff] PGTABLE [ 0.000000] init_memory_mapping: [0x00000100000000-0x0000207fffffff] [ 0.000000] 0100000000 - 2080000000 page 2M [ 0.000000] kernel direct mapping tables up to 2080000000 @ [0x207ff7d000-0x207fffffff] [ 0.000000] memblock_x86_reserve_range: [0x207ff7d000-0x207fffafff] PGTABLE [ 0.000000] RAMDISK: 7bc84000 - 7f745000 [ 0.000000] memblock_x86_reserve_range: [0x17000000-0x36ffffff] CRASH KERNEL [ 0.000000] Reserving 512MB of memory at 368MB for crashkernel (System RAM: 133120MB) It means with the patch, page table for [0, 2g) will need 2g, instead of under 512M, page table for [4g, 128g) will be near 128g, instead of under 2g. That would good, if we have lots of memory above 4g, like 1024g, or 2048g or 16T, will not put related page table under 2g. that would be have chance to fill the under 2g if 1G or 2M page is not used. the code change will use add map_low_page() and update unmap_low_page() for 64bit, and use them to get access the corresponding high memory for page table setting. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D0C0734.7060900@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-29 14:46:54 -08:00
Zimny Lech	61d8e11e51	Remove duplicate includes from many files Signed-off-by: Zimny Lech <napohybelskurwysynom2010@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-27 18:03:18 -07:00
Linus Torvalds	3044100e58	Merge branch 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (74 commits) x86-64: Only set max_pfn_mapped to 512 MiB if we enter via head_64.S xen: Cope with unmapped pages when initializing kernel pagetable memblock, bootmem: Round pfn properly for memory and reserved regions memblock: Annotate memblock functions with __init_memblock memblock: Allow memblock_init to be called early memblock/arm: Fix memblock_region_is_memory() typo x86, memblock: Remove __memblock_x86_find_in_range_size() memblock: Fix wraparound in find_region() x86-32, memblock: Make add_highpages honor early reserved ranges x86, memblock: Fix crashkernel allocation arm, memblock: Fix the sparsemem build memblock: Fix section mismatch warnings powerpc, memblock: Fix memblock API change fallout memblock, microblaze: Fix memblock API change fallout x86: Remove old bootmem code x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve x86: Remove not used early_res code x86, memblock: Replace e820_/_early string with memblock_ x86: Use memblock to replace early_res x86, memblock: Use memblock_debug to control debug message print out ... Fix up trivial conflicts in arch/x86/kernel/setup.c and kernel/Makefile	2010-10-21 18:52:11 -07:00
Linus Torvalds	c3b86a2942	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32, percpu: Correct the ordering of the percpu readmostly section x86, mm: Enable ARCH_DMA_ADDR_T_64BIT with X86_64 \|\| HIGHMEM64G x86: Spread tlb flush vector between nodes percpu: Introduce a read-mostly percpu API x86, mm: Fix incorrect data type in vmalloc_sync_all() x86, mm: Hold mm->page_table_lock while doing vmalloc_sync x86, mm: Fix bogus whitespace in sync_global_pgds() x86-32: Fix sparse warning for the __PHYSICAL_MASK calculation x86, mm: Add RESERVE_BRK_ARRAY() helper mm, x86: Saving vmcore with non-lazy freeing of vmas x86, kdump: Change copy_oldmem_page() to use cached addressing x86, mm: fix uninitialized addr in kernel_physical_mapping_init() x86, kmemcheck: Remove double test x86, mm: Make spurious_fault check explicitly check the PRESENT bit x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions x86, mm: Avoid unnecessary TLB flush	2010-10-21 13:47:29 -07:00
Jeremy Fitzhardinge	617d34d9e5	x86, mm: Hold mm->page_table_lock while doing vmalloc_sync Take mm->page_table_lock while syncing the vmalloc region. This prevents a race with the Xen pagetable pin/unpin code, which expects that the page_table_lock is already held. If this race occurs, then Xen can see an inconsistent page type (a page can either be read/write or a pagetable page, and pin/unpin converts it between them), which will cause either the pin or the set_p[gm]d to fail; either will crash the kernel. vmalloc_sync_all() should be called rarely, so this extra use of page_table_lock should not interfere with its normal users. The mm pointer is stashed in the pgd page's index field, as that won't be otherwise used for pgds. Reported-by: Ian Campbell <ian.cambell@eu.citrix.com> Originally-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4CB88A4C.1080305@goop.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-19 13:57:08 -07:00
Jeremy Fitzhardinge	44235dcde4	x86, mm: Fix bogus whitespace in sync_global_pgds() Whitespace cleanup only. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-19 13:56:03 -07:00
Jan Beulich	234bb549ee	x86, cleanups: Use clear_page/copy_page rather than memset/memcpy When operating on whole pages, use clear_page() and copy_page() in favor of memset() and memcpy(); after all that's what they are intended for. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4C7FB8CA0200007800013F51@vpn.id2.novell.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-22 15:36:49 -07:00
Wu Fengguang	1c5f50ee34	x86, mm: fix uninitialized addr in kernel_physical_mapping_init() This re-adds the lost chunk in commit `9b861528a8`. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Haicheng Li <haicheng.li@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> LKML-Reference: <20100903090407.GA19771@localhost> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-03 11:40:11 +02:00
Ingo Molnar	daab7fc734	Merge commit 'v2.6.36-rc3' into x86/memblock Conflicts: arch/x86/kernel/trampoline.c mm/memblock.c Merge reason: Resolve the conflicts, update to latest upstream. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-08-31 09:45:46 +02:00
Yinghai Lu	774ea0bcb2	x86: Remove old bootmem code Requested by Ingo, Thomas and HPA. The old bootmem code is no longer necessary, and the transition is complete. Remove it. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:14:37 -07:00
Yinghai Lu	6f2a75369e	x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve memblock_memory_size() will return memory size in memblock.memory.region. memblock_free_memory_size() will return free memory size in memblock.memory.region. So We can get exact reseved size in specified range. Set the size right after initmem_init(), because later bootmem API will get area above 16M. (except some fallback). Later after we remove the bootmem, We could call that just before paging_init(). Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:13:54 -07:00
Yinghai Lu	a9ce6bc151	x86, memblock: Replace e820_/_early string with memblock_ 1.include linux/memblock.h directly. so later could reduce e820.h reference. 2 this patch is done by sed scripts mainly -v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:13:47 -07:00
Yinghai Lu	f88eff74aa	bootmem, x86: Add weak version of reserve_bootmem_generic It will be used memblock_x86_to_bootmem converting It is an wrapper for reserve_bootmem, and x86 64bit is using special one. Also clean up that version for x86_64. We don't need to take care of numa path for that, bootmem can handle it how Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:08:13 -07:00
Haicheng Li	9b861528a8	x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes When memory hotplug-adding happens for a large enough area that a new PGD entry is needed for the direct mapping, the PGDs of other processes would not get updated. This leads to some CPUs oopsing like below when they have to access the unmapped areas. [ 1139.243192] BUG: soft lockup - CPU#0 stuck for 61s! [bash:6534] [ 1139.243195] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore [ 1139.243229] irq event stamp: 8538759 [ 1139.243230] hardirqs last enabled at (8538759): [<ffffffff8100c3fc>] restore_args+0x0/0x30 [ 1139.243236] hardirqs last disabled at (8538757): [<ffffffff810422df>] __do_softirq+0x106/0x146 [ 1139.243240] softirqs last enabled at (8538758): [<ffffffff81042310>] __do_softirq+0x137/0x146 [ 1139.243245] softirqs last disabled at (8538743): [<ffffffff8100cb5c>] call_softirq+0x1c/0x34 [ 1139.243249] CPU 0: [ 1139.243250] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore [ 1139.243284] Pid: 6534, comm: bash Tainted: G M 2.6.32-haicheng-cpuhp #7 QSSC-S4R [ 1139.243287] RIP: 0010:[<ffffffff810ace35>] [<ffffffff810ace35>] alloc_arraycache+0x35/0x69 [ 1139.243292] RSP: 0018:ffff8802799f9d78 EFLAGS: 00010286 [ 1139.243295] RAX: ffff8884ffc00000 RBX: ffff8802799f9d98 RCX: 0000000000000000 [ 1139.243297] RDX: 0000000000190018 RSI: 0000000000000001 RDI: ffff8884ffc00010 [ 1139.243300] RBP: ffffffff8100c34e R08: 0000000000000002 R09: 0000000000000000 [ 1139.243303] R10: ffffffff8246dda0 R11: 000000d08246dda0 R12: ffff8802599bfff0 [ 1139.243305] R13: ffff88027904c040 R14: ffff8802799f8000 R15: 0000000000000001 [ 1139.243308] FS: 00007fe81bfe86e0(0000) GS:ffff88000d800000(0000) knlGS:0000000000000000 [ 1139.243311] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1139.243313] CR2: ffff8884ffc00000 CR3: 000000026cf2d000 CR4: 00000000000006f0 [ 1139.243316] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1139.243318] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1139.243321] Call Trace: [ 1139.243324] [<ffffffff810ace29>] ? alloc_arraycache+0x29/0x69 [ 1139.243328] [<ffffffff8135004e>] ? cpuup_callback+0x1b0/0x32a [ 1139.243333] [<ffffffff8105385d>] ? notifier_call_chain+0x33/0x5b [ 1139.243337] [<ffffffff810538a4>] ? __raw_notifier_call_chain+0x9/0xb [ 1139.243340] [<ffffffff8134ecfc>] ? cpu_up+0xb3/0x152 [ 1139.243344] [<ffffffff813388ce>] ? store_online+0x4d/0x75 [ 1139.243348] [<ffffffff811e53f3>] ? sysdev_store+0x1b/0x1d [ 1139.243351] [<ffffffff8110589f>] ? sysfs_write_file+0xe5/0x121 [ 1139.243355] [<ffffffff810b539d>] ? vfs_write+0xae/0x14a [ 1139.243358] [<ffffffff810b587f>] ? sys_write+0x47/0x6f [ 1139.243362] [<ffffffff8100b9ab>] ? system_call_fastpath+0x16/0x1b This patch makes sure to always replicate new direct mapping PGD entries to the PGDs of all processes, as well as ensures corresponding vmemmap mapping gets synced. V1: initial code by Andi Kleen. V2: fix several issues found in testing. V3: as suggested by Wu Fengguang, reuse common code of vmalloc_sync_all(). [ hpa: changed pgd_change from int to bool ] Originally-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> LKML-Reference: <4C6E4FD8.6080100@linux.intel.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-26 14:02:33 -07:00
Haicheng Li	6afb5157b9	x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions No behavior change. Move some of vmalloc_sync_all() code into a new function sync_global_pgds() that will be useful for memory hotplug. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> LKML-Reference: <4C6E4ECD.1090607@linux.intel.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-26 14:02:29 -07:00
Pavel Machek	a2531293db	update email address pavel@suse.cz no longer works, replace it with working address. Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-07-19 10:56:54 +02:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Yinghai Lu	9bdac91424	sparsemem: Put mem map for one node together. Add vmemmap_alloc_block_buf for mem map only. It will fallback to the old way if it cannot get a block that big. Before this patch, when a node have 128g ram installed, memmap are split into two parts or more. [ 0.000000] [ffffea0000000000-ffffea003fffffff] PMD -> [ffff880100600000-ffff88013e9fffff] on node 1 [ 0.000000] [ffffea0040000000-ffffea006fffffff] PMD -> [ffff88013ec00000-ffff88016ebfffff] on node 1 [ 0.000000] [ffffea0070000000-ffffea007fffffff] PMD -> [ffff882000600000-ffff8820105fffff] on node 0 [ 0.000000] [ffffea0080000000-ffffea00bfffffff] PMD -> [ffff882010800000-ffff8820507fffff] on node 0 [ 0.000000] [ffffea00c0000000-ffffea00dfffffff] PMD -> [ffff882050a00000-ffff8820709fffff] on node 0 [ 0.000000] [ffffea00e0000000-ffffea00ffffffff] PMD -> [ffff884000600000-ffff8840205fffff] on node 2 [ 0.000000] [ffffea0100000000-ffffea013fffffff] PMD -> [ffff884020800000-ffff8840607fffff] on node 2 [ 0.000000] [ffffea0140000000-ffffea014fffffff] PMD -> [ffff884060a00000-ffff8840709fffff] on node 2 [ 0.000000] [ffffea0150000000-ffffea017fffffff] PMD -> [ffff886000600000-ffff8860305fffff] on node 3 [ 0.000000] [ffffea0180000000-ffffea01bfffffff] PMD -> [ffff886030800000-ffff8860707fffff] on node 3 [ 0.000000] [ffffea01c0000000-ffffea01ffffffff] PMD -> [ffff888000600000-ffff8880405fffff] on node 4 [ 0.000000] [ffffea0200000000-ffffea022fffffff] PMD -> [ffff888040800000-ffff8880707fffff] on node 4 [ 0.000000] [ffffea0230000000-ffffea023fffffff] PMD -> [ffff88a000600000-ffff88a0105fffff] on node 5 [ 0.000000] [ffffea0240000000-ffffea027fffffff] PMD -> [ffff88a010800000-ffff88a0507fffff] on node 5 [ 0.000000] [ffffea0280000000-ffffea029fffffff] PMD -> [ffff88a050a00000-ffff88a0709fffff] on node 5 [ 0.000000] [ffffea02a0000000-ffffea02bfffffff] PMD -> [ffff88c000600000-ffff88c0205fffff] on node 6 [ 0.000000] [ffffea02c0000000-ffffea02ffffffff] PMD -> [ffff88c020800000-ffff88c0607fffff] on node 6 [ 0.000000] [ffffea0300000000-ffffea030fffffff] PMD -> [ffff88c060a00000-ffff88c0709fffff] on node 6 [ 0.000000] [ffffea0310000000-ffffea033fffffff] PMD -> [ffff88e000600000-ffff88e0305fffff] on node 7 [ 0.000000] [ffffea0340000000-ffffea037fffffff] PMD -> [ffff88e030800000-ffff88e0707fffff] on node 7 after patch will get [ 0.000000] [ffffea0000000000-ffffea006fffffff] PMD -> [ffff880100200000-ffff88016e5fffff] on node 0 [ 0.000000] [ffffea0070000000-ffffea00dfffffff] PMD -> [ffff882000200000-ffff8820701fffff] on node 1 [ 0.000000] [ffffea00e0000000-ffffea014fffffff] PMD -> [ffff884000200000-ffff8840701fffff] on node 2 [ 0.000000] [ffffea0150000000-ffffea01bfffffff] PMD -> [ffff886000200000-ffff8860701fffff] on node 3 [ 0.000000] [ffffea01c0000000-ffffea022fffffff] PMD -> [ffff888000200000-ffff8880701fffff] on node 4 [ 0.000000] [ffffea0230000000-ffffea029fffffff] PMD -> [ffff88a000200000-ffff88a0701fffff] on node 5 [ 0.000000] [ffffea02a0000000-ffffea030fffffff] PMD -> [ffff88c000200000-ffff88c0701fffff] on node 6 [ 0.000000] [ffffea0310000000-ffffea037fffffff] PMD -> [ffff88e000200000-ffff88e0701fffff] on node 7 -v2: change buf to vmemmap_buf instead according to Ingo also add CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER according to Ingo -v3: according to Andrew, use sizeof(name) instead of hard coded 15 Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-19-git-send-email-yinghai@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-12 09:42:38 -08:00
Yinghai Lu	08677214e3	x86: Make 64 bit use early_res instead of bootmem before slab Finally we can use early_res to replace bootmem for x86_64 now. Still can use CONFIG_NO_BOOTMEM to enable it or not. -v2: fix 32bit compiling about MAX_DMA32_PFN -v3: folded bug fix from LKML message below Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B747239.4070907@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-12 09:41:59 -08:00
Yinghai Lu	1842f90cc9	x86: Call early_res_to_bootmem one time Simplify setup_node_mem: don't use bootmem from other node, instead just find_e820_area in early_node_mem. This keeps the boundary between early_res and boot mem more clear, and lets us only call early_res_to_bootmem() one time instead of for all nodes. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-12-git-send-email-yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-10 17:47:18 -08:00
Shaohui Zheng	ea0854170c	memory hotplug: fix a bug on /dev/mem for 64-bit kernels Newly added memory can not be accessed via /dev/mem, because we do not update the variables high_memory, max_pfn and max_low_pfn. Add a function update_end_of_memory_vars() to update these variables for 64-bit kernels. [akpm@linux-foundation.org: simplify comment] Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Li Haicheng <haicheng.li@intel.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-02-02 18:11:23 -08:00
Suresh Siddha	e7d23dde9b	x86_64, cpa: Use only text section in set_kernel_text_rw/ro set_kernel_text_rw()/set_kernel_text_ro() are marking pages starting from _text to __start_rodata as RW or RO. With CONFIG_DEBUG_RODATA, there might be free pages (associated with padding the sections to 2MB large page boundary) between text and rodata sections that are given back to page allocator. So we should use only use the start (__text) and end (__stop___ex_table) of the text section in set_kernel_text_rw()/set_kernel_text_ro(). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20091029024821.164525222@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-02 17:17:24 +01:00
Suresh Siddha	502f660466	x86, cpa: Fix kernel text RO checks in static_protection() Steven Rostedt reported that we are unconditionally making the kernel text mapping as read-only. i.e., if someone does cpa() to the kernel text area for setting/clearing any page table attribute, we unconditionally clear the read-write attribute for the kernel text mapping that is set at compile time. We should delay (to forbid the write attribute) and enforce only after the kernel has mapped the text as read-only. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20091029024820.996634347@sbs-t61.sc.intel.com> [ marked kernel_set_to_readonly as __read_mostly ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-02 17:16:35 +01:00
Suresh Siddha	74e081797b	x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA CONFIG_DEBUG_RODATA chops the large pages spanning boundaries of kernel text/rodata/data to small 4KB pages as they are mapped with different attributes (text as RO, RODATA as RO and NX etc). On x86_64, preserve the large page mappings for kernel text/rodata/data boundaries when CONFIG_DEBUG_RODATA is enabled. This is done by allowing the RODATA section to be hugepage aligned and having same RWX attributes for the 2MB page boundaries Extra Memory pages padding the sections will be freed during the end of the boot and the kernel identity mappings will have different RWX permissions compared to the kernel text mappings. Kernel identity mappings to these physical pages will be mapped with smaller pages but large page mappings are still retained for kernel text,rodata,data mappings. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20091014220254.190119924@sbs-t61.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-10-20 14:46:00 +09:00
Suresh Siddha	b9af7c0d44	x86-64: preserve large page mapping for 1st 2MB kernel txt with CONFIG_DEBUG_RODATA In the first 2MB, kernel text is co-located with kernel static page tables setup by head_64.S. CONFIG_DEBUG_RODATA chops this 2MB large page mapping to small 4KB pages as we mark the kernel text as RO, leaving the static page tables as RW. With CONFIG_DEBUG_RODATA disabled, OLTP run on NHM-EP shows 1% improvement with 2% reduction in system time and 1% improvement in iowait idle time. To recover this, move the kernel static page tables to .data section, so that we don't have to break the first 2MB of kernel text to small pages with CONFIG_DEBUG_RODATA. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20091014220254.063193621@sbs-t61.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-10-20 14:46:00 +09:00
David Rientjes	8ee2debce3	x86: Export k8 physical topology To eventually interleave emulated nodes over physical nodes, we need to know the physical topology of the machine without actually registering it. This does the k8 node setup in two parts: detection and registration. NUMA emulation can then used the physical topology detected to setup the address ranges of emulated nodes accordingly. If emulation isn't used, the k8 nodes are registered as normal. Two formals are added to the x86 NUMA setup functions: `acpi' and `k8'. These represent whether ACPI or K8 NUMA has been detected; both cannot be true at the same time. This specifies to the NUMA emulation code whether an underlying physical NUMA topology exists and which interface to use. This patch deals solely with separating the k8 setup path into Northbridge detection and registration steps and leaves the ACPI changes for a subsequent patch. The `acpi' formal is added here, however, to avoid touching all the header files again in the next patch. This approach also ensures emulated nodes will not span physical nodes so the true memory latency is not misrepresented. k8_get_nodes() may now be used to export the k8 physical topology of the machine for NUMA emulation. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Ankita Garg <ankita@in.ibm.com> Cc: Len Brown <len.brown@intel.com> LKML-Reference: <alpine.DEB.1.00.0909251518400.14754@chino.kir.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-12 22:56:45 +02:00
KAMEZAWA Hiroyuki	81ac3ad906	kcore: register module area in generic way Some archs define MODULED_VADDR/MODULES_END which is not in VMALLOC area. This is handled only in x86-64. This patch make it more generic. And we can use vread/vwrite to access the area. Fix it. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Jiri Slaby <jirislaby@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:42 -07:00
KAMEZAWA Hiroyuki	3089aa1b0c	kcore: use registerd physmem information For /proc/kcore, each arch registers its memory range by kclist_add(). In usual, - range of physical memory - range of vmalloc area - text, etc... are registered but "range of physical memory" has some troubles. It doesn't updated at memory hotplug and it tend to include unnecessary memory holes. Now, /proc/iomem (kernel/resource.c) includes required physical memory range information and it's properly updated at memory hotplug. Then, it's good to avoid using its own code(duplicating information) and to rebuild kclist for physical memory based on /proc/iomem. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki	9492587cf3	kcore: register text area in generic way Some 64bit arch has special segment for mapping kernel text. It should be entried to /proc/kcore in addtion to direct-linear-map, vmalloc area. This patch unifies KCORE_TEXT entry scattered under x86 and ia64. I'm not familiar with other archs (mips has its own even after this patch) but range of [_stext ..._end) is a valid area of text and it's not in direct-map area, defining CONFIG_ARCH_PROC_KCORE_TEXT is only a necessary thing to do. Note: I left mips as it is now. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki	a0614da88b	kcore: register vmalloc area in generic way For /proc/kcore, vmalloc areas are registered per arch. But, all of them registers same range of [VMALLOC_START...VMALLOC_END) This patch unifies them. By this. archs which have no kclist_add() hooks can see vmalloc area correctly. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki	c30bb2a25f	kcore: add kclist types Presently, kclist_add() only eats start address and size as its arguments. Considering to make kclist dynamically reconfigulable, it's necessary to know which kclists are for System RAM and which are not. This patch add kclist types as KCORE_RAM KCORE_VMALLOC KCORE_TEXT KCORE_OTHER This "type" is used in a patch following this for detecting KCORE_RAM. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
Geert Uytterhoeven	cc013a8890	arches: drop superfluous casts in nr_free_pages() callers Commit `9617729941` ("Drop free_pages()") modified nr_free_pages() to return 'unsigned long' instead of 'unsigned int'. This made the casts to 'unsigned long' in most callers superfluous, so remove them. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Mikael Starvik <starvik@axis.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Chris Zankel <zankel@tensilica.com> Cc: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:34 -07:00
Amerigo Wang	a6a06f7b57	x86: Fix an incorrect argument of reserve_bootmem() This line looks suspicious, because if this is true, then the 'flags' parameter of function reserve_bootmem_generic() will be unused when !CONFIG_NUMA. I don't think this is what we want. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: akpm@linux-foundation.org LKML-Reference: <20090821083709.5098.52505.sendpatchset@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-24 20:22:55 +02:00
Yinghai Lu	44b5728095	x86: don't clear nodes_states[N_NORMAL_MEMORY] when numa is not compiled in Alex found that specjbb2005 still can not run with hugepages on an x86-64 machine. This only happens when numa is not compiled in. The root cause: node_set_state will not set it back for us in that case, so don't clear that when numa is not select in config [ v2: use node_clear_state instead ] Reported-and-Tested-by: Alex Shi <alex.shi@intel.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-08 10:32:50 -07:00
Yinghai Lu	66918dcdf9	x86: only clear node_states for 64bit Nathan reported that \| commit `73d60b7f74` \| Author: Yinghai Lu <yinghai@kernel.org> \| Date: Tue Jun 16 15:33:00 2009 -0700 \| \| page-allocator: clear N_HIGH_MEMORY map before we set it again \| \| SRAT tables may contains nodes of very small size. The arch code may \| decide to not activate such a node. However, currently the early boot \| code sets N_HIGH_MEMORY for such nodes. These nodes therefore seem to be \| active although these nodes have no present pages. \| \| For 64bit N_HIGH_MEMORY == N_NORMAL_MEMORY, so that works for 64 bit too unintentionally and incorrectly clears the cpuset.mems cgroup attribute on an i386 kvm guest, meaning that cpuset.mems can not be used. Fix this by only clearing node_states[N_NORMAL_MEMORY] for 64bit only. and need to do save/restore for that in find_zone_movable_pfn Reported-by: Nathan Lynch <ntl@pobox.com> Tested-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu>, Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-30 18:56:01 -07:00
Linus Torvalds	c4c5ab3089	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (45 commits) x86, mce: fix error path in mce_create_device() x86: use zalloc_cpumask_var for mce_dev_initialized x86: fix duplicated sysfs attribute x86: de-assembler-ize asm/desc.h i386: fix/simplify espfix stack switching, move it into assembly i386: fix return to 16-bit stack from NMI handler x86, ioapic: Don't call disconnect_bsp_APIC if no APIC present x86: Remove duplicated #include's x86: msr.h linux/types.h is only required for __KERNEL__ x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround x86, mce: mce_intel.c needs <asm/apic.h> x86: apic/io_apic.c: dmar_msi_type should be static x86, io_apic.c: Work around compiler warning x86: mce: Don't touch THERMAL_APIC_VECTOR if no active APIC present x86: mce: Handle banks == 0 case in K7 quirk x86, boot: use .code16gcc instead of .code16 x86: correct the conversion of EFI memory types x86: cap iomem_resource to addressable physical memory x86, mce: rename _64.c files which are no longer 64-bit-specific x86, mce: mce.h cleanup ... Manually fix up trivial conflict in arch/x86/mm/fault.c	2009-06-20 10:49:48 -07:00
Vegard Nossum	9e730237c2	kmemcheck: don't track page tables As these are allocated using the page allocator, we need to pass __GFP_NOTRACK before we add page allocator support to kmemcheck. Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 12:40:11 +02:00
Shaohua Li	41d840e224	x86: change kernel_physical_mapping_init() __init to __meminit kernel_physical_mapping_init() could be called in memory hotplug path. [ Impact: fix potential crash with memory hotplug ] Signed-off-by: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <20090612045752.GA827@sli10-desk.sh.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-12 14:39:21 +02:00
Pekka Enberg	087fa4e964	x86: use sparse_memory_present_with_active_regions() on UMA There's no need to use call memory_present() manually on UMA because initmem_init() sets up early_node_map by calling e820_register_active_regions(). [ Impact: cleanup ] Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1241699742.17846.31.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 11:52:06 +02:00
Pekka Enberg	3551f88f64	x86: unify 64-bit UMA and NUMA paging_init() 64-bit UMA and NUMA versions of paging_init() are almost identical. Therefore, merge the copy in mm/numa_64.c to mm/init_64.c to remove duplicate code. [ Impact: cleanup ] Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1241699741.17846.30.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 11:52:06 +02:00
Pekka Enberg	9518e0e435	x86: move per-cpu mmu_gathers to mm/init.c [ Impact: cleanup ] Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1240923650.1982.22.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-30 10:12:37 +02:00
Pekka Enberg	2b72394e40	x86: move max_pfn_mapped and max_low_pfn_mapped to setup.c This patch moves the max_pfn_mapped and max_low_pfn_mapped global variables to kernel/setup.c where they're initialized. [ Impact: cleanup ] Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1240923649.1982.21.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-30 10:12:36 +02:00
Pekka Enberg	89388913f2	x86: unify noexec handling This patch unifies noexec handling on 32-bit and 64-bit. [ Impact: cleanup ] Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> [ mingo@elte.hu: build fix ] LKML-Reference: <1240303167.771.69.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-21 10:48:08 +02:00
Ingo Molnar	8293dd6f86	Merge branch 'x86/core' into tracing/ftrace Semantic merge: kernel/trace/trace_functions_graph.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-10 10:17:48 +01:00
Ingo Molnar	f0ef039851	Merge branch 'x86/core' into tracing/textedit Conflicts: arch/x86/Kconfig block/blktrace.c kernel/irq/handle.c Semantic conflict: kernel/trace/blktrace.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 16:45:01 +01:00
Pekka Enberg	5dd61dfabc	x86: rename do_not_nx to disable_nx in mm/init_64.c As a preparational step for unifying noexec handling on 32-bit and 64-bit, rename the do_not_nx variable to disable_nx on 64-bit. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1236265497.31324.11.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 15:25:52 +01:00
Ingo Molnar	28e93a005b	Merge branch 'x86/mm' into x86/core	2009-03-05 21:49:35 +01:00
Pekka Enberg	4fcb208391	x86: move function and variable declarations to asm/init.h Impact: cleanup Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-17-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:18 +01:00
Pekka Enberg	e53fb04fce	x86: unify kernel_physical_mapping_init() function signatures Impact: cleanup In preparation for moving the function declaration to a header file, unify 32-bit and 64-bit signatures. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-16-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:18 +01:00
Pekka Enberg	298af9d89f	x86: fix up some bad global variable names in mm/init.c Impact: cleanup The table_start, table_end, and table_top are too generic for global namespace so rename them to be more specific. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-15-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:17 +01:00
Pekka Enberg	f765090a26	x86: move init_memory_mapping() to common mm/init.c Impact: cleanup This patch moves the init_memory_mapping() function to common mm/init.c. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-14-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:17 +01:00
Pekka Enberg	b47e3418c5	x86: ifdef 32-bit and 64-bit NR_RANGE_MR for save_mr() unification Impact: cleanup As a trivial preparation for moving common code to arc/x86/mm/init.c, ifdef the 32-bit and 64-bit versions of NR_RANGE_MR. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-12-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:16 +01:00
Pekka Enberg	c338d6f60f	x86: ifdef 32-bit and 64-bit pfn setup in init_memory_mapping() Impact: cleanup To reduce the diff between the 32-bit and 64-bit versions of init_memory_mapping(), ifdef configuration specific pfn setup code in the function. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-11-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:15 +01:00
Pekka Enberg	01ced9ec14	x86: ifdef 32-bit and 64-bit setup in init_memory_mapping() Impact: cleanup To reduce the diff between the 32-bit and 64-bit versions of init_memory_mapping(), ifdef configuration specific setup code in the function. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-10-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:15 +01:00
Pekka Enberg	cbba65796d	x86: unify kernel_physical_mapping_init() call in init_memory_mapping() Impact: cleanup The 64-bit version of init_memory_mapping() uses the last mapped address returned from kernel_physical_mapping_init() whereas the 32-bit version doesn't. This patch adds relevant ifdefs to both versions of the function to reduce the diff between them. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-8-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:14 +01:00
Pekka Enberg	54e63f3a42	x86: ifdef 32-bit specific setup in init_memory_mapping() Impact: cleanup Enabling NX, PSE, and PGE are only required on 32-bit so ifdef them in both versions of the function. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-5-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:12 +01:00
Pekka Enberg	49a2bf7303	x86: find_early_table_space() unification Impact: cleanup There are some minor differences between the 32-bit and 64-bit find_early_table_space() functions. This patch wraps those differences under CONFIG_X86_32 to make the function identical on both configurations. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-3-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:11 +01:00
Pekka Enberg	c3f5d2d8b5	x86: init_memory_mapping() trivial cleanups Impact: cleanup To reduce the diff between the 32-bit and 64-bit versions of init_memory_mapping(), fix up all trivial issues. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-1-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:10 +01:00
Pekka Enberg	731ddea636	x86: move free_initrd_mem() to common mm/init.c Impact: cleanup The function is identical on 32-bit and 64-bit configurations so move it to the common mm/init.c file. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1236158020.29024.28.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 20:59:26 +01:00
Pekka Enberg	540aca06b7	x86: move devmem_is_allowed() to common mm/init.c Impact: cleanup The function is identical on 32-bit and 64-bit configurations so move it to the common mm/init.c file. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1236160001.29024.29.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 11:40:04 +01:00
Ingo Molnar	a1be621dfa	Merge branch 'tracing/ftrace'; commit 'v2.6.29-rc7' into tracing/core	2009-03-04 11:14:47 +01:00
Jeremy Fitzhardinge	f254f3909e	x86: un-__init fill_pud/pmd/pte They are used by __set_fixmap->set_pte_vaddr_pud, which can be used by arch_setup_additional_pages(), and so is used after init. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 02:29:36 +01:00
Ingo Molnar	91d75e209b	Merge branch 'x86/core' into core/percpu	2009-03-04 02:29:19 +01:00
Ingo Molnar	8b0e5860cb	Merge branches 'x86/apic', 'x86/cpu', 'x86/fixmap', 'x86/mm', 'x86/sched', 'x86/setup-lzma', 'x86/signal' and 'x86/urgent' into x86/core	2009-03-04 02:22:31 +01:00
Pekka Enberg	e5b2bb5527	x86: unify free_init_pages() and free_initmem() Impact: unification This patch introduces a common arch/x86/mm/init.c and moves the identical free_init_pages() and free_initmem() functions to the file. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1236078906.2675.18.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-03 12:21:18 +01:00
Pekka Enberg	e087edd8c0	x86: make sure initmem is writable on 64-bit Impact: unification This patch ports commit `3c1df68b84` ("x86: make sure initmem is writable") to the 64-bit version to unify implementations of free_init_pages(). Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Arjan van de Ven <arjan@linux.intel.com> LKML-Reference: <1236078904.2675.17.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-03 12:21:18 +01:00
Yinghai Lu	0fc59d3a01	x86: fix init_memory_mapping() to handle small ranges Impact: fix failed EFI bootup in certain circumstances Ying Huang found init_memory_mapping() has problem with small ranges less than 2M when he tried to direct map the EFI runtime code out of max_low_pfn_mapped. It turns out we never considered that case and didn't check the range... Reported-by: Ying Huang <ying.huang@intel.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Brian Maly <bmaly@redhat.com> LKML-Reference: <49ACDDED.1060508@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-03 08:50:22 +01:00
Tejun Heo	458a3e644c	x86: update populate_extra_pte() and add populate_extra_pmd() Impact: minor change to populate_extra_pte() and addition of pmd flavor Update populate_extra_pte() to return pointer to the pte_t for the specified address and add populate_extra_pmd() which only populates till the pmd and returns pointer to the pmd entry for the address. For 64bit, pud/pmd/pte fill functions are separated out from set_pte_vaddr[_pud]() and used for set_pte_vaddr[_pud]() and populate_extra_{pte\|pmd}(). Signed-off-by: Tejun Heo <tj@kernel.org>	2009-02-24 11:57:21 +09:00
Steven Rostedt	1623963097	ftrace, x86: make kernel text writable only for conversions Impact: keep kernel text read only Because dynamic ftrace converts the calls to mcount into and out of nops at run time, we needed to always keep the kernel text writable. But this defeats the point of CONFIG_DEBUG_RODATA. This patch converts the kernel code to writable before ftrace modifies the text, and converts it back to read only afterward. The kernel text is converted to read/write, stop_machine is called to modify the code, then the kernel text is converted back to read only. The original version used SYSTEM_STATE to determine when it was OK or not to change the code to rw or ro. Andrew Morton pointed out that using SYSTEM_STATE is a bad idea since there is no guarantee to what its state will actually be. Instead, I moved the check into the set_kernel_text_* functions themselves, and use a local variable to determine when it is OK to change the kernel text RW permissions. [ Update: Ingo Molnar suggested moving the prototypes to cacheflush.h ] Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-20 14:30:06 -05:00
Tejun Heo	11124411aa	x86: convert to the new dynamic percpu allocator Impact: use new dynamic allocator, unified access to static/dynamic percpu memory Convert to the new dynamic percpu allocator. * implement populate_extra_pte() for both 32 and 64 * update setup_per_cpu_areas() to use pcpu_setup_static() * define __addr_to_pcpu_ptr() and __pcpu_ptr_to_addr() * define config HAVE_DYNAMIC_PER_CPU_AREA Signed-off-by: Tejun Heo <tj@kernel.org>	2009-02-20 16:29:09 +09:00
Gary Hade	f5495506c3	x86: remove kernel_physical_mapping_init() from init section Impact: fix crash with memory hotplug enabled kernel_physical_mapping_init() is called during memory hotplug so it does not belong in the init section. If the kernel is built with CONFIG_DEBUG_SECTION_MISMATCH=y on the make command line, arch/x86/mm/init_64.c is compiled with the -fno-inline-functions-called-once gcc option defeating inlining of kernel_physical_mapping_init() within init_memory_mapping(). When kernel_physical_mapping_init() is not inlined it is placed in the .init.text section according to the __init in it's current declaration. A later call to kernel_physical_mapping_init() during a memory hotplug operation encounters an int3 trap because the .init.text section memory has been freed. This patch eliminates the crash caused by the int3 trap by moving the non-inlined kernel_physical_mapping_init() from .init.text to .meminit.text. Signed-off-by: Gary Hade <garyhade@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-20 00:31:43 +01:00
Arjan van de Ven	e8de1481fd	resource: allow MMIO exclusivity for device drivers Device drivers that use pci_request_regions() (and similar APIs) have a reasonable expectation that they are the only ones accessing their device. As part of the e1000e hunt, we were afraid that some userland (X or some bootsplash stuff) was mapping the MMIO region that the driver thought it had exclusively via /dev/mem or via various sysfs resource mappings. This patch adds the option for device drivers to cause their reserved regions to the "banned from /dev/mem use" list, so now both kernel memory and device-exclusive MMIO regions are banned. NOTE: This is only active when CONFIG_STRICT_DEVMEM is set. In addition to the config option, a kernel parameter iomem=relaxed is provided for the cases where developers want to diagnose, in the field, drivers issues from userspace. Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-01-07 11:12:32 -08:00
Gary Hade	c04fc586c1	mm: show node to memory section relationship with symlinks in sysfs Show node to memory section relationship with symlinks in sysfs Add /sys/devices/system/node/nodeX/memoryY symlinks for all the memory sections located on nodeX. For example: /sys/devices/system/node/node1/memory135 -> ../../memory/memory135 indicates that memory section 135 resides on node1. Also revises documentation to cover this change as well as updating Documentation/ABI/testing/sysfs-devices-memory to include descriptions of memory hotremove files 'phys_device', 'phys_index', and 'state' that were previously not described there. In addition to it always being a good policy to provide users with the maximum possible amount of physical location information for resources that can be hot-added and/or hot-removed, the following are some (but likely not all) of the user benefits provided by this change. Immediate: - Provides information needed to determine the specific node on which a defective DIMM is located. This will reduce system downtime when the node or defective DIMM is swapped out. - Prevents unintended onlining of a memory section that was previously offlined due to a defective DIMM. This could happen during node hot-add when the user or node hot-add assist script onlines _all_ offlined sections due to user or script inability to identify the specific memory sections located on the hot-added node. The consequences of reintroducing the defective memory could be ugly. - Provides information needed to vary the amount and distribution of memory on specific nodes for testing or debugging purposes. Future: - Will provide information needed to identify the memory sections that need to be offlined prior to physical removal of a specific node. Symlink creation during boot was tested on 2-node x86_64, 2-node ppc64, and 2-node ia64 systems. Symlink creation during physical memory hot-add tested on a 2-node x86_64 system. Signed-off-by: Gary Hade <garyhade@us.ibm.com> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-06 15:59:00 -08:00
Ingo Molnar	90accd6fab	Merge branch 'linus' into x86/memory-corruption-check	2008-11-20 09:03:38 +01:00
Gary Hade	fe8b868ecc	x86: remove debug code from arch_add_memory() Impact: remove incorrect WARN_ON(1) Gets rid of dmesg spam created during physical memory hot-add which will very likely confuse users. The change removes what appears to be debugging code which I assume was unintentionally included in: x86: arch/x86/mm/init_64.c printk fixes commit `10f22dde55` Signed-off-by: Gary Hade <garyhade@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-29 09:29:22 +01:00
Yinghai Lu	f96f57d91c	x86: fix init_memory_mapping for [dc000000 - e0000000) - v2 Impact: change over-mapping to precise mapping, fix /proc/meminfo output v2: fix less than 1G ram system handling when gart aperture is 0xdc000000 - 0xe0000000 it return 0xc0000000 - 0xe0000000 that is not right. this patch fix that will get exact mapping on 256g sytem with that aperture after patch LBSuse:~ # cat /proc/meminfo MemTotal: 264742432 kB MemFree: 263920628 kB Buffers: 1416 kB Cached: 24468 kB ... DirectMap4k: 5760 kB DirectMap2M: 3205120 kB DirectMap1G: 265289728 kB it is consistent to LBSuse:~ # cat /sys/kernel/debug/kernel_page_tables .. ---[ Low Kernel Mapping ]--- 0xffff880000000000-0xffff880000200000 2M RW GLB x pte 0xffff880000200000-0xffff880040000000 1022M RW PSE GLB x pmd 0xffff880040000000-0xffff8800c0000000 2G RW PSE GLB NX pud 0xffff8800c0000000-0xffff8800d7e00000 382M RW PSE GLB NX pmd 0xffff8800d7e00000-0xffff8800d7fa0000 1664K RW GLB NX pte 0xffff8800d7fa0000-0xffff8800d8000000 384K pte 0xffff8800d8000000-0xffff8800dc000000 64M pmd 0xffff8800dc000000-0xffff8800e0000000 64M RW PSE GLB NX pmd 0xffff8800e0000000-0xffff880100000000 512M pmd 0xffff880100000000-0xffff880800000000 28G RW PSE GLB NX pud 0xffff880800000000-0xffff880824600000 582M RW PSE GLB NX pmd 0xffff880824600000-0xffff8808247f0000 1984K RW GLB NX pte 0xffff8808247f0000-0xffff880824800000 64K RW PCD GLB NX pte 0xffff880824800000-0xffff880840000000 440M RW PSE GLB NX pmd 0xffff880840000000-0xffff884000000000 223G RW PSE GLB NX pud 0xffff884000000000-0xffff884028000000 640M RW PSE GLB NX pmd 0xffff884028000000-0xffff884040000000 384M pmd 0xffff884040000000-0xffff888000000000 255G pud 0xffff888000000000-0xffffc20000000000 58880G pgd Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-28 20:54:47 +01:00
Yinghai Lu	11a6b0c933	x86: 64 bit print out absent pages num too so users are not confused with memhole causing big total ram we don't need to worry about 32 bit, because memhole is always above max_low_pfn. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-28 16:50:49 +01:00
Shaohua Li	60817c9b31	x86, memory hotplug: remove wrong -1 in calling init_memory_mapping() Impact: fix crash with memory hotplug Shuahua Li found: \| I just did some experiments on a desktop for memory hotplug and this bug \| triggered a crash in my test. \| \| Yinghai's suggestion also fixed the bug. We don't need to round it, just remove that extra -1 Signed-off-by: Yinghai <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-28 09:33:17 +01:00
Yinghai Lu	3afa39493d	x86: keep the /proc/meminfo page count correct Impact: get correct page count in /proc/meminfo found page count in /proc/meminfo is nor correct on 1G system in VirtualBox 2.0.4 # cat /proc/meminfo MemTotal: 1017508 kB MemFree: 822700 kB Buffers: 1456 kB Cached: 26632 kB SwapCached: 0 kB ... Hugepagesize: 2048 kB DirectMap4k: 4032 kB DirectMap2M: 18446744073709549568 kB with this patch get: ... DirectMap4k: 4032 kB DirectMap2M: 1044480 kB which is consistent to kernel_page_tables ---[ Low Kernel Mapping ]--- 0xffff880000000000-0xffff880000001000 4K RW PCD GLB x pte 0xffff880000001000-0xffff88000009f000 632K RW GLB x pte 0xffff88000009f000-0xffff8800000a0000 4K RW PCD GLB x pte 0xffff8800000a0000-0xffff880000200000 1408K RW GLB x pte 0xffff880000200000-0xffff88003fe00000 1020M RW PSE GLB x pmd 0xffff88003fe00000-0xffff88003fff0000 1984K RW GLB NX pte 0xffff88003fff0000-0xffff880040000000 64K pte 0xffff880040000000-0xffff888000000000 511G pud 0xffff888000000000-0xffffc20000000000 58880G pgd Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-27 18:55:26 +01:00
Arjan van de Ven	304e629bf4	x86: corruption check: run the corruption checks from a work queue Impact: change the implementation of the debug feature the periodic corruption checks are better off run from a work queue; there's nothing time critical about them and this way the amount of interrupt-context work is reduced. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-27 18:09:45 +01:00
Jan Beulich	5e72d9e485	x86-64: fix combining of regions in init_memory_mapping() When nr_range gets decremented, the same slot must be considered for coalescing with its new successor again. The issue is apparently pretty benign to native code, but surfaces as a boot time crash in our forward ported Xen tree (where the page table setup overall works differently than in native). Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-13 10:21:16 +02:00
Jeremy Fitzhardinge	a32ad46267	x86-64: don't check for map replacement The check prevents flags on mappings from being changed, which is not desireable. There's no need to check for replacing a mapping, and x86-32 does not do this check. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-13 10:21:05 +02:00
Jeremy Fitzhardinge	1494177942	x86: add early_memremap() early_ioremap() is also used to map normal memory when constructing the linear memory mapping. However, since we sometimes need to be able to distinguish between actual IO mappings and normal memory mappings, add a early_memremap() call, which maps with PAGE_KERNEL (as opposed to PAGE_KERNEL_IO for early_ioremap()), and use it when constructing pagetables. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-13 10:21:01 +02:00
Jeremy Fitzhardinge	be43d72835	x86: add _PAGE_IOMAP pte flag for IO mappings Use one of the software-defined PTE bits to indicate that a mapping is intended for an IO address. On native hardware this is irrelevent, since a physical address is a physical address. But in a virtual environment, physical addresses are also virtualized, so there needs to be some way to distinguish between pseudo-physical addresses and actual hardware addresses; _PAGE_IOMAP indicates this intent. By default, __supported_pte_mask masks out _PAGE_IOMAP, so it doesn't even appear in the final pagetable. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-13 10:20:56 +02:00
Ingo Molnar	46eaa67020	x86: memory corruption check - cleanup Move the prototypes from the generic kernel.h header to the more appropriate include/asm-x86/bios_ebda.h header file. Also, remove the check from the power management code - this is a pure x86 matter for now. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-12 15:09:23 +02:00
Ingo Molnar	a9b9e81c91	Merge branch 'linus' into x86/memory-corruption-check	2008-10-12 15:05:39 +02:00
Ingo Molnar	0afe2db213	Merge branch 'x86/unify-cpu-detect' into x86-v28-for-linus-phase4-D Conflicts: arch/x86/kernel/cpu/common.c arch/x86/kernel/signal_64.c include/asm-x86/cpufeature.h	2008-10-11 20:23:20 +02:00
Ingo Molnar	3dd392a407	Merge branch 'linus' into x86/pat2 Conflicts: arch/x86/mm/init_64.c	2008-10-10 19:30:08 +02:00
Suresh Siddha	b27a43c1e9	x86, cpa: make the kernel physical mapping initialization a two pass sequence, fix Jeremy Fitzhardinge wrote: > I'd noticed that current tip/master hasn't been booting under Xen, and I > just got around to bisecting it down to this change. > > commit 065ae73c5462d42e9761afb76f2b52965ff45bd6 > Author: Suresh Siddha <suresh.b.siddha@intel.com> > > x86, cpa: make the kernel physical mapping initialization a two pass sequence > > This patch is causing Xen to fail various pagetable updates because it > ends up remapping pagetables to RW, which Xen explicitly prohibits (as > that would allow guests to make arbitrary changes to pagetables, rather > than have them mediated by the hypervisor). Instead of making init a two pass sequence, to satisfy the Intel's TLB Application note (developer.intel.com/design/processor/applnots/317080.pdf Section 6 page 26), we preserve the original page permissions when fragmenting the large mappings and don't touch the existing memory mapping (which satisfies Xen's requirements). Only open issue is: on a native linux kernel, we will go back to mapping the first 0-1GB kernel identity mapping as executable (because of the static mapping setup in head_64.S). We can fix this in a different patch if needed. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-10 19:29:21 +02:00
Suresh Siddha	28dd033f43	x86: fix pagetable init 64-bit breakage Fix _end alignment check - can trigger a crash if _end happens to be on a page boundary. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-10 19:29:20 +02:00
Suresh Siddha	8311eb84bf	x86, cpa: remove cpa pool code Interrupt context no longer splits large page in cpa(). So we can do away with cpa memory pool code. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: arjan@linux.intel.com Cc: venkatesh.pallipadi@intel.com Cc: jeremy@goop.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-10 19:29:16 +02:00
Suresh Siddha	0b8fdcbcd2	x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC Don't use large pages for kernel identity mapping with DEBUG_PAGEALLOC. This will remove the need to split the large page for the allocated kernel page in the interrupt context. This will simplify cpa code(as we don't do the split any more from the interrupt context). cpa code simplication in the subsequent patches. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: arjan@linux.intel.com Cc: venkatesh.pallipadi@intel.com Cc: jeremy@goop.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-10 19:29:14 +02:00
Suresh Siddha	a2699e477b	x86, cpa: make the kernel physical mapping initialization a two pass sequence In the first pass, kernel physical mapping will be setup using large or small pages but uses the same PTE attributes as that of the early PTE attributes setup by early boot code in head_[32\|64].S After flushing TLB's, we go through the second pass, which setups the direct mapped PTE's with the appropriate attributes (like NX, GLOBAL etc) which are runtime detectable. This two pass mechanism conforms to the TLB app note which says: "Software should not write to a paging-structure entry in a way that would change, for any linear address, both the page size and either the page frame or attributes." Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: arjan@linux.intel.com Cc: venkatesh.pallipadi@intel.com Cc: jeremy@goop.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-10 19:29:13 +02:00
Hugh Dickins	bb577f980e	x86: add periodic corruption check Perodically check for corruption in low phusical memory. Don't bother checking at fault time, since it won't show anything useful. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-07 17:40:00 +02:00
Ingo Molnar	deed05b7c0	x86, init_64.c: cleanup Clean up comments. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 10:23:47 +02:00
Yinghai Lu	bd220a24a9	x86: move nonx_setup etc from common.c to init_64.c like 32 bit put it in init_32.c Signed-off-by: Yinghai <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 10:23:47 +02:00
Ingo Molnar	ea1c9de45e	Merge branch 'x86/urgent' into x86/cleanups	2008-08-25 11:10:42 +02:00
Jan Beulich	9482ac6e34	x86: fix two modpost warnings in mm/init_64.c early_io{re,un}map() are __init and hence can't be called from __meminit functions. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-22 07:51:54 +02:00
Jan Beulich	8ae3a5a8df	x86: fix 1:1 mapping init on 64-bit (memory hotplug case) While I don't have a hotplug capable system at hand, I think two issues need fixing: - pud_phys (in kernel_physical_ampping_init()) would remain uninitialized in the after_bootmem case - the locking done just around phys_pmd_{init,update}() would leave out pgd updates, and it was needlessly covering code portions that do allocations (perhaps using a more friendly gfp value in alloc_low_page() would then be possible) Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-22 07:51:53 +02:00
Ingo Molnar	7393423dd9	Merge branch 'linus' into x86/cleanups	2008-08-20 11:52:15 +02:00
Marcin Slusarz	8d6ea9674c	x86: fix section mismatch warning - spp_getpage() WARNING: vmlinux.o(.text+0x17a3e): Section mismatch in reference from the function set_pte_vaddr_pud() to the function .init.text:spp_getpage() The function set_pte_vaddr_pud() references the function __init spp_getpage(). This is often because set_pte_vaddr_pud lacks a __init annotation or the annotation of spp_getpage is wrong. spp_getpage is called from __init (__init_extra_mapping) and non __init (set_pte_vaddr_pud) functions, so it can't be __init. Unfortunately it calls alloc_bootmem_pages which is __init, but does it only when bootmem allocator is available (after_bootmem == 0). So annotate it accordingly. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com>	2008-08-15 19:16:06 +02:00
Hugh Dickins	a06de63000	x86: fix /proc/meminfo DirectMap Do we actually want these DirectMap lines in the x86 /proc/meminfo? I can see they're interesting to CPA developers and TLB optimizers, but they don't fit its usual "where has all my memory gone?" usage. If they are to stay, here are some fixes. 1. On x86_32 without PAE, they're not 2M but 4M pages: no need to mess with the internal enum, but show the right name to users. 2. Many machines can never show anything but 0 for DirectMap1G, so suppress that line unless direct_gbpages are really enabled. 3. The unit in /proc/meminfo is kB not number of pages: HugePages messed that up, but they're an example to regret not to follow. 4. Once we use kB, it's easy to see that 1GB has gone missing (which explains why CONFIG_CPA_DEBUG=y soon wraps DirectMap2M negative): because head_64.S's level2_ident_pgt entries were not counted. My fix is not ideal, but works for more and for less than 1G, and avoids interfering with early bootup pagetable contortions. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-15 15:27:55 +02:00
Ingo Molnar	6de9c70882	Merge branch 'linus' into x86/cleanups	2008-08-11 12:57:01 +02:00
Johannes Weiner	8dad322f54	x86: use generic show_mem() Remove arch-specific show_mem() in favor of the generic version. This also removes the following redundant information display: - pages in swapcache, printed by show_swap_cache_info() - dirty pages, writeback pages, mapped pages, slab pages, pagetable pages, printed by show_free_areas() where show_mem() calls show_free_areas(), which calls show_swap_cache_info(). Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:10 -07:00
Joerg Roedel	d86bb0dac7	x86: convert init_64.c from round_up to roundup Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-26 15:39:21 +02:00
Yinghai Lu	1f067167a8	x86: seperate memtest from init_64.c it's separate functionality that deserves its own file. This also prepares 32-bit memtest support. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 14:10:27 +02:00
Jack Steiner	e22146e610	x86: fix kernel_physical_mapping_init() for large x86 systems Fix bug in kernel_physical_mapping_init() that causes kernel page table to be built incorrectly for systems with greater than 512GB of memory. Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 18:27:36 +02:00
Ingo Molnar	5806b81ac1	Merge branch 'auto-ftrace-next' into tracing/for-linus Conflicts: arch/x86/kernel/entry_32.S arch/x86/kernel/process_32.c arch/x86/kernel/process_64.c arch/x86/lib/Makefile include/asm-x86/irqflags.h kernel/Makefile kernel/sched.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-14 16:11:52 +02:00
Yinghai Lu	9958e810f8	x86: max_low_pfn_mapped fix, #3 optimization: try to merge the range with same page size in init_memory_mapping, to get the best possible linear mappings set up. thus when GBpages is not there, we could do 2M pages. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-13 08:19:16 +02:00
Yinghai Lu	f361a450bf	x86: introduce max_low_pfn_mapped for 64-bit when more than 4g memory is installed, don't map the big hole below 4g. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-11 10:24:04 +02:00
Ingo Molnar	bac0c9103b	Merge branch 'tracing/ftrace' into auto-ftrace-next	2008-07-10 11:43:00 +02:00
Yinghai Lu	7b16eb8930	x86: overmapped fix when 4K pages on tail, 64-bit fix phys_pmd_init to make sure not to return bigger value than end. also print out range split:1G/2M/4K in init_memory_mapping(). Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-10 08:46:40 +02:00
Yinghai Lu	c2e6d65bce	x86: not overmap more than the end of RAM in init_memory_mapping - 64bit handle head and tail that are not aligned to big pages (2MB/1GB boundary). with this patch, on system that support gbpages, change: last_map_addr: 1080000000 end: 1078000000 to: last_map_addr: 1078000000 end: 1078000000 Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-09 10:43:26 +02:00
Yinghai Lu	49c980df55	x86: fix vmemmap printout check Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Cc: "Nick Piggin" <npiggin@suse.de> Cc: "Mark McLoughlin" <markmc@redhat.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: "Eduardo Habkost" <ehabkost@redhat.com> Cc: "Vegard Nossum" <vegard.nossum@gmail.com> Cc: "Stephen Tweedie" <sct@redhat.com> Cc: "Jeremy Fitzhardinge" <jeremy@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-09 10:43:24 +02:00
Yinghai Lu	b50efd2a55	x86: introduce page_size_mask for 64bit prepare for overmapped patch also printout last_map_addr together with end Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-09 09:37:45 +02:00
Jack Steiner	3a9e189d69	x86: map UV chipset space - pagetable Add boot-time function for creating additional 2MB page table entries for mapping chipset specific cached/uncached ranges. Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-09 07:43:23 +02:00
Jeremy Fitzhardinge	574977a2ed	x86_64/setup: unconditionally populate the pgd When allocating a new pud, unconditionally populate the pgd (why did we bother to create a new pud if we weren't going to populate it?). This will only happen if the pgd slot was empty, since any existing pud will be reused. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 13:16:27 +02:00
Jeremy Fitzhardinge	22b45144f6	x86/paravirt: groundwork for 64-bit Xen support, fix #2 Ingo Molnar wrote: > that fixed the build but now we've got a boot crash with this config: > > time.c: Detected 2010.304 MHz processor. > spurious 8259A interrupt: IRQ7. > BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 > IP: [<0000000000000000>] > PGD 0 > Thread overran stack, or stack corrupted > Oops: 0010 [1] SMP > CPU 0 > I don't know if this will fix this bug, but it's definitely a bugfix. It was trashing random pages by overwriting them with pagetables... Don't trash a large pmd's data when mapping physical memory. This is a bugfix for "x86_64: adjust mapping of physical pagetables to work with Xen". Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 13:16:02 +02:00
Eduardo Habkost	0814e0bace	x86, 64-bit: split set_pte_vaddr() We will need to set a pte on l3_user_pgt. Extract set_pte_vaddr_pud() from set_pte_vaddr(), that will accept the l3 page table as parameter. This change should be a no-op for existing code. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 13:11:09 +02:00
Jeremy Fitzhardinge	7c934d3990	x86, 64-bit: create small vmemmap mappings if PSE not available If PSE is not available, then fall back to 4k page mappings for the vmemmap area. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 13:11:08 +02:00
Jeremy Fitzhardinge	4f9c11dd49	x86, 64-bit: adjust mapping of physical pagetables to work with Xen This makes a few of changes to the construction of the initial pagetables to work better with paravirt_ops/Xen. The main areas are: 1. Support non-PSE mapping of memory, since Xen doesn't currently allow 2M pages to be mapped in guests. 2. Make sure that the ioremap alias of all pages are dropped before attaching the new page to the pagetable. This avoids having writable aliases of pagetable pages. 3. Preserve existing pagetable entries, rather than overwriting. Its possible that a fair amount of pagetable has already been constructed, so reuse what's already in place rather than ignoring and overwriting it. The algorithm relies on the invariant that any page which is part of the kernel pagetable is itself mapped in the linear memory area. This way, it can avoid using ioremap on a pagetable page. The invariant holds because it maps memory from low to high addresses, and also allocates memory from low to high. Each allocated page can map at least 2M of address space, so the mapped area will always progress much faster than the allocated area. It relies on the early boot code mapping enough pages to get started. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 13:11:07 +02:00
Yinghai Lu	c987d12f84	x86: remove end_pfn in 64bit and use max_pfn directly. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 13:10:38 +02:00
Yinghai Lu	d86623a0d5	x86: add table_top check for alloc_low_page in 64 bit that range is from find_e820_area, so don't try to use end_pfn to see if out of boundary...use table_top instead to avoid possible strange result while cross the boundary... also change early_printk to printk, because init_memory_mapping is after early param parsing, and console=uart8250 already working at that time. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 13:10:36 +02:00
Yinghai Lu	1a0db38e5f	x86: get max_pfn_mapped in init_memory_mapping so don't shift that in the loop Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 13:10:35 +02:00
Jeremy Fitzhardinge	4583ed514e	x86, 64-bit: unify early_ioremap The 32-bit early_ioremap will work equally well for 64-bit, so just use it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 13:10:28 +02:00
Jeremy Fitzhardinge	bb23e403e5	x86, 64-bit: use p??_populate() to attach pages to pagetable Use the _populate() functions to attach new pages to a pagetable, to make sure the right paravirt_ops calls get called. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 13:10:27 +02:00
Yinghai Lu	6a07a0edac	x86: fix compile warning in init_64.c len is long and ret is only for NUMA Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 12:50:23 +02:00
Yinghai Lu	346cafecde	x86: clean up min_low_pfn for 32bit we already had early_res support, so don't need to track min_low_pfn. keep it to 0 always. also use init_bootmem_node instead of init_bootmem, so don't touch min_low_pfn. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 12:50:21 +02:00
Yinghai Lu	1f75d7e32e	x86: introduce initmem_init for 64 bit Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 12:50:14 +02:00
Ingo Molnar	6236af82d8	Merge branch 'x86/fixmap' into x86/devel Conflicts: arch/x86/mm/init_64.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 12:24:29 +02:00
Ingo Molnar	3de352bbd8	Merge branch 'x86/mpparse' into x86/devel Conflicts: arch/x86/Kconfig arch/x86/kernel/io_apic_32.c arch/x86/kernel/setup_64.c arch/x86/mm/init_32.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 11:14:58 +02:00
Yinghai Lu	064d25f120	x86: merge setup_memory_map with e820 ... and kill e820_32/64.c and e820_32/64.h Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 10:38:25 +02:00

1 2 3 4 5 ...

324 Commits