linux

Commit Graph

Author	SHA1	Message	Date
Jeff Dike	995473aec0	[PATCH] uml: file renaming Move some foo_kern.c files to foo.c now that the old foo.c files are out of the way. Also cleaned up some whitespace and an emacs formatting comment. Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:16 -07:00
Jeff Dike	3c91735099	[PATCH] uml: thread creation tidying fork on UML has always been somewhat subtle. The underlying cause has been the need to initialize a stack for the new process. The only portable way to initialize a new stack is to set it as the alternate signal stack and take a signal. The signal handler does whatever initialization is needed and jumps back to the original stack, where the fork processing is finished. The basic context switching mechanism is a jmp_buf for each process. You switch to a new process by longjmping to its jmp_buf. Now that UML has its own implementation of setjmp and longjmp, and I can poke around inside a jmp_buf without fear that libc will change the structure, a much simpler mechanism is possible. The jmp_buf can simply be initialized by hand. This eliminates: the need to set up and remove the alternate signal stack; sending and handling a signal; the signal blocking needed around the stack switching, since there is no stack switching; and setting up the jmp_buf needed to jump back to the original stack after the new one is set up. In addition, since jmp_buf is now defined by UML, and not by libc, it can be embedded in the thread struct. This makes it unnecessary to have it exist on the stack, where it used to be. It also simplifies interfaces, since the switch jmp_buf used to be a void * inside the thread struct, and functions which took it as an argument needed to define a jmp_buf variable and assign it from the void *. Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:16 -07:00
Jeff Dike	0915ee38c7	[PATCH] uml: mark some tt-mode code Mark a symbol and file as being tt-mode only. This shrinks the binary slightly when tt mode support is compiled out and makes it easier to identify stuff when tt mode is removed. Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:16 -07:00
Jeff Dike	e3ccf6e369	[PATCH] uml: add checkstack support Make checkstack work for UML. We need to pass the underlying architecture name, rather than "um" to checkstack.pl. Signed-off-by: Jeff Dike <jdike@addtoit.com> Acked-by: Matt Mackall <mpm@selenic.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:16 -07:00
Jeff Dike	53dd2b55c5	[PATCH] uml: use correct SIGBUS handler BB noticed that we had the wrong bus error handler. Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:16 -07:00
Jeff Dike	09b185a316	[PATCH] uml: fix gcov support Make __bb_init_func weak in order to avoid a link failure with some libcs and/or gccs. Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:16 -07:00
Jeff Dike	a8b4fc4d7c	[PATCH] uml: fix missing x86_64 register definitions The UML/x86_64 headers were missing ptrace support for some segment registers. The underlying problem was that the x86_64 kernel uses user_regs_struct rather than the ptrace register definitions in ptrace. This patch switches UML/x86_64 to using user_regs_struct for its definitions of the host's registers. Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:16 -07:00
Jeff Dike	0715501bf1	[PATCH] uml: get rid of ZONE_DMA use ZONE_DMA might become dependent on CONFIG_ZONE_DMA, which UML doesn't define (we're still arguing about this) So, let's change ZONE_DMA to ZONE_NORMAL. This is prompted by optional-zone_dma-in-the-vm.patch, but should be harmless on its own. Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:15 -07:00
Jeff Dike	5e7672ec3f	[PATCH] uml: const more data Make lots of structures const in order to make it obvious that they need no locking. Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:15 -07:00
Paolo 'Blaisorblade' Giarrusso	48af05ed54	[PATCH] uml: fix proc-vs-interrupt context spinlock deadlock This spinlock can be taken on interrupt too, so spin_lock_irq[save] must be used. However, Documentation/networking/netdevices.txt explains we are called with rtnl_lock() held - so we don't need to care about other concurrent opens. Verified also in LDD3 and by direct checking. Also verified that the network layer (through a state machine) guarantees us that nobody will close the interface while it's being used. Please correct me if I'm wrong. Also, we must check we don't sleep with irqs disabled!!! But anyway, this is not news - we already can't sleep while holding a spinlock. Who says this is really guaranteed by the present code? Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:15 -07:00
Paolo 'Blaisorblade' Giarrusso	06837504de	[PATCH] uml: use -mcmodel=kernel for x86_64 We have never used this flag and recently one user experienced a complaining warning about this (there was a symbol in the positive half of the address space IIRC). So fix it. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:15 -07:00
Hirokazu Takata	85f651794c	[PATCH] m32r: revise __raw_read_trylock() Signed-off-by: Hirokazu Takata <takata@linux-m32r.org> Cc: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:15 -07:00
Hirokazu Takata	a27f311332	[PATCH] m32r: Fix "value computed not used" warnings Fix to remove annoying gcc-4.1 warnings "value computed not used" for m32r; Modify set_mb to cast to void for SMP. Signed-off-by: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:15 -07:00
David Howells	f269fdd182	[PATCH] NOMMU: move the fallback arch_vma_name() to a sensible place Move the fallback arch_vma_name() to a sensible place (kernel/signal.c). Currently it's in fs/proc/task_mmu.c, a file that is dependent on both CONFIG_PROC_FS and CONFIG_MMU being enabled, but it's used from kernel/signal.c from where it is called unconditionally. [akpm@osdl.org: build fix] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:15 -07:00
David Howells	930e652a21	[PATCH] NOMMU: Make futexes work under NOMMU conditions Make futexes work under NOMMU conditions. This can be tested by running this in one shell: #define SYSERROR(X, Y) \ do { if ((long)(X) == -1L) { perror(Y); exit(1); }} while(0) int main() { int shmid, tmp, *f, n; shmid = shmget(23, 4, IPC_CREAT|0666); SYSERROR(shmid, "shmget"); f = shmat(shmid, NULL, 0); SYSERROR(f, "shmat"); n = *f; printf("WAIT: %p{%x}\n", f, n); tmp = futex(f, FUTEX_WAIT, n, NULL, NULL, 0); SYSERROR(tmp, "futex"); printf("WAITED: %d\n", tmp); tmp = shmdt(f); SYSERROR(tmp, "shmdt"); exit(0); } And then this in the other shell: #define SYSERROR(X, Y) \ do { if ((long)(X) == -1L) { perror(Y); exit(1); }} while(0) int main() { int shmid, tmp, *f; shmid = shmget(23, 4, IPC_CREAT|0666); SYSERROR(shmid, "shmget"); f = shmat(shmid, NULL, 0); SYSERROR(f, "shmat"); (*f)++; printf("WAKE: %p{%x}\n", f, *f); tmp = futex(f, FUTEX_WAKE, 1, NULL, NULL, 0); SYSERROR(tmp, "futex"); printf("WOKE: %d\n", tmp); tmp = shmdt(f); SYSERROR(tmp, "shmdt"); exit(0); } The first program will set up a SYSV IPC SHM segment and wait on a futex in it for the number at the start to change. The second program will increment that number and wake the first program up. This leads to output of the form: SHELL 1 SHELL 2 ======================= ======================= # /dowait WAIT: 0xc32ac000{0} # /dowake WAKE: 0xc32ac000{1} WAITED: 0 WOKE: 1 Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:15 -07:00
David Howells	0112c4c646	[PATCH] NOMMU: Add docs about shared memory Add documentation about using shared memory in NOMMU mode. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:14 -07:00
David Howells	6fa5f80bc3	[PATCH] NOMMU: Make mremap() partially work for NOMMU kernels Make mremap() partially work for NOMMU kernels. It may resize a VMA provided that it doesn't exceed the size of the slab object in which the storage is allocated that the VMA refers to. Shareable VMAs may not be resized. Moving VMAs (as permitted by MREMAP_MAYMOVE) is not currently supported. This patch also makes use of the fact that the VMA list is now ordered to cut it short when possible. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:14 -07:00
David Howells	3034097a50	[PATCH] NOMMU: Order the per-mm_struct VMA list Order the per-mm_struct VMA list by address so that searching it can be cut short when the appropriate address has been exceeded. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:14 -07:00
David Howells	dbf8685c8e	[PATCH] NOMMU: Implement /proc/pid/maps for NOMMU Implement /proc/pid/maps for NOMMU by reading the vm_area_list attached to current->mm->context.vmlist. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:14 -07:00
David Howells	d00c7b9937	[PATCH] NOMMU: Permit ptrace to ignore non-PROT_WRITE VMAs in NOMMU mode Permit ptrace to modify a section that's non-shared but is marked unwritable, such as is obtained by mapping the text segment of an ELF-FDPIC executable binary into a binary that's being ptraced[*]. [*] Under NOMMU conditions ptrace causes read-only MAP_PRIVATE mmaps to become totally private copies because if a private mapping was actually shared then the debugger setting breakpoints in it would potentially crash other processes. This is done by using the VM_MAYWRITE flag rather than the VM_WRITE flag when deciding whether to permit a write. Without this patch a debugger can't set breakpoints in the mapped text sections of executables that are mapped read-only private, even if the mmap() syscall has taken a private copy because PT_PTRACED is set. In addition, VM_MAYREAD is used instead of VM_READ for similar reasons. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:14 -07:00
David Howells	7b4d5b8b39	[PATCH] NOMMU: Check VMA protections Check the VMA protections in get_user_pages() against what's being asked. This checks to see that we don't accidentally write on a non-writable VMA or permit an I/O mapping VMA to be accessed (which may lack page structs). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:14 -07:00
Sonic Zhang	910e46da4b	[PATCH] Check if start address is in vma region in NOMMU function get_user_pages() On NOMMU architectures, running "cat /proc/self/mem" reads data from physical address 0. This behavior differs from MMU architectures: on IA32, the message "cat: /proc/self/mem: Input/output error" is reported. The issue is caused by not validating the start address in the NOMMU function get_user_pages(). The following patch solves this issue. Signed-off-by: Sonic Zhang <sonic.adi@gmail.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:14 -07:00
David Howells	0159b141d8	[PATCH] NOMMU: Use find_vma() rather than reimplementing a VMA search Use find_vma() in the NOMMU version of access_process_vm() rather than reimplementing it. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:14 -07:00
David Howells	5da6185bca	[PATCH] NOMMU: Set BDI capabilities for /dev/mem and /dev/kmem Set the backing device info capabilities for /dev/mem and /dev/kmem to permit direct sharing under no-MMU conditions and full mapping capabilities under MMU conditions. Make the BDI used by these available to all directly mappable character devices. Also comment the capabilities for /dev/zero. [akpm@osdl.org: ifdef reductions] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:14 -07:00
David Howells	0ec76a110f	[PATCH] NOMMU: Check that access_process_vm() has a valid target Check that access_process_vm() is accessing a valid mapping in the target process. This limits ptrace() accesses and accesses through /proc/<pid>/maps to only those regions actually mapped by a program. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:14 -07:00
Haavard Skinnemoen	361f6ed1d0	[PATCH] AVR32: Use unsigned long flags for saving interrupt state Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:13 -07:00
Rolf Eike Beer	d24afc57d5	[PATCH] Mark __remove_vm_area() static The function is exported but not used from anywhere else. It's also marked as "not for driver use" so no one out there should really care. Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:13 -07:00
Rolf Eike Beer	ead04089b1	[PATCH] Fix kerneldoc comments in mm/vmalloc.c The empty line between the short description and the first argument description causes a section to appear twice in the generated manpage. Also the short description should really be short: the script can't handle multiple lines. Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:13 -07:00
Randy Dunlap	423b41d773	[PATCH] mm/page_alloc: use NULL instead of 0 for ptr Use NULL instead of 0 for pointer value, eliminate sparse warnings. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:13 -07:00
Jes Sorensen	17a3b05047	[PATCH] mspec driver Implement the special memory driver (mspec) based on the do_no_pfn approach. The driver is currently used only on SN2 hardware with special fetchop support but could be beneficial on other architectures using the uncached mode. Signed-off-by: Jes Sorensen <jes@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:13 -07:00
Jes Sorensen	f4b81804a2	[PATCH] do_no_pfn() Implement do_no_pfn() for handling mapping of memory without a struct page backing it. This avoids creating fake page table entries for regions which are not backed by real memory. This feature is used by the MSPEC driver and other users, where it is highly undesirable to have a struct page sitting behind the page (for instance if the page is accessed in cached mode via the struct page in parallel to the driver accessing it uncached, which can result in data corruption on some architectures, such as ia64). This version uses specific NOPFN_{SIGBUS,OOM} return values, rather than expecting that all negative pfn values would be an error. It also bugs on cow mappings as this would not work with the VM. [akpm@osdl.org: micro-optimise] Signed-off-by: Jes Sorensen <jes@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:13 -07:00
Christoph Lameter	5d29234362	[PATCH] zone_statistics: Use hot node instead of cold zone_pgdat Now that we have the node in the hot zone of struct zone we can avoid accessing zone_pgdat in zone_statistics. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:13 -07:00
Christoph Lameter	66a550308b	[PATCH] Do not allocate pagesets for unpopulated zones. We do not need to allocate pagesets for unpopulated zones. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:13 -07:00
Christoph Lameter	d5f541ed6e	[PATCH] Add node to zone for the NUMA case Add the node in order to optimize zone_to_nid. Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:13 -07:00
Christoph Lameter	765c4507af	[PATCH] GFP_THISNODE for the slab allocator This patch ensures that the slab node lists in the NUMA case only contain slabs that belong to that specific node. All slab allocations use GFP_THISNODE when calling into the page allocator. If an allocation fails then we fall back in the slab allocator according to the zonelists appropriate for a certain context. This allows a replication of the behavior of alloc_pages and alloc_pages_node in the slab layer. Currently allocations requested from the page allocator may be redirected via cpusets to other nodes. This results in remote pages on nodelists and that in turn results in interrupt latency issues during cache draining. Plus the slab is handing out memory as local when it is really remote. Fallback for slab memory allocations will occur within the slab allocator and not in the page allocator. This is necessary in order to be able to use the existing pools of objects on the nodes that we fall back to before adding more pages to a slab. The fallback function ensures that the nodes we fall back to obey cpuset restrictions of the current context. We do not allocate objects from outside of the current cpuset context like before. Note that the implementation of locality constraints within the slab allocator requires importing logic from the page allocator. This is a mishmash that is not that great. Other allocators (uncached allocator, vmalloc, huge pages) face similar problems and have similar minimal reimplementations of the basic fallback logic of the page allocator. There is another way of implementing a slab by avoiding per node lists (see modular slab) but this won't work within the existing slab. 
V1->V2: - Use NUMA_BUILD to avoid #ifdef CONFIG_NUMA - Exploit GFP_THISNODE being 0 in the NON_NUMA case to avoid another #ifdef [akpm@osdl.org: build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:12 -07:00
Christoph Lameter	77f700dab4	[PATCH] Disable GFP_THISNODE in the non-NUMA case GFP_THISNODE must be set to 0 in the non numa case otherwise we disable retry and warnings for failing allocations in the SMP and UP case. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:12 -07:00
Christoph Lameter	08e0f6a970	[PATCH] Add NUMA_BUILD definition in kernel.h to avoid #ifdef CONFIG_NUMA The NUMA_BUILD constant is always available and will be set to 1 on NUMA_BUILDs. That way checks valid only under CONFIG_NUMA can easily be done without #ifdef CONFIG_NUMA F.e. if (NUMA_BUILD && <numa_condition>) { ... } [akpm: not a thing we'd normally do, but CONFIG_NUMA is special: it is causing ifdef explosion in core kernel, so let's see if this is a comfortable way in which to control that] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:12 -07:00
Jes Sorensen	c72419138f	[PATCH] Condense output of show_free_areas() On larger systems, the amount of output dumped on the console when you do SysRq-M is beyond insane. This patch is trying to reduce it somewhat as even with the smaller NUMA systems that have hit the desktop this seems to be a fair thing to do. The philosophy I have taken is as follows: 1) If a zone is empty, don't tell, we don't need yet another line telling us so. The information is available since one can look up the fact how many zones were initialized in the first place. 2) Put as much information on a line as possible, if it can be done in one line, rather than two, then do it in one. I tried to format the temperature stuff for easy reading. Change show_free_areas() to not print lines for empty zones. If no zone output is printed, the zone is empty. This reduces the number of lines dumped to the console in sysrq on a large system by several thousand lines. Change the zone temperature printouts to use one line per CPU instead of two lines (one hot, one cold). On a 1024 CPU, 1024 node system, this reduces the console output by over a million lines of output. While this is a bigger problem on large NUMA systems, it is also applicable to smaller desktop sized and mid range NUMA systems. 
Old format: Mem-info: Node 0 DMA per-cpu: cpu 0 hot: high 42, batch 7 used:24 cpu 0 cold: high 14, batch 3 used:1 cpu 1 hot: high 42, batch 7 used:34 cpu 1 cold: high 14, batch 3 used:0 cpu 2 hot: high 42, batch 7 used:0 cpu 2 cold: high 14, batch 3 used:0 cpu 3 hot: high 42, batch 7 used:0 cpu 3 cold: high 14, batch 3 used:0 cpu 4 hot: high 42, batch 7 used:0 cpu 4 cold: high 14, batch 3 used:0 cpu 5 hot: high 42, batch 7 used:0 cpu 5 cold: high 14, batch 3 used:0 cpu 6 hot: high 42, batch 7 used:0 cpu 6 cold: high 14, batch 3 used:0 cpu 7 hot: high 42, batch 7 used:0 cpu 7 cold: high 14, batch 3 used:0 Node 0 DMA32 per-cpu: empty Node 0 Normal per-cpu: empty Node 0 HighMem per-cpu: empty Node 1 DMA per-cpu: [snip] Free pages: 5410688kB (0kB HighMem) Active:9536 inactive:4261 dirty:6 writeback:0 unstable:0 free:338168 slab:1931 mapped:1900 pagetables:208 Node 0 DMA free:1676304kB min:3264kB low:4080kB high:4896kB active:128048kB inactive:61568kB present:1970880kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 Node 0 DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 Node 0 HighMem free:0kB min:512kB low:512kB high:512kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 Node 1 DMA free:1951728kB min:3280kB low:4096kB high:4912kB active:5632kB inactive:1504kB present:1982464kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 .... 
New format: Mem-info: Node 0 DMA per-cpu: CPU 0: Hot: hi: 42, btch: 7 usd: 41 Cold: hi: 14, btch: 3 usd: 2 CPU 1: Hot: hi: 42, btch: 7 usd: 40 Cold: hi: 14, btch: 3 usd: 1 CPU 2: Hot: hi: 42, btch: 7 usd: 0 Cold: hi: 14, btch: 3 usd: 0 CPU 3: Hot: hi: 42, btch: 7 usd: 0 Cold: hi: 14, btch: 3 usd: 0 CPU 4: Hot: hi: 42, btch: 7 usd: 0 Cold: hi: 14, btch: 3 usd: 0 CPU 5: Hot: hi: 42, btch: 7 usd: 0 Cold: hi: 14, btch: 3 usd: 0 CPU 6: Hot: hi: 42, btch: 7 usd: 0 Cold: hi: 14, btch: 3 usd: 0 CPU 7: Hot: hi: 42, btch: 7 usd: 0 Cold: hi: 14, btch: 3 usd: 0 Node 1 DMA per-cpu: [snip] Free pages: 5411088kB (0kB HighMem) Active:9558 inactive:4233 dirty:6 writeback:0 unstable:0 free:338193 slab:1942 mapped:1918 pagetables:208 Node 0 DMA free:1677648kB min:3264kB low:4080kB high:4896kB active:129296kB inactive:58864kB present:1970880kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 Node 1 DMA free:1948448kB min:3280kB low:4096kB high:4912kB active:6864kB inactive:3536kB present:1982464kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:12 -07:00
Christoph Lameter	de3083ec3e	[PATCH] slab: fix kmalloc_node applying memory policies if nodeid == numa_node_id() kmalloc_node() falls back to ___cache_alloc() under certain conditions and at that point memory policies may be applied redirecting the allocation away from the current node. Therefore kmalloc_node(...,numa_node_id()) or kmalloc_node(...,-1) may not return memory from the local node. Fix this by doing the policy check in __cache_alloc() instead of ____cache_alloc(). This version here is a cleanup of Kiran's patch. - Tested on ia64. - Extra material removed. - Consolidate the exit path if alternate_node_alloc() returned an object. [akpm@osdl.org: warning fix] Signed-off-by: Alok N Kataria <alok.kataria@calsoftinc.com> Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:12 -07:00
Nick Piggin	0fd0e6b05a	[PATCH] page invalidation cleanup Clean up the invalidate code, and use a common function to safely remove the page from pagecache. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:12 -07:00
Heiko Carstens	5b99cd0eff	[PATCH] own header file for struct page This moves the definition of struct page from mm.h to its own header file page-struct.h. This is a prereq to fix SetPageUptodate which is broken on s390: #define SetPageUptodate(_page) do { struct page *__page = (_page); if (!test_and_set_bit(PG_uptodate, &__page->flags)) page_test_and_clear_dirty(_page); } while (0) _page gets used twice in this macro which can cause subtle bugs. Using __page for the page_test_and_clear_dirty call doesn't work since it causes yet another problem with the page_test_and_clear_dirty macro as well. In order to avoid all these problems caused by macros it seems to be a good idea to get rid of them and convert them to static inline functions. Because of header file include order it's necessary to have a separate header file for the struct page definition. Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:12 -07:00
Andrew Morton	e129b5c23c	[PATCH] vm: add per-zone writeout counter The VM is supposed to minimise the number of pages which get written off the LRU (for IO scheduling efficiency, and for high reclaim-success rates). But we don't actually have a clear way of showing how true this is. So add `nr_vmscan_write' to /proc/vmstat and /proc/zoneinfo - the number of pages which have been written by the vm scanner in this zone and globally. Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:12 -07:00
Mel Gorman	fb01439c5b	[PATCH] Allow an arch to expand node boundaries Arch-independent zone-sizing determines the size of a node (pgdat->node_spanned_pages) based on the physical memory that was registered by the architecture. However, when CONFIG_MEMORY_HOTPLUG_RESERVE is set, the architecture expects that the spanned_pages will be much larger and that mem_map will be allocated that is used later on memory hot-add. This patch allows an architecture that sets CONFIG_MEMORY_HOTPLUG_RESERVE to call push_node_boundaries() which will set the node beginning and end to at least the requested boundary. Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Andi Kleen <ak@muc.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Keith Mannthey" <kmannth@gmail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:12 -07:00
Mel Gorman	9c7cd6877c	[PATCH] Account for holes that are outside the range of physical memory absent_pages_in_range() made the assumption that users of the API would not care about holes beyond the end of physical memory. This was not the case. This patch will account for ranges outside of physical memory as holes correctly. Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Andi Kleen <ak@muc.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Keith Mannthey" <kmannth@gmail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:11 -07:00
Mel Gorman	0e0b864e06	[PATCH] Account for memmap and optionally the kernel image as holes The x86_64 code accounted for memmap and some portions of the DMA zone as holes. This was because those areas would never be reclaimed and accounting for them as memory affects min watermarks. This patch will account for the memmap as a memory hole. Architectures may optionally use set_dma_reserve() if they wish to account for a portion of memory in ZONE_DMA as a hole. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Andi Kleen <ak@muc.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Keith Mannthey" <kmannth@gmail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:11 -07:00
Mel Gorman	05e0caad3b	[PATCH] Have ia64 use add_active_range() and free_area_init_nodes Size zones and holes in an architecture independent manner for ia64. [bob.picco@hp.com: fix ia64 FLATMEM+VIRTUAL_MEM_MAP] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Bob Picco <bob.picco@hp.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Andi Kleen <ak@muc.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Keith Mannthey" <kmannth@gmail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:11 -07:00
Mel Gorman	5cb248abf5	[PATCH] Have x86_64 use add_active_range() and free_area_init_nodes Size zones and holes in an architecture independent manner for x86_64. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Andi Kleen <ak@muc.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Keith Mannthey" <kmannth@gmail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:11 -07:00
Mel Gorman	4cfee88ad3	[PATCH] Have x86 use add_active_range() and free_area_init_nodes Size zones and holes in an architecture independent manner for x86. [akpm@osdl.org: build fix] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Andi Kleen <ak@muc.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Keith Mannthey" <kmannth@gmail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:11 -07:00
Mel Gorman	c67c3cb4c9	[PATCH] Have Power use add_active_range() and free_area_init_nodes() Size zones and holes in an architecture independent manner for Power. [judith@osdl.org: build fix] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Andi Kleen <ak@muc.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Keith Mannthey" <kmannth@gmail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:11 -07:00
Mel Gorman	c713216dee	[PATCH] Introduce mechanism for registering active regions of memory At a basic level, architectures define structures to record where active ranges of page frames are located. Once located, the code to calculate zone sizes and holes in each architecture is very similar. Some of this zone and hole sizing code is difficult to read for no good reason. This set of patches eliminates the similar-looking architecture-specific code. The patches introduce a mechanism where architectures register where the active ranges of page frames are with add_active_range(). When all areas have been discovered, free_area_init_nodes() is called to initialise the pgdat and zones. The zone sizes and holes are then calculated in an architecture independent manner. Patch 1 introduces the mechanism for registering and initialising PFN ranges Patch 2 changes ppc to use the mechanism - 139 arch-specific LOC removed Patch 3 changes x86 to use the mechanism - 136 arch-specific LOC removed Patch 4 changes x86_64 to use the mechanism - 74 arch-specific LOC removed Patch 5 changes ia64 to use the mechanism - 52 arch-specific LOC removed Patch 6 accounts for mem_map as a memory hole as the pages are not reclaimable. It adjusts the watermarks slightly Tony Luck has successfully tested for ia64 on Itanium with tiger_defconfig, gensparse_defconfig and defconfig. Bob Picco has also tested and debugged on IA64. Jack Steiner successfully boot tested on a mammoth SGI IA64-based machine. These were on patches against 2.6.17-rc1 and release 3 of these patches but there have been no ia64-changes since release 3. There are differences in the zone sizes for x86_64 as the arch-specific code for x86_64 accounts the kernel image and the starting mem_maps as memory holes but the architecture-independent code accounts the memory as present. The big benefit of this set of patches is a sizable reduction of architecture-specific code, some of which is very hairy. 
There should be a greater reduction when other architectures use the same mechanisms for zone and hole sizing but I lack the hardware to test on. Additional credit; Dave Hansen for the initial suggestion and comments on early patches Andy Whitcroft for reviewing early versions and catching numerous errors Tony Luck for testing and debugging on IA64 Bob Picco for fixing bugs related to pfn registration, reviewing a number of patch revisions, providing a number of suggestions on future direction and testing heavily Jack Steiner and Robin Holt for testing on IA64 and clarifying issues related to memory holes Yasunori for testing on IA64 Andi Kleen for reviewing and feeding back about x86_64 Christian Kujau for providing valuable information related to ACPI problems on x86_64 and testing potential fixes This patch: Define the structure to represent an active range of page frames within a node in an architecture independent manner. Architectures are expected to register active ranges of PFNs using add_active_range(nid, start_pfn, end_pfn) and call free_area_init_nodes() passing the PFNs of the end of each zone. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Bob Picco <bob.picco@hp.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Andi Kleen <ak@muc.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Keith Mannthey" <kmannth@gmail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:11 -07:00
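
The registration scheme described in c713216dee can be pictured with a small user-space sketch. This is not the kernel code: the `sketch_` names, the fixed-size table, and the assumption that registered ranges never overlap are all illustrative choices made here. The idea is that an architecture registers each present PFN range, and generic code then derives present pages and holes for any zone span from that table, including spans that reach past the end of physical memory.

```c
#include <stddef.h>

#define SKETCH_MAX_RANGES 32	/* hypothetical table size */

/* Hypothetical stand-in for one registered range of page frames. */
struct sketch_range {
	int nid;			/* node the range belongs to */
	unsigned long start_pfn;	/* first PFN in the range */
	unsigned long end_pfn;		/* one past the last PFN */
};

static struct sketch_range sketch_ranges[SKETCH_MAX_RANGES];
static size_t sketch_nr_ranges;

/*
 * Register [start_pfn, end_pfn) as present memory on node nid.
 * Returns 0 on success, -1 for an empty range or a full table.
 * Assumption: callers never register overlapping ranges.
 */
int sketch_add_active_range(int nid, unsigned long start_pfn,
			    unsigned long end_pfn)
{
	if (start_pfn >= end_pfn || sketch_nr_ranges == SKETCH_MAX_RANGES)
		return -1;
	sketch_ranges[sketch_nr_ranges].nid = nid;
	sketch_ranges[sketch_nr_ranges].start_pfn = start_pfn;
	sketch_ranges[sketch_nr_ranges].end_pfn = end_pfn;
	sketch_nr_ranges++;
	return 0;
}

/* Count registered pages on node nid that fall inside [lo, hi). */
unsigned long sketch_present_pages(int nid, unsigned long lo,
				   unsigned long hi)
{
	unsigned long present = 0;
	size_t i;

	for (i = 0; i < sketch_nr_ranges; i++) {
		unsigned long s = sketch_ranges[i].start_pfn;
		unsigned long e = sketch_ranges[i].end_pfn;

		if (sketch_ranges[i].nid != nid)
			continue;
		/* Clamp the registered range to the queried span. */
		if (s < lo)
			s = lo;
		if (e > hi)
			e = hi;
		if (s < e)
			present += e - s;
	}
	return present;
}

/*
 * Holes in [lo, hi) are whatever part of the span is not registered,
 * so PFNs beyond the last registered range also count as holes.
 */
unsigned long sketch_absent_pages(int nid, unsigned long lo,
				  unsigned long hi)
{
	if (lo >= hi)
		return 0;
	return (hi - lo) - sketch_present_pages(nid, lo, hi);
}
```

For example, registering PFNs [0, 160) and [256, 4096) on node 0 gives 4000 present pages and a 96-page hole within the span [0, 4096).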

36365 Commits All Branches Search
