linux_old1

Commit Graph

Author	SHA1	Message	Date
Henrique de Moraes Holschuh	f1b23361a0	rfkill: document the rfkill struct locking (v2) Reorder fields in struct rfkill and add comments to make it clear which fields are protected by rfkill->mutex. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:36:33 -04:00
Adrian McMenamin	1795cf48b3	sh/maple: clean maple bus code This patch cleans up the handling of the maple bus queue to remove the risk of races when adding packets. It also removes references to the redundant connect and disconnect functions. Signed-off-by: Adrian McMenamin <adrian@mcmen.demon.co.uk> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2008-07-29 22:10:56 +09:00
FUJITA Tomonori	8978b74253	generic, x86: fix add iommu_num_pages helper function This IOMMU helper function doesn't work for some architectures: http://marc.info/?l=linux-kernel&m=121699304403202&w=2 It also breaks POWER and SPARC builds: http://marc.info/?l=linux-kernel&m=121730388001890&w=2 Currently, only x86 IOMMUs use this so let's move it to x86 for now. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-29 12:12:48 +02:00
Avi Kivity	ed84862433	KVM: Advertise synchronized mmu support to userspace Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-29 12:34:02 +03:00
Andrea Arcangeli	e930bffe95	KVM: Synchronize guest physical memory map to host virtual memory map Synchronize changes to host virtual addresses which are part of a KVM memory slot to the KVM shadow mmu. This allows pte operations like swapping, page migration, and madvise() to transparently work with KVM. Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-29 12:33:53 +03:00
Linus Torvalds	5dfb66ba8c	Merge branch 'for-linus' of git://git.o-hand.com/linux-mfd * 'for-linus' of git://git.o-hand.com/linux-mfd: mfd: accept pure device as a parent, not only platform_device mfd: add platform_data to mfd_cell mfd: Coding style fixes mfd: Use to_platform_device instead of container_of	2008-07-28 18:15:41 -07:00
Linus Torvalds	1d9b9f6a53	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (21 commits) x86/PCI: use dev_printk when possible PCI: add D3 power state avoidance quirk PCI: fix bogus "'device' may be used uninitialized" warning in pci_slot PCI: add an option to allow ASPM enabled forcibly PCI: disable ASPM on pre-1.1 PCIe devices PCI: disable ASPM per ACPI FADT setting PCI MSI: Don't disable MSIs if the mask bit isn't supported PCI: handle 64-bit resources better on 32-bit machines PCI: rewrite PCI BAR reading code PCI: document pci_target_state PCI hotplug: fix typo in pcie hotplug output x86 gart: replace to_pages macro with iommu_num_pages x86, AMD IOMMU: replace to_pages macro with iommu_num_pages iommu: add iommu_num_pages helper function dma-coherent: add documentation to new interfaces Cris: convert to using generic dma-coherent mem allocator Sh: use generic per-device coherent dma allocator ARM: support generic per-device coherent dma mem Generic dma-coherent: fix DMA_MEMORY_EXCLUSIVE x86: use generic per-device dma coherent allocator ...	2008-07-28 18:14:24 -07:00
Dmitry Baryshkov	424f525a12	mfd: accept pure device as a parent, not only platform_device Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Signed-off-by: Samuel Ortiz <sameo@openedhand.com>	2008-07-29 01:30:26 +02:00
Hisashi Hifumi	8ab22b9abb	vfs: pagecache usage optimization for pagesize!=blocksize When we read some part of a file through pagecache, if there is a pagecache of corresponding index but this page is not uptodate, read IO is issued and this page will be uptodate. I think this is good for pagesize == blocksize environment but there is room for improvement on pagesize != blocksize environment. Because in this case a page can have multiple buffers and even if a page is not uptodate, some buffers can be uptodate. So I suggest that when all buffers which correspond to a part of a file that we want to read are uptodate, use this pagecache and copy data from this pagecache to user buffer even if a page is not uptodate. This can reduce read IO and improve system throughput. I wrote a benchmark program and got result number with this program. This benchmark do: 1: mount and open a test file. 2: create a 512MB file. 3: close a file and umount. 4: mount and again open a test file. 5: pwrite randomly 300000 times on a test file. offset is aligned by IO size(1024bytes). 6: measure time of preading randomly 100000 times on a test file. The result was: 2.6.26 330 sec 2.6.26-patched 226 sec Arch:i386 Filesystem:ext3 Blocksize:1024 bytes Memory: 1GB On ext3/4, a file is written through buffer/block. So random read/write mixed workloads or random read after random write workloads are optimized with this patch under pagesize != blocksize environment. This test result showed this. The benchmark program is as follows: #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> #include <time.h> #include <stdlib.h> #include <string.h> #include <sys/mount.h> #define LEN 1024 #define LOOP 1024512 / 512MB / main(void) { unsigned long i, offset, filesize; int fd; char buf[LEN]; time_t t1, t2; if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) { perror("cannot mount\n"); exit(1); } memset(buf, 0, LEN); fd = open("/root/test1/testfile", O_CREAT\|O_RDWR\|O_TRUNC); if (fd < 0) { perror("cannot open file\n"); exit(1); } for (i = 0; i < LOOP; i++) write(fd, buf, LEN); close(fd); if (umount("/root/test1/") < 0) { perror("cannot umount\n"); exit(1); } if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) { perror("cannot mount\n"); exit(1); } fd = open("/root/test1/testfile", O_RDWR); if (fd < 0) { perror("cannot open file\n"); exit(1); } filesize = LEN LOOP; for (i = 0; i < 300000; i++){ offset = (random() % filesize) & (~(LEN - 1)); pwrite(fd, buf, LEN, offset); } printf("start test\n"); time(&t1); for (i = 0; i < 100000; i++){ offset = (random() % filesize) & (~(LEN - 1)); pread(fd, buf, LEN, offset); } time(&t2); printf("%ld sec\n", t2-t1); close(fd); if (umount("/root/test1/") < 0) { perror("cannot umount\n"); exit(1); } } Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jan Kara <jack@ucw.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-28 16:30:21 -07:00
Andrea Arcangeli	cddb8a5c14	mmu-notifiers: core With KVM/GFP/XPMEM there isn't just the primary CPU MMU pointing to pages. There are secondary MMUs (with secondary sptes and secondary tlbs) too. sptes in the kvm case are shadow pagetables, but when I say spte in mmu-notifier context, I mean "secondary pte". In GRU case there's no actual secondary pte and there's only a secondary tlb because the GRU secondary MMU has no knowledge about sptes and every secondary tlb miss event in the MMU always generates a page fault that has to be resolved by the CPU (this is not the case of KVM where the a secondary tlb miss will walk sptes in hardware and it will refill the secondary tlb transparently to software if the corresponding spte is present). The same way zap_page_range has to invalidate the pte before freeing the page, the spte (and secondary tlb) must also be invalidated before any page is freed and reused. Currently we take a page_count pin on every page mapped by sptes, but that means the pages can't be swapped whenever they're mapped by any spte because they're part of the guest working set. Furthermore a spte unmap event can immediately lead to a page to be freed when the pin is released (so requiring the same complex and relatively slow tlb_gather smp safe logic we have in zap_page_range and that can be avoided completely if the spte unmap event doesn't require an unpin of the page previously mapped in the secondary MMU). The mmu notifiers allow kvm/GRU/XPMEM to attach to the tsk->mm and know when the VM is swapping or freeing or doing anything on the primary MMU so that the secondary MMU code can drop sptes before the pages are freed, avoiding all page pinning and allowing 100% reliable swapping of guest physical address space. Furthermore it avoids the code that teardown the mappings of the secondary MMU, to implement a logic like tlb_gather in zap_page_range that would require many IPI to flush other cpu tlbs, for each fixed number of spte unmapped. To make an example: if what happens on the primary MMU is a protection downgrade (from writeable to wrprotect) the secondary MMU mappings will be invalidated, and the next secondary-mmu-page-fault will call get_user_pages and trigger a do_wp_page through get_user_pages if it called get_user_pages with write=1, and it'll re-establishing an updated spte or secondary-tlb-mapping on the copied page. Or it will setup a readonly spte or readonly tlb mapping if it's a guest-read, if it calls get_user_pages with write=0. This is just an example. This allows to map any page pointed by any pte (and in turn visible in the primary CPU MMU), into a secondary MMU (be it a pure tlb like GRU, or an full MMU with both sptes and secondary-tlb like the shadow-pagetable layer with kvm), or a remote DMA in software like XPMEM (hence needing of schedule in XPMEM code to send the invalidate to the remote node, while no need to schedule in kvm/gru as it's an immediate event like invalidating primary-mmu pte). At least for KVM without this patch it's impossible to swap guests reliably. And having this feature and removing the page pin allows several other optimizations that simplify life considerably. Dependencies: 1) mm_take_all_locks() to register the mmu notifier when the whole VM isn't doing anything with "mm". This allows mmu notifier users to keep track if the VM is in the middle of the invalidate_range_begin/end critical section with an atomic counter incraese in range_begin and decreased in range_end. No secondary MMU page fault is allowed to map any spte or secondary tlb reference, while the VM is in the middle of range_begin/end as any page returned by get_user_pages in that critical section could later immediately be freed without any further ->invalidate_page notification (invalidate_range_begin/end works on ranges and ->invalidate_page isn't called immediately before freeing the page). To stop all page freeing and pagetable overwrites the mmap_sem must be taken in write mode and all other anon_vma/i_mmap locks must be taken too. 2) It'd be a waste to add branches in the VM if nobody could possibly run KVM/GRU/XPMEM on the kernel, so mmu notifiers will only enabled if CONFIG_KVM=m/y. In the current kernel kvm won't yet take advantage of mmu notifiers, but this already allows to compile a KVM external module against a kernel with mmu notifiers enabled and from the next pull from kvm.git we'll start using them. And GRU/XPMEM will also be able to continue the development by enabling KVM=m in their config, until they submit all GRU/XPMEM GPLv2 code to the mainline kernel. Then they can also enable MMU_NOTIFIERS in the same way KVM does it (even if KVM=n). This guarantees nobody selects MMU_NOTIFIER=y if KVM and GRU and XPMEM are all =n. The mmu_notifier_register call can fail because mm_take_all_locks may be interrupted by a signal and return -EINTR. Because mmu_notifier_reigster is used when a driver startup, a failure can be gracefully handled. Here an example of the change applied to kvm to register the mmu notifiers. Usually when a driver startups other allocations are required anyway and -ENOMEM failure paths exists already. struct kvm kvm_arch_create_vm(void) { struct kvm kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL); + int err; if (!kvm) return ERR_PTR(-ENOMEM); INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); + kvm->arch.mmu_notifier.ops = &kvm_mmu_notifier_ops; + err = mmu_notifier_register(&kvm->arch.mmu_notifier, current->mm); + if (err) { + kfree(kvm); + return ERR_PTR(err); + } + return kvm; } mmu_notifier_unregister returns void and it's reliable. The patch also adds a few needed but missing includes that would prevent kernel to compile after these changes on non-x86 archs (x86 didn't need them by luck). [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix mm/filemap_xip.c build] [akpm@linux-foundation.org: fix mm/mmu_notifier.c build] Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Kanoj Sarcar <kanojsarcar@yahoo.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Chris Wright <chrisw@redhat.com> Cc: Marcelo Tosatti <marcelo@kvack.org> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Izik Eidus <izike@qumranet.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-28 16:30:21 -07:00
Andrea Arcangeli	7906d00cd1	mmu-notifiers: add mm_take_all_locks() operation mm_take_all_locks holds off reclaim from an entire mm_struct. This allows mmu notifiers to register into the mm at any time with the guarantee that no mmu operation is in progress on the mm. This operation locks against the VM for all pte/vma/mm related operations that could ever happen on a certain mm. This includes vmtruncate, try_to_unmap, and all page faults. The caller must take the mmap_sem in write mode before calling mm_take_all_locks(). The caller isn't allowed to release the mmap_sem until mm_drop_all_locks() returns. mmap_sem in write mode is required in order to block all operations that could modify pagetables and free pages without need of altering the vma layout (for example populate_range() with nonlinear vmas). It's also needed in write mode to avoid new anon_vmas to be associated with existing vmas. A single task can't take more than one mm_take_all_locks() in a row or it would deadlock. mm_take_all_locks() and mm_drop_all_locks are expensive operations that may have to take thousand of locks. mm_take_all_locks() can fail if it's interrupted by signals. When mmu_notifier_register returns, we must be sure that the driver is notified if some task is in the middle of a vmtruncate for the 'mm' where the mmu notifier was registered (mmu_notifier_invalidate_range_start/end is run around the vmtruncation but mmu_notifier_register can run after mmu_notifier_invalidate_range_start and before mmu_notifier_invalidate_range_end). Same problem for rmap paths. And we've to remove page pinning to avoid replicating the tlb_gather logic inside KVM (and GRU doesn't work well with page pinning regardless of needing tlb_gather), so without mm_take_all_locks when vmtruncate frees the page, kvm would have no way to notice that it mapped into sptes a page that is going into the freelist without a chance of any further mmu_notifier notification. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Kanoj Sarcar <kanojsarcar@yahoo.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Chris Wright <chrisw@redhat.com> Cc: Marcelo Tosatti <marcelo@kvack.org> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Izik Eidus <izike@qumranet.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-28 16:30:21 -07:00
Andrea Arcangeli	6beeac76f5	mmu-notifiers: add list_del_init_rcu() Introduce list_del_init_rcu() and document it. Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Kanoj Sarcar <kanojsarcar@yahoo.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Chris Wright <chrisw@redhat.com> Cc: Marcelo Tosatti <marcelo@kvack.org> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Izik Eidus <izike@qumranet.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-28 16:30:20 -07:00
Mike Rapoport	56edb58be1	mfd: add platform_data to mfd_cell Adding platform_data to mfd_cell allows passing of platform data directly to the platform_device created for each cell and thus reuse of existing drivers. On the other side it can be used as a hook to mfd_cell itself removing the need in mfd_get_cell method. Signed-off-by: Mike Rapoport <mike@compulab.co.il> Acked-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Signed-off-by: Samuel Ortiz <sameo@openedhand.com>	2008-07-29 01:23:32 +02:00
Alan Cox	979b1791e5	PCI: add D3 power state avoidance quirk Libata has some hacks to deal with certain controllers going silly in D3 state. The right way to handle this is to keep a PCI device flag for such devices. That can then be generalised for no ATA devices with power problems. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-07-28 15:12:11 -07:00
Shaohua Li	149e16372a	PCI: disable ASPM on pre-1.1 PCIe devices Disable ASPM on pre-1.1 PCIe devices, as many of them don't implement it correctly. Tested-by: Jack Howarth <howarth@bromo.msbb.uc.edu> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-07-28 14:56:57 -07:00
Shaohua Li	5fde244d39	PCI: disable ASPM per ACPI FADT setting The ACPI FADT table includes an ASPM control bit. If the bit is set, do not enable ASPM since it may indicate that the platform doesn't actually support the feature. Tested-by: Jack Howarth <howarth@bromo.msbb.uc.edu> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-07-28 14:56:09 -07:00
Ingo Molnar	9e3ee1c39c	Merge branch 'linus' into cpus4096 Conflicts: kernel/stop_machine.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-28 23:32:00 +02:00
Jesse Barnes	29111f579f	Merge branch 'x86/iommu' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip into for-linus	2008-07-28 14:31:10 -07:00
Linus Torvalds	e56b3bc794	cpu masks: optimize and clean up cpumask_of_cpu() Clean up and optimize cpumask_of_cpu(), by sharing all the zero words. Instead of stupidly generating all possible i=0...NR_CPUS 2^i patterns creating a huge array of constant bitmasks, realize that the zero words can be shared. In other words, on a 64-bit architecture, we only ever need 64 of these arrays - with a different bit set in one single world (with enough zero words around it so that we can create any bitmask by just offsetting in that big array). And then we just put enough zeroes around it that we can point every single cpumask to be one of those things. So when we have 4k CPU's, instead of having 4k arrays (of 4k bits each, with one bit set in each array - 2MB memory total), we have exactly 64 arrays instead, each 8k bits in size (64kB total). And then we just point cpumask(n) to the right position (which we can calculate dynamically). Once we have the right arrays, getting "cpumask(n)" ends up being: static inline const cpumask_t get_cpu_mask(unsigned int cpu) { const unsigned long p = cpu_bit_bitmap[1 + cpu % BITS_PER_LONG]; p -= cpu / BITS_PER_LONG; return (const cpumask_t *)p; } This brings other advantages and simplifications as well: - we are not wasting memory that is just filled with a single bit in various different places - we don't need all those games to re-create the arrays in some dense format, because they're already going to be dense enough. if we compile a kernel for up to 4k CPU's, "wasting" that 64kB of memory is a non-issue (especially since by doing this "overlapping" trick we probably get better cache behaviour anyway). [ mingo@elte.hu: Converted Linus's mails into a commit. See: http://lkml.org/lkml/2008/7/27/156 http://lkml.org/lkml/2008/7/28/320 Also applied a family filter - which also has the side-effect of leaving out the bits where Linus calls me an idio... Oh, never mind ;-) ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-28 22:20:41 +02:00
Ingo Molnar	414f746d23	Merge branch 'linus' into cpus4096	2008-07-28 21:14:43 +02:00
Linus Torvalds	f934fb19ef	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: add driver for Atmel integrated touchscreen controller Input: ads7846 - optimize order of calculating Rt in ads7846_rx() Input: ads7846 - fix sparse endian warnings Input: uinput - remove duplicate include Input: serio - offload resume to kseriod Input: serio - mark serio_register_driver() __must_check	2008-07-28 09:59:26 -07:00
Ben Dooks	7f71ac9374	mfd: Coding style fixes Fix some coding style fixes in the mfd core driver. Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Samuel Ortiz <sameo@openedhand.com>	2008-07-28 18:29:09 +02:00
Linus Torvalds	d9089c296b	Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (25 commits) powerpc: Disable 64K hugetlb support when doing 64K SPU mappings powerpc/powermac: Fixup default serial port device for pmac_zilog powerpc/powermac: Use sane default baudrate for SCC debugging powerpc/mm: Implement _PAGE_SPECIAL & pte_special() for 64-bit powerpc: Show processor cache information in sysfs powerpc: Make core id information available to userspace powerpc: Make core sibling information available to userspace powerpc/vio: More fallout from dma_mapping_error API change ibmveth: Fix multiple errors with dma_mapping_error conversion powerpc/pseries: Fix CMO sysdev attribute API change fallout powerpc: Enable tracehook for the architecture powerpc: Add TIF_NOTIFY_RESUME support for tracehook powerpc: Add asm/syscall.h with the tracehook entry points powerpc: Make syscall tracing use tracehook.h helpers powerpc: Call tracehook_signal_handler() when setting up signal frames powerpc: Update cpu_sibling_maps dynamically powerpc: register_cpu_online should be __cpuinit powerpc: kill useless SMT code in prom_hold_cpus powerpc: Fix 8xx build failure powerpc: Fix vio build warnings ...	2008-07-28 09:05:35 -07:00
Linus Torvalds	b10a8b7238	Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (72 commits) sh: SuperH Mobile CEU and camera platform data for AP325RXA sh: Update smc911x platform data for AP325RXA sh: SuperH Mobile LCDC platform data for AP325RXA sh: Add SuperH Mobile CEU platform data for Migo-R sh: Add SuperH Mobile LCDC platform data for Migo-R sh: Move asid_cache() out of ifdef to fix SH-3/4 nommu build. sh: Workaround for __put_user_asm() bug with gcc 4.x on big-endian. sh: Wire up new syscalls. sh: fix uImage Entry Point sh_keysc: remove request_mem_region() and release_mem_region() sh: Don't miss pending signals returning to user mode after signal processing sh: Use clk_always_enable() on sh7366 sh: Use clk_always_enable() on sh7343 / SE77343 sh: Use clk_always_enable() on sh7722 / Migo-R / SE7722 sh: Use clk_always_enable() on sh7723 / ap325rxa sh: Introduce clk_always_enable() function sh: Show all clocks and their state in /proc/clocks sh: Merge sh7343 and sh7722 clock code sh: Add SuperH Mobile MSTPCR bits to clock framework sh: Use arch_flags to simplify sh7722 siu clock code ...	2008-07-28 08:41:13 -07:00
Linus Torvalds	37eaf8c746	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: stop_machine: fix up ftrace.c stop_machine: Wean existing callers off stop_machine_run() stop_machine(): stop_machine_run() changed to use cpu mask Hotplug CPU: don't check cpu_online after take_cpu_down Simplify stop_machine stop_machine: add ALL_CPUS option module: fix build warning with !CONFIG_KALLSYMS	2008-07-28 08:37:46 -07:00
Adrian McMenamin	306cfd630a	maple: tidy maple_driver code by removing redundant connect/disconnect The connect and disconnect functions are unnecessary - everything they do can be accomplished in the initial probe - so remove them. Signed-off-by: Adrian McMenamin <adrian@mcmen.demon.co.uk> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2008-07-28 18:10:30 +09:00
Barry Naujok	9403540c06	dcache: Add case-insensitive support d_ci_add() routine This add a dcache entry to the dcache for lookup, but changing the name that is associated with the entry rather than the one passed in to the lookup routine. First, it sees if the case-exact match already exists in the dcache and uses it if one exists. Otherwise, it allocates a new node with the new name and splices it into the dcache. Original code from ntfs_lookup in fs/ntfs/namei.c by Anton Altaparmakov. Signed-off-by: Barry Naujok <bnaujok@sgi.com> Signed-off-by: Anton Altaparmakov <aia21@cantab.net> Acked-by: Christoph Hellwig <hch@infradead.org>	2008-07-28 16:58:39 +10:00
Benjamin Herrenschmidt	d65d830ca0	Merge commit 'gcl/gcl-next'	2008-07-28 16:30:40 +10:00
Rusty Russell	eeec4fad96	stop_machine(): stop_machine_run() changed to use cpu mask Instead of a "cpu" arg with magic values NR_CPUS (any cpu) and ~0 (all cpus), pass a cpumask_t. Allow NULL for the common case (where we don't care which CPU the function is run on): temporary cpumask_t's are usually considered bad for stack space. This deprecates stop_machine_run, to be removed soon when all the callers are dead. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-07-28 12:16:30 +10:00
Rusty Russell	ffdb5976c4	Simplify stop_machine stop_machine creates a kthread which creates kernel threads. We can create those threads directly and simplify things a little. Some care must be taken with CPU hotunplug, which has special needs, but that code seems more robust than it was in the past. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>	2008-07-28 12:16:29 +10:00
Jason Baron	5c2aed6225	stop_machine: add ALL_CPUS option -allow stop_mahcine_run() to call a function on all cpus. Calling stop_machine_run() with a 'ALL_CPUS' invokes this new behavior. stop_machine_run() proceeds as normal until the calling cpu has invoked 'fn'. Then, we tell all the other cpus to call 'fn'. Signed-off-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> CC: Adrian Bunk <bunk@stusta.de> CC: Andi Kleen <andi@firstfloor.org> CC: Alexey Dobriyan <adobriyan@gmail.com> CC: Christoph Hellwig <hch@infradead.org> CC: mingo@elte.hu CC: akpm@osdl.org	2008-07-28 12:16:28 +10:00
Mauro Carvalho Chehab	c2f90e9536	Merge ../linux-2.6	2008-07-27 22:23:18 -03:00
Andrea Righi	940389b8af	task IO accounting: move all IO statistics in struct task_io_accounting Simplify the code of include/linux/task_io_accounting.h. It is also more reasonable to have all the task i/o-related statistics in a single struct (task_io_accounting). Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-27 16:12:28 -07:00
Mauro Carvalho Chehab	eb703027ac	Merge ../linux-2.6	2008-07-27 18:11:53 -03:00
Linus Torvalds	837b41b5de	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: firewire: state userland requirements in Kconfig help firewire: avoid memleak after phy config transmit failure firewire: fw-ohci: TSB43AB22/A dualbuffer workaround firewire: queue the right number of data firewire: warn on unfinished transactions during card removal firewire: small fw_fill_request cleanup firewire: fully initialize fw_transaction before marking it pending firewire: fix race of bus reset with request transmission	2008-07-27 10:24:06 -07:00
Andrea Righi	5995477ab7	task IO accounting: improve code readability Put all i/o statistics in struct proc_io_accounting and use inline functions to initialize and increment statistics, removing a lot of single variable assignments. This also reduces the kernel size as following (with CONFIG_TASK_XACCT=y and CONFIG_TASK_IO_ACCOUNTING=y). text data bss dec hex filename 11651 0 0 11651 2d83 kernel/exit.o.before 11619 0 0 11619 2d63 kernel/exit.o.after 10886 132 136 11154 2b92 kernel/fork.o.before 10758 132 136 11026 2b12 kernel/fork.o.after 3082029 807968 4818600 8708597 84e1f5 vmlinux.o.before 3081869 807968 4818600 8708437 84e155 vmlinux.o.after Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-27 09:58:20 -07:00
Mauro Carvalho Chehab	50cb993ea6	Merge ../linux-2.6	2008-07-27 12:25:57 -03:00
Mauro Carvalho Chehab	9fa0f6db3a	V4L/DVB (8522): videodev2: Fix merge conflict Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2008-07-27 12:24:02 -03:00
Hans Verkuil	c1d7f4f164	V4L/DVB (8524): videodev: copy the VID_TYPE defines to videodev.h The VID_TYPE defines are V4L1 specific, so copy them back to videodev.h. In videodev2.h ensure that they are not used in the kernel (you need to include videodev.h instead) and mark them are deprecated. Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2008-07-27 11:07:12 -03:00
Jean-Francois Moine	1250ac6d4a	V4L/DVB (8518): gspca: Remove the remaining frame decoding functions from the subdrivers. SPCA505 and SPCA508 added in the pixel formats. Decode functions and associated resources removed in spca505, 506 and 508. The decode routines are now found in the V4L library. Signed-off-by: Jean-Francois Moine <moinejf@free.fr> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2008-07-27 11:06:42 -03:00
Mauro Carvalho Chehab	9993e51c0c	V4L/DVB (8502): videodev2.h: CodingStyle cleanups Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2008-07-27 11:06:20 -03:00
Linus Torvalds	8be1a6d6c7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: mlx4: Update/add Mellanox Technologies copyright lines to mlx4 driver files mlx4_core: Add VLAN tag field to WQE control segment struct RDMA/nes: CM connection setup/teardown rework IPoIB: Correct help text for INFINIBAND_IPOIB_DEBUG IPoIB/cm: Connected mode is no longer EXPERIMENTAL RDMA/ucm: BKL is not needed for ib_ucm_open() RDMA/ucma: BKL is not needed for ucma_open()	2008-07-26 20:40:36 -07:00
Linus Torvalds	9ee08c2df4	Merge git://git.infradead.org/mtd-2.6 * git://git.infradead.org/mtd-2.6: (57 commits) [MTD] [NAND] subpage read feature as a way to increase performance. CPUFREQ: S3C24XX NAND driver frequency scaling support. [MTD][NAND] au1550nd: remove unused variable [MTD] jedec_probe: Fix SST 16-bit chip detection [MTD][MTDPART] Fix a division by zero bug [MTD][MTDPART] Cleanup and document the erase region handling [MTD][MTDPART] Handle most checkpatch findings [MTD][MTDPART] Seperate main loop from per-partition code in add_mtd_partition [MTD] physmap: resume already suspended chips on failure to suspend [MTD] physmap: Fix suspend/resume/shutdown bugs. [MTD] [NOR] Fix -ETIMEO errors in CFI driver [MTD] [NAND] fsl_elbc_nand: fix section mismatch with CONFIG_MTD_OF_PARTS=y [JFFS2] Use .unlocked_ioctl [MTD] Fix const assignment in the MTD command line partitioning driver [MTD] [NOR] gen_probe: No debug message when debugging is disabled [MTD] [NAND] remove __PPC__ hardcoded address from DiskOnChip drivers [MTD] [MAPS] Remove the bast-flash driver. [MTD] [NAND] fsl_elbc_nand: ecclayout cleanups [MTD] [NAND] fsl_elbc_nand: implement support for flash-based BBT [MTD] [NAND] fsl_elbc_nand: fix OOB workability for large page NAND chips ...	2008-07-26 20:30:56 -07:00
Linus Torvalds	eaf0ba5ef6	Merge branch 'tracehook' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace * 'tracehook' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace: tracehook: comment fixes	2008-07-26 20:29:39 -07:00
Linus Torvalds	bdee6ac7d1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc: atmel-mci: debugfs support mmc: Add per-card debugfs support mmc: Export internal host state through debugfs imxmmc: fix crash when no platform data is provided imxmmc: fix platform resources imxmmc: remove DEBUG definition mmc_spi: put signals to low power off fix	2008-07-26 20:27:31 -07:00
Linus Torvalds	4836e30078	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (39 commits) [PATCH] fix RLIM_NOFILE handling [PATCH] get rid of corner case in dup3() entirely [PATCH] remove remaining namei_{32,64}.h crap [PATCH] get rid of indirect users of namei.h [PATCH] get rid of __user_path_lookup_open [PATCH] f_count may wrap around [PATCH] dup3 fix [PATCH] don't pass nameidata to __ncp_lookup_validate() [PATCH] don't pass nameidata to gfs2_lookupi() [PATCH] new (local) helper: user_path_parent() [PATCH] sanitize __user_walk_fd() et.al. [PATCH] preparation to __user_walk_fd cleanup [PATCH] kill nameidata passing to permission(), rename to inode_permission() [PATCH] take noexec checks to very few callers that care Re: [PATCH 3/6] vfs: open_exec cleanup [patch 4/4] vfs: immutable inode checking cleanup [patch 3/4] fat: dont call notify_change [patch 2/4] vfs: utimes cleanup [patch 1/4] vfs: utimes: move owner check into inode_change_ok() [PATCH] vfs: use kstrdup() and check failing allocation ...	2008-07-26 20:23:44 -07:00
Linus Torvalds	5c7c204aec	Merge git://git.kernel.org/pub/scm/linux/kernel/git/kkeil/ISDN-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/kkeil/ISDN-2.6: Add layer1 over IP support Add mISDN HFC multiport driver Add mISDN HFC PCI driver Add mISDN DSP Add mISDN core files Define AF_ISDN and PF_ISDN Add mISDN driver	2008-07-26 20:19:41 -07:00
Linus Torvalds	2284284281	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: netns: fix ip_rt_frag_needed rt_is_expired netfilter: nf_conntrack_extend: avoid unnecessary "ct->ext" dereferences netfilter: fix double-free and use-after free netfilter: arptables in netns for real netfilter: ip{,6}tables_security: fix future section mismatch selinux: use nf_register_hooks() netfilter: ebtables: use nf_register_hooks() Revert "pkt_sched: sch_sfq: dump a real number of flows" qeth: use dev->ml_priv instead of dev->priv syncookies: Make sure ECN is disabled net: drop unused BUG_TRAP() net: convert BUG_TRAP to generic WARN_ON drivers/net: convert BUG_TRAP to generic WARN_ON	2008-07-26 20:17:56 -07:00
Andrea Righi	510a35d4a4	hugetlb: remove unused variable warning Remove the following warning when CONFIG_HUGETLB_PAGE is not set: ipc/shm.c: In function `shm_get_stat': ipc/shm.c:565: warning: unused variable `h' [akpm@linux-foundation.org: use tabs, not spaces] Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 20:16:47 -07:00
Al Viro	3f8206d496	[PATCH] get rid of indirect users of namei.h fs.h needs path.h, not namei.h; nfs_fs.h doesn't need it at all. Several places in the tree needed direct include. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:42 -04:00
Al Viro	964bd18362	[PATCH] get rid of __user_path_lookup_open Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:41 -04:00
Al Viro	516e0cc564	[PATCH] f_count may wrap around make it atomic_long_t; while we are at it, get rid of useless checks in affs, hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:40 -04:00
Al Viro	2d8f30380a	[PATCH] sanitize __user_walk_fd() et.al. * do not pass nameidata; struct path is all the callers want. * switch to new helpers: user_path_at(dfd, pathname, flags, &path) user_path(pathname, &path) user_lpath(pathname, &path) user_path_dir(pathname, &path) (fail if not a directory) The last 3 are trivial macro wrappers for the first one. * remove nameidata in callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:34 -04:00
Al Viro	f419a2e3b6	[PATCH] kill nameidata passing to permission(), rename to inode_permission() Incidentally, the name that gives hundreds of false positives on grep is not a good idea... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:31 -04:00
Miklos Szeredi	9767d74957	[patch 1/4] vfs: utimes: move owner check into inode_change_ok() Add a new ia_valid flag: ATTR_TIMES_SET, to handle the UTIMES_OMIT/UTIMES_NOW and UTIMES_NOW/UTIMES_OMIT cases. In these cases neither ATTR_MTIME_SET nor ATTR_ATIME_SET is in the flags, yet the POSIX draft specifies that permission checking is performed the same way as if one or both of the times was explicitly set to a timestamp. See the path "vfs: utimensat(): fix error checking for {UTIME_NOW,UTIME_OMIT} case" by Michael Kerrisk for the patch introducing this behavior. This is a cleanup, as well as allowing filesystems (NFS/fuse/...) to perform their own permission checking instead of the default. CC: Ulrich Drepper <drepper@redhat.com> CC: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:25 -04:00
Li Zefan	88b387824f	[PATCH] vfs: use kstrdup() and check failing allocation - use kstrdup() instead of kmalloc() + memcpy() - return NULL if allocating ->mnt_devname failed - mnt_devname should be const Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:24 -04:00
Al Viro	b77b0646ef	[PATCH] pass MAY_OPEN to vfs_permission() explicitly ... and get rid of the last "let's deduce mask from nameidata->flags" bit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:22 -04:00
Al Viro	a110343f0d	[PATCH] fix MAY_CHDIR/MAY_ACCESS/LOOKUP_ACCESS mess * MAY_CHDIR is redundant - it's an equivalent of MAY_ACCESS * MAY_ACCESS on fuse should affect only the last step of pathname resolution * fchdir() and chroot() should pass MAY_ACCESS, for the same reason why chdir() needs that. * now that we pass MAY_ACCESS explicitly in all cases, LOOKUP_ACCESS can be removed; it has no business being in nameidata. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:21 -04:00
Al Viro	7f2da1e7d0	[PATCH] kill altroot long overdue... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:20 -04:00
Al Viro	8bb79224b8	[PATCH] permission checks for chdir need special treatment only on the last step ... so we ought to pass MAY_CHDIR to vfs_permission() instead of having it triggered on every step of preceding pathname resolution. LOOKUP_CHDIR is killed by that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:19 -04:00
Miklos Szeredi	db2e747b14	[patch 5/5] vfs: remove mode parameter from vfs_symlink() Remove the unused mode parameter from vfs_symlink and callers. Thanks to Tetsuo Handa for noticing. CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2008-07-26 20:53:18 -04:00
Miklos Szeredi	2f1936b877	[patch 3/5] vfs: change remove_suid() to file_remove_suid() All calls to remove_suid() are made with a file pointer, because (similarly to file_update_time) it is called when the file is written. Clean up callers by passing in a file instead of a dentry. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2008-07-26 20:53:16 -04:00
Al Viro	e6305c43ed	[PATCH] sanitize ->permission() prototype * kill nameidata * argument; map the 3 bits in ->flags anybody cares about to new MAY_... ones and pass with the mask. * kill redundant gfs2_iop_permission() * sanitize ecryptfs_permission() * fix remaining places where ->permission() instances might barf on new MAY_... found in mask. The obvious next target in that direction is permission(9) folded fix for nfs_permission() breakage from Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:14 -04:00
Al Viro	9043476f72	[PATCH] sanitize proc_sysctl * keep references to ctl_table_head and ctl_table in /proc/sys inodes * grab the former during operations, use the latter for access to entry if that succeeds * have ->d_compare() check if table should be seen for one who does lookup; that allows us to avoid flipping inodes - if we have the same name resolve to different things, we'll just keep several dentries and ->d_compare() will reject the wrong ones. * have ->lookup() and ->readdir() scan the table of our inode first, then walk all ctl_table_header and scan ->attached_by for those that are attached to our directory. * implement ->getattr(). * get rid of insane amounts of tree-walking * get rid of the need to know dentry in ->permission() and of the contortions induced by that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:12 -04:00
Al Viro	ae7edecc9b	[PATCH] sysctl: keep track of tree relationships In a sense, that's the heart of the series. It's based on the following property of the trees we are actually asked to add: they can be split into stem that is already covered by registered trees and crown that is entirely new. IOW, if a/b and a/c/d are introduced by our tree, then a/c is also introduced by it. That allows to associate tree and table entry with each node in the union; while directory nodes might be covered by many trees, only one will cover the node by its crown. And that will allow much saner logics for /proc/sys in the next patches. This patch introduces the data structures needed to keep track of that. When adding a sysctl table, we find a "parent" one. Which is to say, find the deepest node on its stem that already is present in one of the tables from our table set or its ancestor sets. That table will be our parent and that node in it - attachment point. Add our table to list anchored in parent, have it refer the parent and contents of attachment point. Also remember where its crown lives. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:11 -04:00
Al Viro	f7e6ced406	[PATCH] allow delayed freeing of ctl_table_header Refcount the sucker; instead of freeing it by the end of unregistration just drop the refcount and free only when it hits zero. Make sure that we _always_ make ->unregistering non-NULL in start_unregistering(). That allows anybody to get a reference to such puppy, preventing its freeing and reuse. It does not block unregistration. Anybody who holds such a reference can * try to grab a "use" reference (ctl_head_grab()); that will succeeds if and only if it hadn't entered unregistration yet. If it succeeds, we can use it in all normal ways until we release the "use" reference (with ctl_head_finish()). Note that this relies on having ->unregistering become non-NULL in all cases when one starts to unregister the sucker. * keep pointers to ctl_table entries; they can be freed if the entire thing is unregistered. However, if ctl_head_grab() succeeds, we know that unregistration had not happened (and will not happen until ctl_head_finish()) and such pointers can be used safely. IOW, now we can have inodes under /proc/sys keep references to ctl_table entries, protecting them with references to ctl_table_header and grabbing the latter for the duration of operations that require access to ctl_table. That won't cause deadlocks, since unregistration will not be stopped by mere keeping a reference to ctl_table_header. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:09 -04:00
Al Viro	734550921e	[PATCH] beginning of sysctl cleanup - ctl_table_set New object: set of sysctls [currently - root and per-net-ns]. Contains: pointer to parent set, list of tables and "should I see this set?" method (->is_seen(set)). Current lists of tables are subsumed by that; net-ns contains such a beast. ->lookup() for ctl_table_root returns pointer to ctl_table_set instead of that to ->list of that ctl_table_set. [folded compile fixes by rdd for configs without sysctl] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:08 -04:00
Denys Vlasenko	d2d9648ec6	[PATCH] reuse xxx_fifo_fops for xxx_pipe_fops Merge fifo and pipe file_operations. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:06 -04:00
Pekka Enberg	93bc4e89c2	netfilter: fix double-free and use-after free As suggested by Patrick McHardy, introduce a __krealloc() that doesn't free the original buffer to fix a double-free and use-after-free bug introduced by me in netfilter that uses RCU. Reported-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Tested-by: Dieter Ries <clip2@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 17:49:33 -07:00
Karsten Keil	af69fb3a8f	Add mISDN HFC multiport driver Enable support for cards with Cologne Chip AG's HFC multiport chip. Signed-off-by: Karsten Keil <kkeil@suse.de>	2008-07-27 02:00:43 +02:00
Karsten Keil	960366cf8d	Add mISDN DSP Enable support for digital audio processing capability. This module may be used for special applications that require cross connecting of bchannels, conferencing, dtmf decoding echo cancelation, tone generation, and Blowfish encryption and decryption. It may use hardware features if available. Signed-off-by: Karsten Keil <kkeil@suse.de>	2008-07-27 01:56:38 +02:00
Karsten Keil	1b2b03f8e5	Add mISDN core files Add mISDN core files Signed-off-by: Karsten Keil <kkeil@suse.de>	2008-07-27 01:54:58 +02:00
Karsten Keil	04578dd330	Define AF_ISDN and PF_ISDN Define the address and protocol family value for mISDN. Signed-off-by: Karsten Keil <kkeil@suse.de>	2008-07-27 01:47:00 +02:00
Haavard Skinnemoen	f4b7f927b5	mmc: Add per-card debugfs support For each card successfully added to the bus, create a subdirectory under the host's debugfs root with information about the card. At the moment, only a single file is added to the card directory for all cards: "state". It reflects the "state" field in struct mmc_card, indicating whether the card is present, readonly, etc. For MMC and SD cards (not SDIO), another file is added: "status". Reading this file will ask the card about its current status and return it. This can be useful if the card just refuses to respond to any commands, which might indicate that the card state is not what the MMC core thinks it is (due to a missing stop command, for example.) Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2008-07-27 01:26:17 +02:00
Haavard Skinnemoen	6edd8ee60a	mmc: Export internal host state through debugfs When CONFIG_DEBUG_FS is set, create a few files under /sys/kernel/debug containing information about an mmc host's internal state. Currently, just a single file is created, "ios", which contains information about the current operating parameters for the bus (clock speed, bus width, etc.) Host drivers can add additional files and directories under the host's root directory by passing the debugfs_root field in struct mmc_host as the 'parent' parameter to debugfs_create_*. Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2008-07-27 01:26:16 +02:00
Roland McGrath	a9906a1919	tracehook: comment fixes This fixes some typos and errors in <linux/tracehook.h> comments. No code changes. Signed-off-by: Roland McGrath <roland@redhat.com>	2008-07-26 14:41:26 -07:00
Linus Torvalds	fb3b806144	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, AMD IOMMU: include amd_iommu_last_bdf in device initialization x86: fix IBM Summit based systems' phys_cpu_present_map on 32-bit kernels x86, RDC321x: remove gpio.h complications x86, RDC321x: add to mach-default crashdump: fix undefined reference to `elfcorehdr_addr' flag parameters: fix compile error of sys_epoll_create1	2008-07-26 13:25:05 -07:00
Adrian Bunk	9580d85f9c	drivers/char/rtc.c: make 2 functions static The following functions can now become static: - rtc_interrupt() - rtc_get_rtc_time() Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Bernhard Walle <bwalle@suse.de> Acked-by: Paul Gortmaker <p_gortmaker@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:12 -07:00
Adrian Bunk	7c363b8c65	mm/swapfile.c: make code static This patch makes the following needlessly global code static: - swap_lock - nr_swapfiles - struct swap_list Signed-off-by: Adrian Bunk <bunk@kernel.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:12 -07:00
Adrian Bunk	15f59adae0	make mm/memory.c:print_bad_pte() static This patch makes the needlessly global print_bad_pte() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:12 -07:00
Adrian Bunk	9d8fddfb17	mm/allocpercpu.c: make 4 functions static This patch makes the following needlessly global functions static: - percpu_depopulate() - __percpu_depopulate_mask() - percpu_populate() - __percpu_populate_mask() Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:12 -07:00
Roland McGrath	bbc698636e	task_current_syscall This adds the new function task_current_syscall() on machines where the asm/syscall.h interface is supported (CONFIG_HAVE_ARCH_TRACEHOOK). It's exported for modules to use in the future. This function safely samples the state of a blocked thread to collect what system call it is blocked in, and the six system call argument registers. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:10 -07:00
Roland McGrath	85ba2d862e	tracehook: wait_task_inactive This extends wait_task_inactive() with a new argument so it can be used in a "soft" mode where it will check for the task changing state unexpectedly and back off. There is no change to existing callers. This lays the groundwork to allow robust, noninvasive tracing that can try to sample a blocked thread but back off safely if it wakes up. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:09 -07:00
Roland McGrath	828c365cc8	tracehook: asm/syscall.h This adds asm-generic/syscall.h, which documents what a real asm-ARCH/syscall.h file should define. This is not used yet, but will provide all the machine-dependent details of examining a user system call about to begin, in progress, or just ended. Each arch should add an asm-ARCH/syscall.h that defines all the entry points documented in asm-generic/syscall.h, as short inlines if possible. This lets us write new tracing code that understands user system call registers, without any new arch-specific work. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:09 -07:00
Roland McGrath	64b1208d5b	tracehook: TIF_NOTIFY_RESUME This adds tracehook.h inlines to enable a new arch feature in support of user debugging/tracing. This is not used yet, but it lays the groundwork for a debugger to be able to wrangle a task that's possibly running, without interrupting its syscalls in progress. Each arch should define TIF_NOTIFY_RESUME, and in their entry.S code treat it much like TIF_SIGPENDING. That is, it causes you to take the slow path when returning to user mode, where you get the full user-mode state accessible as for signal handling or ptrace. The arch code should check TIF_NOTIFY_RESUME after handling TIF_SIGPENDING. When it's set, clear it and then call tracehook_notify_resume(). In future, tracing code will call set_notify_resume() when it wants to get a callback in tracehook_notify_resume(). Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:09 -07:00
Roland McGrath	b787f7ba67	tracehook: force signal_pending() This defines a new hook tracehook_force_sigpending() that lets tracing code decide to force TIF_SIGPENDING on in recalc_sigpending(). This is not used yet, so it compiles away to nothing for now. It lays the groundwork for new tracing code that can interrupt a task synthetically without actually sending a signal. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:09 -07:00
Roland McGrath	2b2a1ff64a	tracehook: death This moves the ptrace logic in task death (exit_notify) into tracehook.h inlines. Some code is rearranged slightly to make things nicer. There is no change, only cleanup. There is one hook called with the tasklist_lock write-locked, as ptrace needs. There is also a new hook called after exit_state changes and without locks. This is a better place for tracing work to be in the future, since it doesn't delay the whole system with locking. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:09 -07:00
Roland McGrath	fa00b80b3c	tracehook: job control This defines the tracehook_notify_jctl() hook to formalize the ptrace effects on the job control notifications. There is no change, only cleanup. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:09 -07:00
Roland McGrath	7bcf6a2ca5	tracehook: get_signal_to_deliver This defines the tracehook_get_signal() hook to allow tracing code to slip in before normal signal dequeuing. This lays the groundwork for new tracing features that can inject synthetic signals outside the normal queue or control the disposition of delivered signals. The calling convention lets tracehook_get_signal() decide both exactly what will happen and what signal number to report in the handler/exit. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:09 -07:00
Roland McGrath	283d7559e7	tracehook: syscall This adds standard tracehook.h inlines for arch code to call when TIF_SYSCALL_TRACE has been set. This replaces having each arch implement the ptrace guts for its syscall tracing support. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:09 -07:00
Roland McGrath	445a91d2fe	tracehook: tracehook_consider_fatal_signal This defines tracehook_consider_fatal_signal() has a fine-grained hook for deciding to skip the special cases for a fatal signal, as ptrace does. There is no change, only cleanup. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:09 -07:00
Roland McGrath	35de254dc6	tracehook: tracehook_consider_ignored_signal This defines tracehook_consider_ignored_signal() has a fine-grained hook for deciding to prevent the normal short-circuit of sending an ignored signal, as ptrace does. There is no change, only cleanup. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:09 -07:00
Roland McGrath	c45aea2761	tracehook: tracehook_signal_handler This defines tracehook_signal_handler() as a hook for the arch signal handling code to call. It gives ptrace the opportunity to stop for a pseudo-single-step trap immediately after signal handler setup is done. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:09 -07:00
Roland McGrath	fa8e26ccd4	tracehook: tracehook_expect_breakpoints This adds tracehook_expect_breakpoints() as a formal hook for the nommu code to use for its, "Is text-poking likely?" check at mmap time. This names the actual semantics the code means to test, and documents it. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:09 -07:00
Roland McGrath	0d094efeb1	tracehook: tracehook_tracer_task This adds the tracehook_tracer_task() hook to consolidate all forms of "Who is using ptrace on me?" logic. This is used for "TracerPid:" in /proc and for permission checks. We also clean up the selinux code the called an identical accessor. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:08 -07:00
Roland McGrath	dae33574dc	tracehook: release_task This moves the ptrace-related logic from release_task into tracehook.h and ptrace.h inlines. It provides clean hooks both before and after locking tasklist_lock, for future tracing logic to do more cleanup without the lock. This also changes release_task() itself in the rare "zap_leader" case to set the leader to EXIT_DEAD before iterating. This maintains the invariant that release_task() only ever handles a task in EXIT_DEAD. This is a common-sense invariant that is already always true except in this one arcane case of zombie leader whose parent ignores SIGCHLD. This change is harmless and only costs one store in this one rare case. It keeps the expected state more consisently sane, which is nicer when debugging weirdness in release_task(). It also lets some future code in the tracehook entry points rely on this invariant for bookkeeping. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:08 -07:00
Roland McGrath	daded34be9	tracehook: vfork-done This moves the PTRACE_EVENT_VFORK_DONE tracing into a tracehook.h inline, tracehook_report_vfork_done(). The change has no effect, just clean-up. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:08 -07:00
Roland McGrath	09a05394fe	tracehook: clone This moves all the ptrace initialization and tracing logic for task creation into tracehook.h and ptrace.h inlines. It reorganizes the code slightly, but should not change any behavior. There are four tracehook entry points, at each important stage of task creation. This keeps the interface from the core fork.c code fairly clean, while supporting the complex setup required for ptrace or something like it. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:08 -07:00
Roland McGrath	30199f5a46	tracehook: exit This moves the PTRACE_EVENT_EXIT tracing into a tracehook.h inline, tracehook_report_exec(). The change has no effect, just clean-up. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:08 -07:00
Roland McGrath	6341c393fc	tracehook: exec This moves all the ptrace hooks related to exec into tracehook.h inlines. This also lifts the calls for tracing out of the binfmt load_binary hooks into search_binary_handler() after it calls into the binfmt module. This change has no effect, since all the binfmt modules' load_binary functions did the call at the end on success, and now search_binary_handler() does it immediately after return if successful. We consolidate the repeated code, and binfmt modules no longer need to import ptrace_notify(). Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:08 -07:00
Roland McGrath	88ac2921a7	tracehook: add linux/tracehook.h This patch series introduces the "tracehook" interface layer of inlines in <linux/tracehook.h>. There are more details in the log entry for patch 01/23 and in the header file comments inside that patch. Most of these changes move code around with little or no change, and they should not break anything or change any behavior. This sets a new standard for uniform arch support to enable clean arch-independent implementations of new debugging and tracing stuff, denoted by CONFIG_HAVE_ARCH_TRACEHOOK. Patch 20/23 adds that symbol to arch/Kconfig, with comments listing everything an arch has to do before setting "select HAVE_ARCH_TRACEHOOK". These are elaborted a bit at: http://sourceware.org/systemtap/wiki/utrace/arch/HowTo The new inlines that arch code must define or call have detailed kerneldoc comments in the generic header files that say what is required. No arch is obligated to do any work, and no arch's build should be broken by these changes. There are several steps that each arch should take so it can set HAVE_ARCH_TRACEHOOK. Most of these are simple. Providing this support will let new things people add for doing debugging and tracing of user-level threads "just work" for your arch in the future. For an arch that does not provide HAVE_ARCH_TRACEHOOK, some new options for such features will not be available for config. I have done some arch work and will submit this to the arch maintainers after the generic tracehook series settles in. For now, that work is available in my GIT repositories, and in patch and mbox-of-patches form at http://people.redhat.com/roland/utrace/2.6-current/ This paves the way for my "utrace" work, to be submitted later. But it is not innately tied to that. I hope that the tracehook series can go in soon regardless of what eventually does or doesn't go on top of it. For anyone implementing any kind of new tracing/debugging plan, or just understanding all the context of the existing ptrace implementation, having tracehook.h makes things much easier to find and understand. This patch: This adds the new kernel-internal header file <linux/tracehook.h>. This is not yet used at all. The comments in the header introduce what the following series of patches is about. The aim is to formalize and consolidate all the places that the core kernel code and the arch code now ties into the ptrace implementation. These patches mostly don't cause any functional change. They just move the details of ptrace logic out of core code into tracehook.h inlines, where they are mostly compiled away to the same as before. All that changes is that everything is thoroughly documented and any future reworking of ptrace, or addition of something new, would not have to touch core code all over, just change the tracehook.h inlines. The new linux/ptrace.h inlines are used by the following patches in the new tracehook_*() inlines. Using these helpers for the ptrace event stops makes it simple to change or disable the old ptrace implementation of these stops conditionally later. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:08 -07:00
Alexey Dobriyan	51cc50685a	SL*B: drop kmem cache argument from constructor Kmem cache passed to constructor is only needed for constructors that are themselves multiplexeres. Nobody uses this "feature", nor does anybody uses passed kmem cache in non-trivial way, so pass only pointer to object. Non-trivial places are: arch/powerpc/mm/init_64.c arch/powerpc/mm/hugetlbpage.c This is flag day, yes. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Matt Mackall <mpm@selenic.com> [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c] [akpm@linux-foundation.org: fix mm/slab.c] [akpm@linux-foundation.org: fix ubifs] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:07 -07:00
Nick Piggin	19fd623127	mm: spinlock tree_lock mapping->tree_lock has no read lockers. convert the lock from an rwlock to a spinlock. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Hugh Dickins <hugh@veritas.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:06 -07:00
Nick Piggin	e286781d5f	mm: speculative page references If we can be sure that elevating the page_count on a pagecache page will pin it, we can speculatively run this operation, and subsequently check to see if we hit the right page rather than relying on holding a lock or otherwise pinning a reference to the page. This can be done if get_page/put_page behaves consistently throughout the whole tree (ie. if we "get" the page after it has been used for something else, we must be able to free it with a put_page). Actually, there is a period where the count behaves differently: when the page is free or if it is a constituent page of a compound page. We need an atomic_inc_not_zero operation to ensure we don't try to grab the page in either case. This patch introduces the core locking protocol to the pagecache (ie. adds page_cache_get_speculative, and tweaks some update-side code to make it work). Thanks to Hugh for pointing out an improvement to the algorithm setting page_count to zero when we have control of all references, in order to hold off speculative getters. [kamezawa.hiroyu@jp.fujitsu.com: fix migration_entry_wait()] [hugh@veritas.com: fix add_to_page_cache] [akpm@linux-foundation.org: repair a comment] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Jeff Garzik <jeff@garzik.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Hugh Dickins <hugh@veritas.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:06 -07:00
Nick Piggin	47feff2c8e	radix-tree: add gang_lookup_slot, gang_lookup_slot_tag Introduce gang_lookup_slot() and gang_lookup_slot_tag() functions, which are used by lockless pagecache. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Hugh Dickins <hugh@veritas.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:06 -07:00
Nick Piggin	21cc199baa	mm: introduce get_user_pages_fast Introduce a new get_user_pages_fast mm API, which is basically a get_user_pages with a less general API (but still tends to be suited to the common case): - task and mm are always current and current->mm - force is always 0 - pages is always non-NULL - don't pass back vmas This restricted API can be implemented in a much more scalable way on many architectures when the ptes are present, by walking the page tables locklessly (no mmap_sem or page table locks). When the ptes are not populated, get_user_pages_fast() could be slower. This is implemented locklessly on x86, and used in some key direct IO call sites, in later patches, which provides nearly 10% performance improvement on a threaded database workload. Lots of other code could use this too, depending on use cases (eg. grep drivers/). And it might inspire some new and clever ways to use it. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:05 -07:00
Huang Weiyi	080ccd4573	include/linux/aio.h: removed duplicated include Removed duplicated include <linux/uio.h> in include/linux/aio.h Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:04 -07:00
Eduard - Gabriel Munteanu	20d8b67c06	relay: add buffer-only channels; useful for early logging Allows one to create and use a channel with no associated files. Files can be initialized later. This is useful in scenarios such as logging in early code, before VFS is up. Therefore, such channels can be created and used as soon as kmem_cache_init() completed. This is needed by kmemtrace to do tracing in early kernel code. [kosaki.motohiro@jp.fujitsu.com: build fix] Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:04 -07:00
Eduard - Gabriel Munteanu	7babe8db99	Full conversion to early_initcall() interface, remove old interface A previous patch added the early_initcall(), to allow a cleaner hooking of pre-SMP initcalls. Now we remove the older interface, converting all existing users to the new one. [akpm@linux-foundation.org: cleanups] [akpm@linux-foundation.org: build fix] [kosaki.motohiro@jp.fujitsu.com: warning fix] [kosaki.motohiro@jp.fujitsu.com: warning fix] Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:04 -07:00
Eduard - Gabriel Munteanu	c2147a5092	Better interface for hooking early initcalls Added early initcall (pre-SMP) support, using an identical interface to that of regular initcalls. Functions called from do_pre_smp_initcalls() could be converted to use this cleaner interface. This is required by CPU hotplug, because early users have to register notifiers before going SMP. One such CPU hotplug user is the relay interface with buffer-only channels, which needs to register such a notifier, to be usable in early code. This in turn is used by kmemtrace. Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:04 -07:00
Huang Ying	89081d17f7	kexec jump: save/restore device state This patch implements devices state save/restore before after kexec. This patch together with features in kexec_jump patch can be used for following: - A simple hibernation implementation without ACPI support. You can kexec a hibernating kernel, save the memory image of original system and shutdown the system. When resuming, you restore the memory image of original system via ordinary kexec load then jump back. - Kernel/system debug through making system snapshot. You can make system snapshot, jump back, do some thing and make another system snapshot. - Cooperative multi-kernel/system. With kexec jump, you can switch between several kernels/systems quickly without boot process except the first time. This appears like swap a whole kernel/system out/in. - A general method to call program in physical mode (paging turning off). This can be used to invoke BIOS code under Linux. The following user-space tools can be used with kexec jump: - kexec-tools needs to be patched to support kexec jump. The patches and the precompiled kexec can be download from the following URL: source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2 patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2 binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10 - makedumpfile with patches are used as memory image saving tool, it can exclude free pages from original kernel memory image file. The patches and the precompiled makedumpfile can be download from the following URL: source: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-src_cvs_kh10.tar.bz2 patches: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-patches_cvs_kh10.tar.bz2 binary: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile_cvs_kh10 - An initramfs image can be used as the root file system of kexeced kernel. An initramfs image built with "BuildRoot" can be downloaded from the following URL: initramfs image: http://khibernation.sourceforge.net/download/release_v10/initramfs/rootfs_cvs_kh10.gz All user space tools above are included in the initramfs image. Usage example of simple hibernation: 1. Compile and install patched kernel with following options selected: CONFIG_X86_32=y CONFIG_RELOCATABLE=y CONFIG_KEXEC=y CONFIG_CRASH_DUMP=y CONFIG_PM=y CONFIG_HIBERNATION=y CONFIG_KEXEC_JUMP=y 2. Build an initramfs image contains kexec-tool and makedumpfile, or download the pre-built initramfs image, called rootfs.gz in following text. 3. Prepare a partition to save memory image of original kernel, called hibernating partition in following text. 4. Boot kernel compiled in step 1 (kernel A). 5. In the kernel A, load kernel compiled in step 1 (kernel B) with /sbin/kexec. The shell command line can be as follow: /sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x100000 --mem-max=0xffffff --initrd=rootfs.gz 6. Boot the kernel B with following shell command line: /sbin/kexec -e 7. The kernel B will boot as normal kexec. In kernel B the memory image of kernel A can be saved into hibernating partition as follow: jump_back_entry=`cat /proc/cmdline \| tr ' ' '\n' \| grep kexec_jump_back_entry \| cut -d '='` echo $jump_back_entry > kexec_jump_back_entry cp /proc/vmcore dump.elf Then you can shutdown the machine as normal. 8. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as root file system. 9. In kernel C, load the memory image of kernel A as follow: /sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf 10. Jump back to the kernel A as follow: /sbin/kexec -e Then, kernel A is resumed. Implementation point: To support jumping between two kernels, before jumping to (executing) the new kernel and jumping back to the original kernel, the devices are put into quiescent state, and the state of devices and CPU is saved. After jumping back from kexeced kernel and jumping to the new kernel, the state of devices and CPU are restored accordingly. The devices/CPU state save/restore code of software suspend is called to implement corresponding function. Known issues: - Because the segment number supported by sys_kexec_load is limited, hibernation image with many segments may not be load. This is planned to be eliminated by adding a new flag to sys_kexec_load to make a image can be loaded with multiple sys_kexec_load invoking. Now, only the i386 architecture is supported. Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:04 -07:00
Huang Ying	3ab8352137	kexec jump This patch provides an enhancement to kexec/kdump. It implements the following features: - Backup/restore memory used by the original kernel before/after kexec. - Save/restore CPU state before/after kexec. The features of this patch can be used as a general method to call program in physical mode (paging turning off). This can be used to call BIOS code under Linux. kexec-tools needs to be patched to support kexec jump. The patches and the precompiled kexec can be download from the following URL: source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2 patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2 binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10 Usage example of calling some physical mode code and return: 1. Compile and install patched kernel with following options selected: CONFIG_X86_32=y CONFIG_KEXEC=y CONFIG_PM=y CONFIG_KEXEC_JUMP=y 2. Build patched kexec-tool or download the pre-built one. 3. Build some physical mode executable named such as "phy_mode" 4. Boot kernel compiled in step 1. 5. Load physical mode executable with /sbin/kexec. The shell command line can be as follow: /sbin/kexec --load-preserve-context --args-none phy_mode 6. Call physical mode executable with following shell command line: /sbin/kexec -e Implementation point: To support jumping without reserving memory. One shadow backup page (source page) is allocated for each page used by kexeced code image (destination page). When do kexec_load, the image of kexeced code is loaded into source pages, and before executing, the destination pages and the source pages are swapped, so the contents of destination pages are backupped. Before jumping to the kexeced code image and after jumping back to the original kernel, the destination pages and the source pages are swapped too. C ABI (calling convention) is used as communication protocol between kernel and called code. A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to indicate that the loaded kernel image is used for jumping back. Now, only the i386 architecture is supported. Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:04 -07:00
Alex Dubov	17017d8d2c	memstick: add "start" and "stop" methods to memstick device In some cases it may be desirable to ensure that associated driver is not going to access the media in some period of time. "start" and "stop" methods are provided therefore to allow it. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:04 -07:00
Alex Dubov	b77899985b	memstick: allow "set_param" method to return an error code Some controllers (Jmicron, for instance) can report temporal failure condition during power-on. It is desirable to account for this using a return value of "set_param" device method. The return value can also be handy to distinguish between supported and unsupported device parameters in run time. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:04 -07:00
Adrian Bunk	929dfb24fb	parport/share.c: proper externs This patch adds proper externs for parport_default_timeslice and parport_default_spintime in include/linux/parport.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:03 -07:00
FUJITA Tomonori	8d8bb39b9e	dma-mapping: add the device argument to dma_mapping_error() Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER architecture does: This enables us to cleanly fix the Calgary IOMMU issue that some devices are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423). I think that per-device dma_mapping_ops support would be also helpful for KVM people to support PCI passthrough but Andi thinks that this makes it difficult to support the PCI passthrough (see the above thread). So I CC'ed this to KVM camp. Comments are appreciated. A pointer to dma_mapping_ops to struct dev_archdata is added. If the pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's NULL, the system-wide dma_ops pointer is used as before. If it's useful for KVM people, I plan to implement a mechanism to register a hook called when a new pci (or dma capable) device is created (it works with hot plugging). It enables IOMMUs to set up an appropriate dma_mapping_ops per device. The major obstacle is that dma_mapping_error doesn't take a pointer to the device unlike other DMA operations. So x86 can't have dma_mapping_ops per device. Note all the POWER IOMMUs use the same dma_mapping_error function so this is not a problem for POWER but x86 IOMMUs use different dma_mapping_error functions. The first patch adds the device argument to dma_mapping_error. The patch is trivial but large since it touches lots of drivers and dma-mapping.h in all the architecture. This patch: dma_mapping_error() doesn't take a pointer to the device unlike other DMA operations. So we can't have dma_mapping_ops per device. Note that POWER already has dma_mapping_ops per device but all the POWER IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device argument. [akpm@linux-foundation.org: fix sge] [akpm@linux-foundation.org: fix svc_rdma] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix bnx2x] [akpm@linux-foundation.org: fix s2io] [akpm@linux-foundation.org: fix pasemi_mac] [akpm@linux-foundation.org: fix sdhci] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix sparc] [akpm@linux-foundation.org: fix ibmvscsi] Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:03 -07:00
Andrew Morton	16d69265b9	uninline arch_pick_mmap_layout() Fix this, on avr32: include/linux/utsname.h:35, from init/main.c:20: include/linux/sched.h: In function 'arch_pick_mmap_layout': include/linux/sched.h:2149: error: implicit declaration of function 'PAGE_ALIGN' Reported-by: Adrian Bunk <bunk@kernel.org> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:01 -07:00
Mauro Carvalho Chehab	fdd2a7e2da	V4L/DVB (8500a): videotext.h: whitespace cleanup Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2008-07-26 13:25:25 -03:00
Mike Travis	b8d317d10c	cpumask: make cpumask_of_cpu_map generic If an arch doesn't define cpumask_of_cpu_map, create a generic statically-initialized one for them. This allows removal of the buggy cpumask_of_cpu() macro (&cpumask_of_cpu() gives address of out-of-scope var). An arch with NR_CPUS of 4096 probably wants to allocate this itself based on the actual number of CPUs, since otherwise they're using 2MB of rodata (1024 cpus means 128k). That's what CONFIG_HAVE_CPUMASK_OF_CPU_MAP is for (only x86/64 does so at the moment). In future as we support more CPUs, we'll need to resort to a get_cpu_map()/put_cpu_map() allocation scheme. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-26 16:40:32 +02:00
Joerg Roedel	3bc9f79ee1	iommu: add iommu_num_pages helper function Calculating the number of pages from given address and length numbers is a task required in multiple IOMMU implementations. So implement this as a generic function into the IOMMU helper code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: iommu@lists.linux-foundation.org Cc: bhavna.sarathy@amd.com Cc: robert.richter@amd.com Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-26 15:43:05 +02:00
Ingo Molnar	36ac26171a	crashdump: fix undefined reference to `elfcorehdr_addr' fix build bug introduced by `95b68dec0d` "calgary iommu: use the first kernels TCE tables in kdump": arch/x86/kernel/built-in.o: In function `calgary_iommu_init': (.init.text+0x8399): undefined reference to `elfcorehdr_addr' arch/x86/kernel/built-in.o: In function `calgary_iommu_init': (.init.text+0x856c): undefined reference to `elfcorehdr_addr' arch/x86/kernel/built-in.o: In function `detect_calgary': (.init.text+0x8c68): undefined reference to `elfcorehdr_addr' arch/x86/kernel/built-in.o: In function `detect_calgary': (.init.text+0x8d0c): undefined reference to `elfcorehdr_addr' make elfcorehdr_addr a generally available symbol. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-26 11:26:23 +02:00
Ilpo Järvinen	ec34c702ca	net: drop unused BUG_TRAP() Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-25 21:45:49 -07:00
Grant Likely	284b018973	spi: Add OF binding support for SPI busses This patch adds support for populating an SPI bus based on data in the OF device tree. This is useful for powerpc platforms which use the device tree instead of discrete code for describing platform layout. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2008-07-25 22:34:40 -04:00
Grant Likely	dc87c98e8f	spi: split up spi_new_device() to allow two stage registration. spi_new_device() allocates and registers an spi device all in one swoop. If the driver needs to add extra data to the spi_device before it is registered, then this causes problems. This is needed for OF device tree support so that the SPI device tree helper can add a pointer to the device node after the device is allocated, but before the device is registered. OF aware SPI devices can then retrieve data out of the device node to populate a platform data structure. This patch splits the allocation and registration portions of code out of spi_new_device() and creates two new functions; spi_alloc_device() and spi_register_device(). spi_new_device() is modified to use the new functions for allocation and registration. None of the existing users of spi_new_device() should be affected by this change. Drivers using the new API can forego the use of spi_board_info structure to describe the device layout and populate data into the spi_device structure directly. This change is in preparation for adding an OF device tree parser to generate spi_devices based on data in the device tree. Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: David Brownell <dbrownell@users.sourceforge.net>	2008-07-25 22:34:29 -04:00
Grant Likely	3f07af494d	of: adapt of_find_i2c_driver() to be usable by SPI also SPI has a similar problem as I2C in that it needs to determine an appropriate modalias value for each device node. This patch adapts the of_i2c of_find_i2c_driver() function to be usable by of_spi also. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2008-07-25 22:25:13 -04:00
Harvey Harrison	b4615e69b6	sys_paccept definition missing __user annotation Introduced by commit `aaca0bdca5` ("flag parameters: paccept"): net/socket.c:1515:17: error: symbol 'sys_paccept' redeclared with different type (originally declared at include/linux/syscalls.h:413) - incompatible argument 4 (different address spaces) Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 17:28:49 -07:00
Linus Torvalds	ff5d48a6d1	Merge git://git.infradead.org/embedded-2.6 * git://git.infradead.org/embedded-2.6: Make console charset translation optional	2008-07-25 12:02:08 -07:00
Linus Torvalds	762b8291be	Merge git://git.infradead.org/~dwmw2/random-2.6 * git://git.infradead.org/~dwmw2/random-2.6: remove dummy asm/kvm.h files firmware: create firmware binaries during 'make modules'.	2008-07-25 12:01:37 -07:00
Johannes Weiner	c6af5e9f8a	bootmem: Move node allocation macros back to !HAVE_ARCH_BOOTMEM_NODE These got unintentionally moved, put them back as x86 provides its own versions. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 11:36:44 -07:00
Adrian Bunk	7dcf2a9fce	remove dummy asm/kvm.h files This patch removes the dummy asm/kvm.h files on architectures not (yet) supporting KVM and uses the same conditional headers installation as already used for a.out.h . Also removed are superfluous install rules in the s390 and x86 Kbuild files (they are already in Kbuild.asm). Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2008-07-25 14:35:50 -04:00
Linus Torvalds	5047887caf	Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (34 commits) powerpc: Wireup new syscalls Move update_mmu_cache() declaration from tlbflush.h to pgtable.h powerpc/pseries: Remove kmalloc call in handling writes to lparcfg powerpc/pseries: Update arch vector to indicate support for CMO ibmvfc: Add support for collaborative memory overcommit ibmvscsi: driver enablement for CMO ibmveth: enable driver for CMO ibmveth: Automatically enable larger rx buffer pools for larger mtu powerpc/pseries: Verify CMO memory entitlement updates with virtual I/O powerpc/pseries: vio bus support for CMO powerpc/pseries: iommu enablement for CMO powerpc/pseries: Add CMO paging statistics powerpc/pseries: Add collaborative memory manager powerpc/pseries: Utilities to set firmware page state powerpc/pseries: Enable CMO feature during platform setup powerpc/pseries: Split retrieval of processor entitlement data into a helper routine powerpc/pseries: Add memory entitlement capabilities to /proc/ppc64/lparcfg powerpc/pseries: Split processor entitlement retrieval and gathering to helper routines powerpc/pseries: Remove extraneous error reporting for hcall failures in lparcfg powerpc: Fix compile error with binutils 2.15 ... Fixed up conflict in arch/powerpc/platforms/52xx/Kconfig manually.	2008-07-25 11:08:17 -07:00
Linus Torvalds	996abf053e	Merge branch 'linux-next' of git://git.infradead.org/~dedekind/ubi-2.6 * 'linux-next' of git://git.infradead.org/~dedekind/ubi-2.6: (22 commits) UBI: always start the background thread UBI: fix gcc warning UBI: remove pre-sqnum images support UBI: fix kernel-doc errors and warnings UBI: fix checkpatch.pl errors and warnings UBI: bugfix - do not torture PEB needlessly UBI: rework scrubbing messages UBI: implement multiple volumes rename UBI: fix and re-work debugging stuff UBI: amend commentaries UBI: fix error message UBI: improve mkvol request validation UBI: add ubi_sync() interface UBI: fix 64-bit calculations UBI: fix LEB locking UBI: fix memory leak on error path UBI: do not forget to free internal volumes UBI: fix memory leak UBI: avoid unnecessary division operations UBI: fix buffer padding ...	2008-07-25 11:02:17 -07:00
Arthur Jones	8f421c595a	edac: i5100 new intel chipset driver Preliminary support for the Intel 5100 MCH. CE and UE errors are reported along with the current DIMM label information and other memory parameters. Reasons why this is preliminary: 1) This chip has 2 independent memory controllers which, for best perforance, use interleaved accesses to the DDR2 memory. This architecture does not map very well to the current edac data structures which depend on symmetric channel access to the interleaved data. Without core changes, the best I could do for now is to map both memory controllers to different csrows (first all ranks of controller 0, then all ranks of controller 1). Someone much more familiar with the edac core than I will probably need to come up with a more general data structure to handle the interleaving and de-interleaving of the two memory controllers. 2) I have not yet tackled the de-interleaving of the rank/controller address space into the physical address space of the CPU. There is nothing fundamentally missing, it is just ending up to be a lot of code, and I'd rather keep it separate for now, esp since it doesn't work yet... 3) The code depends on a particular i5100 chip select to DIMM mainboard chip select mapping. This mapping seems obvious to me in order to support dual and single ranked memory, but it is not unique and DIMM labels could be wrong on other mainboards. There is no way to query this mapping that I know of. 4) The code requires that the i5100 is in 32GB mode. Only 4 ranks per controller, 2 ranks per DIMM are supported. I do not have hardware (nor do I expect to have hardware anytime soon) for the 48GB (6 ranks per controller) mode. 5) The serial presence detect code should be broken out into a "real" i2c driver so that decode-dimms.pl can work. Signed-off-by: Arthur Jones <ajones@riverbed.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:48 -07:00
Miklos Szeredi	33670fa296	fuse: nfs export special lookups Implement the get_parent export operation by sending a LOOKUP request with ".." as the name. Implement looking up an inode by node ID after it has been evicted from the cache. This is done by seding a LOOKUP request with "." as the name (for all file types, not just directories). The filesystem can set the FUSE_EXPORT_SUPPORT flag in the INIT reply, to indicate that it supports these special lookups. Thanks to John Muir for the original implementation of this feature. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Matthew Wilcox <matthew@wil.cx> Cc: David Teigland <teigland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:48 -07:00
Miklos Szeredi	bde74e4bc6	locks: add special return value for asynchronous locks Use a special error value FILE_LOCK_DEFERRED to mean that a locking operation returned asynchronously. This is returned by posix_lock_file() for sleeping locks to mean that the lock has been queued on the block list, and will be woken up when it might become available and needs to be retried (either fl_lmops->fl_notify() is called or fl_wait is woken up). f_op->lock() to mean either the above, or that the filesystem will call back with fl_lmops->fl_grant() when the result of the locking operation is known. The filesystem can do this for sleeping as well as non-sleeping locks. This is to make sure, that return values of -EAGAIN and -EINPROGRESS by filesystems are not mistaken to mean an asynchronous locking. This also makes error handling in fs/locks.c and lockd/svclock.c slightly cleaner. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Matthew Wilcox <matthew@wil.cx> Cc: David Teigland <teigland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:47 -07:00
Keika Kobayashi	016ae219b9	per-task-delay-accounting: update taskstats for memory reclaim delay Add members for memory reclaim delay to taskstats, and accumulate them in __delayacct_add_tsk() . Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp> Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:47 -07:00
Keika Kobayashi	873b477177	per-task-delay-accounting: add memory reclaim delay Sometimes, application responses become bad under heavy memory load. Applications take a bit time to reclaim memory. The statistics, how long memory reclaim takes, will be useful to measure memory usage. This patch adds accounting memory reclaim to per-task-delay-accounting for accounting the time of do_try_to_free_pages(). <i.e> - When System is under low memory load, memory reclaim may not occur. $ free total used free shared buffers cached Mem: 8197800 1577300 6620500 0 4808 1516724 -/+ buffers/cache: 55768 8142032 Swap: 16386292 0 16386292 $ vmstat 1 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 0 5069748 10612 3014060 0 0 0 0 3 26 0 0 100 0 0 0 0 5069748 10612 3014060 0 0 0 0 4 22 0 0 100 0 0 0 0 5069748 10612 3014060 0 0 0 0 3 18 0 0 100 0 Measure the time of tar command. $ ls -s test.dat 1501472 test.dat $ time tar cvf test.tar test.dat real 0m13.388s user 0m0.116s sys 0m5.304s $ ./delayget -d -p <pid> CPU count real total virtual total delay total 428 5528345500 5477116080 62749891 IO count delay total 338 8078977189 SWAP count delay total 0 0 RECLAIM count delay total 0 0 - When system is under heavy memory load memory reclaim may occur. $ vmstat 1 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 7159032 49724 1812 3012 0 0 0 0 3 24 0 0 100 0 0 0 7159032 49724 1812 3012 0 0 0 0 4 24 0 0 100 0 0 0 7159032 49848 1812 3012 0 0 0 0 3 22 0 0 100 0 In this case, one process uses more 8G memory by execution of malloc() and memset(). $ time tar cvf test.tar test.dat real 1m38.563s <- increased by 85 sec user 0m0.140s sys 0m7.060s $ ./delayget -d -p <pid> CPU count real total virtual total delay total 9021 7140446250 7315277975 923201824 IO count delay total 8965 90466349669 SWAP count delay total 3 21036367 RECLAIM count delay total 740 61011951153 In the later case, the value of RECLAIM is increasing. So, taskstats can show how much memory reclaim influences TAT. Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujistu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:47 -07:00
Andrea Righi	297c5d9263	task IO accounting: provide distinct tgid/tid I/O statistics Report per-thread I/O statistics in /proc/pid/task/tid/io and aggregate parent I/O statistics in /proc/pid/io. This approach follows the same model used to account per-process and per-thread CPU times. As a practial application, this allows for example to quickly find the top I/O consumer when a process spawns many child threads that perform the actual I/O work, because the aggregated I/O statistics can always be found in /proc/pid/io. [ Oleg Nesterov points out that we should check that the task is still alive before we iterate over the threads, but also says that we can do that fixup on top of this later. - Linus ] Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Cc: Matt Heaton <matt@hostmonster.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Acked-by-with-comments: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:47 -07:00
Pavel Emelyanov	0b6b030fc3	bsdacct: switch from global bsd_acct_struct instance to per-pidns one Allocate the structure on the first call to sys_acct(). After this each namespace, that ordered the accounting, will live with this structure till its own death. Two notes - routines, that close the accounting on fs umount time use the init_pid_ns's acct by now; - accounting routine accounts to dying task's namespace (also by now). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Balbir Singh <balbir@in.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:47 -07:00
Pavel Emelyanov	20fad13ac6	pidns: add the struct bsd_acct_struct pointer on pid_namespace struct All the bsdacct-related info will be stored in the area, pointer by this one. It will be NULL automatically for all new namespaces. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Balbir Singh <balbir@in.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:46 -07:00
Jonathan Lim	49b5cf3472	accounting: account for user time when updating memory integrals Adapt acct_update_integrals() to include user time when calculating the time difference. The units of acct_rss_mem1 and acct_vm_mem1 are also changed from pages-jiffies to pages-usecs to avoid calling jiffies_to_usecs() in xacct_add_tsk() which might overflow. Signed-off-by: Jonathan Lim <jlim@sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:46 -07:00
Pavel Emelyanov	dbda0de526	pidns: remove find_task_by_pid, unused for a long time It seems to me that it was a mistake marking this function as deprecated and scheduling it for removal, rather than resolutely removing it after the last caller's death. Anyway - better late, then never. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:45 -07:00
Pavel Emelyanov	e49859e71e	pidns: remove now unused find_pid function. This one had the only users so far - the kill_proc, which is removed, so drop this (invalid in namespaced world) call too. And of course - erase all references on it from comments. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:45 -07:00
Pavel Emelyanov	19b0cfcca4	pidns: remove now unused kill_proc function This function operated on a pid_t to kill a task, which is no longer valid in a containerized system. It has finally lost all its users and we can safely remove it from the tree. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:45 -07:00
Richard Kennedy	33166b1ffc	shrink struct pid by removing padding on 64 bit builds When struct pid is built on a 64 bit platform gcc has to insert padding to maintain the correct alignment, by simply reordering its members the memory usage shrinks from 88 bytes to 80. I've successfully run with this patch on my desktop AMD64 machine. There are no significant kernel size changes to a default config.X86_64 on the latest git v2.6.26-rc1 text data bss dec hex filename 5404828 976760 734280 7115868 6c945c vmlinux 5404811 976760 734280 7115851 6c944b vmlinux.pid-patch Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:45 -07:00
Adrian Bunk	3ae4eed34b	proper pid{hash,map}_init() prototypes This patch adds proper prototypes for pid{hash,map}_init() in include/linux/pid_namespace.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:45 -07:00
Alexey Dobriyan	881adb8535	proc: always do ->release Current two-stage scheme of removing PDE emphasizes one bug in proc: open rmmod remove_proc_entry close ->release won't be called because ->proc_fops were cleared. In simple cases it's small memory leak. For every ->open, ->release has to be done. List of openers is introduced which is traversed at remove_proc_entry() if neeeded. Discussions with Al long ago (sigh). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:44 -07:00
Adrian Bunk	6e644c3126	move proc_kmsg_operations to fs/proc/internal.h This patch moves the extern of struct proc_kmsg_operations to fs/proc/internal.h and adds an #include "internal.h" to fs/proc/kmsg.c so that the latter sees the former. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:44 -07:00
Abdel Benamrouche	d805dda412	fs/partition/check.c: fix return value warning fs/partitions/check.c:381: warning: ignoring return value of ___device_add___, declared with attribute warn_unused_result [akpm@linux-foundation.org: multiple-return-statements-per-function are evil] Signed-off-by: Abdel Benamrouche <draconux@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:44 -07:00
Nadia Derbey	9eefe520c8	ipc: do not use a negative value to re-enable msgmni automatic recomputing This patch proposes an alternative to the "magical positive-versus-negative number trick" Andrew complained about last week in http://lkml.org/lkml/2008/6/24/418. This had been introduced with the patches that scale msgmni to the amount of lowmem. With these patches, msgmni has a registered notification routine that recomputes msgmni value upon memory add/remove or ipc namespace creation/ removal. When msgmni is changed from user space (i.e. value written to the proc file), that notification routine is unregistered, and the way to make it registered back is to write a negative value into the proc file. This is the "magical positive-versus-negative number trick". To fix this, a new proc file is introduced: /proc/sys/kernel/auto_msgmni. This file acts as ON/OFF for msgmni automatic recomputing. With this patch, the process is the following: 1) kernel boots in "automatic recomputing mode" /proc/sys/kernel/msgmni contains the value that has been computed (depends on lowmem) /proc/sys/kernel/automatic_msgmni contains "1" 2) echo <val> > /proc/sys/kernel/msgmni . sets msg_ctlmni to <val> . de-activates automatic recomputing (i.e. if, say, some memory is added msgmni won't be recomputed anymore) . /proc/sys/kernel/automatic_msgmni now contains "0" 3) echo "0" > /proc/sys/kernel/automatic_msgmni . de-activates msgmni automatic recomputing this has the same effect as 2) except that msg_ctlmni's value stays blocked at its current value) 3) echo "1" > /proc/sys/kernel/automatic_msgmni . recomputes msgmni's value based on the current available memory size and number of ipc namespaces . re-activates automatic recomputing for msgmni. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:42 -07:00
Manfred Spraul	380af1b33b	ipc/sem.c: rewrite undo list locking The attached patch: - reverses the locking order of ulp->lock and sem_lock: Previously, it was first ulp->lock, then inside sem_lock. Now it's the other way around. - converts the undo structure to rcu. Benefits: - With the old locking order, IPC_RMID could not kfree the undo structures. The stale entries remained in the linked lists and were released later. - The patch fixes a a race in semtimedop(): if both IPC_RMID and a semget() that recreates exactly the same id happen between find_alloc_undo() and sem_lock, then semtimedop() would access already kfree'd memory. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Reviewed-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:42 -07:00
Manfred Spraul	a1193f8ec0	ipc/sem.c: convert sem_array.sem_pending to struct list_head sem_array.sem_pending is a double linked list, the attached patch converts it to struct list_head. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Reviewed-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:42 -07:00
Manfred Spraul	2c0c29d414	ipc/sem.c: remove unused entries from struct sem_queue sem_queue.sma and sem_queue.id were never used, the attached patch removes them. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Reviewed-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:42 -07:00
Manfred Spraul	4daa28f6d8	ipc/sem.c: convert undo structures to struct list_head The undo structures contain two linked lists, the attached patch replaces them with generic struct list_head lists. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:42 -07:00
Nadia Derbey	f9c46d6ea5	idr: make idr_find rcu-safe Make idr_find rcu-safe: it can now be called inside an rcu_read critical section. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Jim Houston <jim.houston@comcast.net> Cc: Pierre Peiffer <peifferp@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:42 -07:00
Nadia Derbey	944ca05c7b	idr: error checking factorization Do some code factorization in the return code analysis. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Jim Houston <jim.houston@comcast.net> Cc: Pierre Peiffer <peifferp@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:41 -07:00
Nadia Derbey	2027d1abc2	idr: change the idr structure After scalability problems have been detected when using the sysV ipcs, I have proposed to use an RCU based implementation of the IDR api instead (see threads http://lkml.org/lkml/2008/4/11/212 and http://lkml.org/lkml/2008/4/29/295). This resulted in many people asking to convert the idr API and make it rcu safe (because most of the code was duplicated and thus unmaintanable and unreviewable). So here is a first attempt. The important change wrt to the idr API itself is during idr removes: idr layers are freed after a grace period, instead of being moved to the free list. The important change wrt to ipcs, is that idr_find() can now be called locklessly inside a rcu read critical section. Here are the results I've got for the pmsg test sent by Manfred: 2.6.25-rc3-mm1 2.6.25-rc3-mm1+ 2.6.25-mm1 Patched 2.6.25-mm1 1 1168441 1064021 876000 947488 2 1094264 921059 1549592 1730685 3 2082520 1738165 1694370 2324880 4 2079929 1695521 404553 2400408 5 2898758 406566 391283 3246580 6 2921417 261275 263249 3752148 7 3308761 126056 191742 4243142 8 `3329456` 100129 141722 4275780 1st column: stock 2.6.25-rc3-mm1 2nd column: 2.6.25-rc3-mm1 + ipc patches (store ipcs into idrs) 3nd column: stock 2.6.25-mm1 4th column: 2.6.25-mm1 + this pacth series. This patch: Add an rcu_head to the idr_layer structure in order to free it after a grace period. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Jim Houston <jim.houston@comcast.net> Cc: Pierre Peiffer <peifferp@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:41 -07:00
Chandru	95b68dec0d	calgary iommu: use the first kernels TCE tables in kdump kdump kernel fails to boot with calgary iommu and aacraid driver on a x366 box. The ongoing dma's of aacraid from the first kernel continue to exist until the driver is loaded in the kdump kernel. Calgary is initialized prior to aacraid and creation of new tce tables causes wrong dma's to occur. Here we try to get the tce tables of the first kernel in kdump kernel and use them. While in the kdump kernel we do not allocate new tce tables but instead read the base address register contents of calgary iommu and use the tables that the registers point to. With these changes the kdump kernel and hence aacraid now boots normally. Signed-off-by: Chandru Siddalingappa <chandru@in.ibm.com> Acked-by: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:41 -07:00
Oleg Nesterov	3da1c84c00	workqueues: make get_online_cpus() useable for work->func() workqueue_cpu_callback(CPU_DEAD) flushes cwq->thread under cpu_maps_update_begin(). This means that the multithreaded workqueues can't use get_online_cpus() due to the possible deadlock, very bad and very old problem. Introduce the new state, CPU_POST_DEAD, which is called after cpu_hotplug_done() but before cpu_maps_update_done(). Change workqueue_cpu_callback() to use CPU_POST_DEAD instead of CPU_DEAD. This means that create/destroy functions can't rely on get_online_cpus() any longer and should take cpu_add_remove_lock instead. [akpm@linux-foundation.org: fix CONFIG_SMP=n] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:40 -07:00
Oleg Nesterov	db70089722	workqueues: implement flush_work() Most of users of flush_workqueue() can be changed to use cancel_work_sync(), but sometimes we really need to wait for the completion and cancelling is not an option. schedule_on_each_cpu() is good example. Add the new helper, flush_work(work), which waits for the completion of the specific work_struct. More precisely, it "flushes" the result of of the last queue_work() which is visible to the caller. For example, this code queue_work(wq, work); /* WINDOW / queue_work(wq, work); flush_work(work); doesn't necessary work "as expected". What can happen in the WINDOW above is - wq starts the execution of work->func() - the caller migrates to another CPU now, after the 2nd queue_work() this work is active on the previous CPU, and at the same time it is queued on another. In this case flush_work(work) may return before the first work->func() completes. It is trivial to add another helper int flush_work_sync(struct work_struct work) { return flush_work(work) \|\| wait_on_work(work); } which works "more correctly", but it has to iterate over all CPUs and thus it much slower than flush_work(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Max Krasnyansky <maxk@qualcomm.com> Acked-by: Jarek Poplawski <jarkao2@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:40 -07:00
Oleg Nesterov	a94e2d408e	coredump: kill mm->core_done Now that we have core_state->dumper list we can use it to wake up the sub-threads waiting for the coredump completion. This uglifies the code and .text grows by 47 bytes, but otoh mm_struct lessens by sizeof(struct completion). Also, with this change we can decouple exit_mm() from the coredumping code. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:40 -07:00
Oleg Nesterov	b564daf806	coredump: construct the list of coredumping threads at startup time binfmt->core_dump() has to iterate over the all threads in system in order to find the coredumping threads and construct the list using the GFP_ATOMIC allocations. With this patch each thread allocates the list node on exit_mm()'s stack and adds itself to the list. This allows us to do further changes: - simplify ->core_dump() - change exit_mm() to clear ->mm first, then wait for ->core_done. this makes the coredumping process visible to oom_kill - kill mm->core_done Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:40 -07:00
Oleg Nesterov	c5f1cc8c18	coredump: turn core_state->nr_threads into atomic_t Turn core_state->nr_threads into atomic_t and kill now unneeded down_write(&mm->mmap_sem) in exit_mm(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:39 -07:00
Oleg Nesterov	999d9fc167	coredump: move mm->core_waiters into struct core_state Move mm->core_waiters into "struct core_state" allocated on stack. This shrinks mm_struct a little bit and allows further changes. This patch mostly does s/core_waiters/core_state. The only essential change is that coredump_wait() must clear mm->core_state before return. The coredump_wait()'s path is uglified and .text grows by 30 bytes, this is fixed by the next patch. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:39 -07:00
Oleg Nesterov	32ecb1f26d	coredump: turn mm->core_startup_done into the pointer to struct core_state mm->core_startup_done points to "struct completion startup_done" allocated on the coredump_wait()'s stack. Introduce the new structure, core_state, which holds this "struct completion". This way we can add more info visible to the threads participating in coredump without enlarging mm_struct. No changes in affected .o files. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:39 -07:00
Oleg Nesterov	246bb0b1de	kill PF_BORROWED_MM in favour of PF_KTHREAD Kill PF_BORROWED_MM. Change use_mm/unuse_mm to not play with ->flags, and do s/PF_BORROWED_MM/PF_KTHREAD/ for a couple of other users. No functional changes yet. But this allows us to do further fixes/cleanups. oom_kill/ptrace/etc often check "p->mm != NULL" to filter out the kthreads, this is wrong because of use_mm(). The problem with PF_BORROWED_MM is that we need task_lock() to avoid races. With this patch we can check PF_KTHREAD directly, or use a simple lockless helper: /* The result must not be dereferenced !!! / struct mm_struct __get_task_mm(struct task_struct *tsk) { if (tsk->flags & PF_KTHREAD) return NULL; return tsk->mm; } Note also ecard_task(). It runs with ->mm != NULL, but it's the kernel thread without PF_BORROWED_MM. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:39 -07:00
Oleg Nesterov	7b34e4283c	introduce PF_KTHREAD flag Introduce the new PF_KTHREAD flag to mark the kernel threads. It is set by INIT_TASK() and copied to the forked childs (we could set it in kthreadd() along with PF_NOFREEZE instead). daemonize() was changed as well. In that case testing of PF_KTHREAD is racy, but daemonize() is hopeless anyway. This flag is cleared in do_execve(), before search_binary_handler(). Probably not the best place, we can do this in exec_mmap() or in start_thread(), or clear it along with PF_FORKNOEXEC. But I think this doesn't matter in practice, and if do_execve() fails kthread should die soon. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:39 -07:00
Oleg Nesterov	364d3c13c1	ptrace: give more respect to SIGKILL ptrace_stop() has some complicated checks to prevent the scheduling in the TASK_TRACED state with the pending SIGKILL, but these checks are racy, and they depend on arch_ptrace_stop_needed(). This patch assumes that the traced task should die asap if it was killed by SIGKILL, in that case schedule()->signal_pending_state() has no reason to ignore the TASK_WAKEKILL part of TASK_TRACED, and we can kill this nasty special case. Note: do_exit()->ptrace_notify() is special, the killed task can already dequeue SIGKILL at this point. Another indication that fatal_signal_pending() is not exactly right. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:39 -07:00
KAMEZAWA Hiroyuki	12b9804419	res_counter: limit change support ebusy Add an interface to set limit. This is necessary to memory resource controller because it shrinks usage at set limit. Other controllers may not need this interface to shrink usage because shrinking is not necessary or impossible. Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:37 -07:00
KAMEZAWA Hiroyuki	c9b0ed5148	memcg: helper function for relcaim from shmem. A new call, mem_cgroup_shrink_usage() is added for shmem handling and relacing non-standard usage of mem_cgroup_charge/uncharge. Now, shmem calls mem_cgroup_charge() just for reclaim some pages from mem_cgroup. In general, shmem is used by some process group and not for global resource (like file caches). So, it's reasonable to reclaim pages from mem_cgroup where shmem is mainly used. [hugh@veritas.com: shmem_getpage release page sooner] [hugh@veritas.com: mem_cgroup_shrink_usage css_put] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Cc: Paul Menage <menage@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:37 -07:00
KAMEZAWA Hiroyuki	69029cd550	memcg: remove refcnt from page_cgroup memcg: performance improvements Patch Description 1/5 ... remove refcnt fron page_cgroup patch (shmem handling is fixed) 2/5 ... swapcache handling patch 3/5 ... add helper function for shmem's memory reclaim patch 4/5 ... optimize by likely/unlikely ppatch 5/5 ... remove redundunt check patch (shmem handling is fixed.) Unix bench result. == 2.6.26-rc2-mm1 + memory resource controller Execl Throughput 2915.4 lps (29.6 secs, 3 samples) C Compiler Throughput 1019.3 lpm (60.0 secs, 3 samples) Shell Scripts (1 concurrent) 5796.0 lpm (60.0 secs, 3 samples) Shell Scripts (8 concurrent) 1097.7 lpm (60.0 secs, 3 samples) Shell Scripts (16 concurrent) 565.3 lpm (60.0 secs, 3 samples) File Read 1024 bufsize 2000 maxblocks 1022128.0 KBps (30.0 secs, 3 samples) File Write 1024 bufsize 2000 maxblocks 544057.0 KBps (30.0 secs, 3 samples) File Copy 1024 bufsize 2000 maxblocks 346481.0 KBps (30.0 secs, 3 samples) File Read 256 bufsize 500 maxblocks 319325.0 KBps (30.0 secs, 3 samples) File Write 256 bufsize 500 maxblocks 148788.0 KBps (30.0 secs, 3 samples) File Copy 256 bufsize 500 maxblocks 99051.0 KBps (30.0 secs, 3 samples) File Read 4096 bufsize 8000 maxblocks 2058917.0 KBps (30.0 secs, 3 samples) File Write 4096 bufsize 8000 maxblocks 1606109.0 KBps (30.0 secs, 3 samples) File Copy 4096 bufsize 8000 maxblocks 854789.0 KBps (30.0 secs, 3 samples) Dc: sqrt(2) to 99 decimal places 126145.2 lpm (30.0 secs, 3 samples) INDEX VALUES TEST BASELINE RESULT INDEX Execl Throughput 43.0 2915.4 678.0 File Copy 1024 bufsize 2000 maxblocks 3960.0 346481.0 875.0 File Copy 256 bufsize 500 maxblocks 1655.0 99051.0 598.5 File Copy 4096 bufsize 8000 maxblocks 5800.0 854789.0 1473.8 Shell Scripts (8 concurrent) 6.0 1097.7 1829.5 ========= FINAL SCORE 991.3 == 2.6.26-rc2-mm1 + this set == Execl Throughput 3012.9 lps (29.9 secs, 3 samples) C Compiler Throughput 981.0 lpm (60.0 secs, 3 samples) Shell Scripts (1 concurrent) 5872.0 lpm (60.0 secs, 3 samples) Shell Scripts (8 concurrent) 1120.3 lpm (60.0 secs, 3 samples) Shell Scripts (16 concurrent) 578.0 lpm (60.0 secs, 3 samples) File Read 1024 bufsize 2000 maxblocks 1003993.0 KBps (30.0 secs, 3 samples) File Write 1024 bufsize 2000 maxblocks 550452.0 KBps (30.0 secs, 3 samples) File Copy 1024 bufsize 2000 maxblocks 347159.0 KBps (30.0 secs, 3 samples) File Read 256 bufsize 500 maxblocks 314644.0 KBps (30.0 secs, 3 samples) File Write 256 bufsize 500 maxblocks 151852.0 KBps (30.0 secs, 3 samples) File Copy 256 bufsize 500 maxblocks 101000.0 KBps (30.0 secs, 3 samples) File Read 4096 bufsize 8000 maxblocks 2033256.0 KBps (30.0 secs, 3 samples) File Write 4096 bufsize 8000 maxblocks 1611814.0 KBps (30.0 secs, 3 samples) File Copy 4096 bufsize 8000 maxblocks 847979.0 KBps (30.0 secs, 3 samples) Dc: sqrt(2) to 99 decimal places 128148.7 lpm (30.0 secs, 3 samples) INDEX VALUES TEST BASELINE RESULT INDEX Execl Throughput 43.0 3012.9 700.7 File Copy 1024 bufsize 2000 maxblocks 3960.0 347159.0 876.7 File Copy 256 bufsize 500 maxblocks 1655.0 101000.0 610.3 File Copy 4096 bufsize 8000 maxblocks 5800.0 847979.0 1462.0 Shell Scripts (8 concurrent) 6.0 1120.3 1867.2 ========= FINAL SCORE 1004.6 This patch: Remove refcnt from page_cgroup(). After this, * A page is charged only when !page_mapped() && no page_cgroup is assigned. * Anon page is newly mapped. * File page is added to mapping->tree. * A page is uncharged only when * Anon page is fully unmapped. * File page is removed from LRU. There is no change in behavior from user's view. This patch also removes unnecessary calls in rmap.c which was used only for refcnt mangement. [akpm@linux-foundation.org: fix warning] [hugh@veritas.com: fix shmem_unuse_inode charging] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Cc: Paul Menage <menage@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:37 -07:00
KAMEZAWA Hiroyuki	e8589cc189	memcg: better migration handling This patch changes page migration under memory controller to use a different algorithm. (thanks to Christoph for new idea.) Before: - page_cgroup is migrated from an old page to a new page. After: - a new page is accounted , no reuse of page_cgroup. Pros: - We can avoid compliated lock depndencies and races in migration. Cons: - new param to mem_cgroup_charge_common(). - mem_cgroup_getref() is added for handling ref_cnt ping-pong. This version simplifies complicated lock dependency in page migraiton under memory resource controller. new refcnt sequence is following. a mapped page: prepage_migration() ..... +1 to NEW page try_to_unmap() ..... all refs to OLD page is gone. move_pages() ..... +1 to NEW page if page cache. remap... ..... all refs from map is added to NEW one. end_migration() ..... -1 to New page. page's mapcount + (page_is_cache) refs are added to NEW one. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Cc: Hugh Dickins <hugh@veritas.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:37 -07:00
Serge E. Hallyn	e885dcde75	cgroup_clone: use pid of newly created task for new cgroup cgroup_clone creates a new cgroup with the pid of the task. This works correctly for unshare, but for clone cgroup_clone is called from copy_namespaces inside copy_process, which happens before the new pid is created. As a result, the new cgroup was created with current's pid. This patch: 1. Moves the call inside copy_process to after the new pid is created 2. Passes the struct pid into ns_cgroup_clone (as it is not yet attached to the task) 3. Passes a name from ns_cgroup_clone() into cgroup_clone() so as to keep cgroup_clone() itself simpler 4. Uses pid_vnr() to get the process id value, so that the pid used to name the new cgroup is always the pid as it would be known to the task which did the cloning or unsharing. I think that is the most intuitive thing to do. This way, task t1 does clone(CLONE_NEWPID) to get t2, which does clone(CLONE_NEWPID) to get t3, then the cgroup for t3 will be named for the pid by which t2 knows t3. (Thanks to Dan Smith for finding the main bug) Changelog: June 11: Incorporate Paul Menage's feedback: don't pass NULL to ns_cgroup_clone from unshare, and reduce patch size by using 'nodename' in cgroup_clone. June 10: Original version [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Serge Hallyn <serge@us.ibm.com> Acked-by: Paul Menage <menage@google.com> Tested-by: Dan Smith <danms@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:37 -07:00
Paul Menage	856c13aa1f	cgroup files: convert res_counter_write() to be a cgroups write_string() handler Currently res_counter_write() is a raw file handler even though it's ultimately taking a number, since in some cases it wants to pre-process the string when converting it to a number. This patch converts res_counter_write() from a raw file handler to a write_string() handler; this allows some of the boilerplate copying/locking/checking to be removed, and simplies the cleanup path, since these functions are now performed by the cgroups framework. [lizf@cn.fujitsu.com: build fix] Signed-off-by: Paul Menage <menage@google.com> Cc: Paul Jackson <pj@sgi.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:36 -07:00
Paul Menage	84eea84288	cgroups: misc cleanups to write_string patchset This patch contains cleanups suggested by reviewers for the recent write_string() patchset: - pair cgroup_lock_live_group() with cgroup_unlock() in cgroup.c for clarity, rather than directly unlocking cgroup_mutex. - make the return type of cgroup_lock_live_group() a bool - use a #define'd constant for the local buffer size in read/write functions Signed-off-by: Paul Menage <menage@google.com> Cc: Paul Jackson <pj@sgi.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Balbir Singh <balbir@in.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:35 -07:00
Paul Menage	e788e066c6	cgroup files: move the release_agent file to use typed handlers Adds cgroup_release_agent_write() and cgroup_release_agent_show() methods to handle writing/reading the path to a cgroup hierarchy's release agent. As a result, cgroup_common_file_read() is now unnecessary. As part of the change, a previously-tolerated race in cgroup_release_agent() is avoided by copying the current release_agent_path prior to calling call_usermode_helper(). Signed-off-by: Paul Menage <menage@google.com> Cc: Paul Jackson <pj@sgi.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Balbir Singh <balbir@in.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:35 -07:00
Paul Menage	db3b14978a	cgroup files: add write_string cgroup control file method This patch adds a write_string() method for cgroups control files. The semantics are that a buffer is copied from userspace to kernelspace and the handler function invoked on that buffer. The buffer is guaranteed to be nul-terminated, and no longer than max_write_len (defaulting to 64 bytes if unspecified). Later patches will convert existing raw file write handlers in control group subsystems to use this method. Signed-off-by: Paul Menage <menage@google.com> Cc: Paul Jackson <pj@sgi.com> Cc: Pavel Emelyanov <xemul@openvz.org> Acked-by: Balbir Singh <balbir@in.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:35 -07:00
Paul Menage	ce16b49d37	cgroup files: clean up whitespace in struct cftype This patch removes some extraneous spaces from method declarations in struct cftype, to fit in with conventional kernel style. Signed-off-by: Paul Menage <menage@google.com> Cc: Paul Jackson <pj@sgi.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:35 -07:00
Pavel Emelyanov	f2992db2a4	Mark res_counter_charge(_locked) with __must_check Ignoring their return values may result in counter underflow in the future - when the value charged will be uncharged (or in "leaks" - when the value is not uncharged). This also prevents from using charging routines to decrement the counter value (i.e. uncharge it) ;) (Current code works OK with res_counter, however :) ) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Paul Menage <menage@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:35 -07:00
Jan Kara	657d3bfa98	quota: implement sending information via netlink about user below quota Sometimes it may be useful for userspace to know (e.g. for some hosting guys) that some user stopped exceeding his hardlimit or softlimit in quotas. Implement sending of such events to userspace via quota netlink protocol so that they don't have to poll for such events. Based on idea and initial implementation by Vladislav Bogdanov. Cc: Vladislav Bogdanov <slava@nsys.by> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:35 -07:00
Jan Kara	03b063436c	quota: convert macros to inline functions Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:35 -07:00
Jan Kara	74abb9890d	quota: move function-macros from quota.h to quotaops.h Move declarations of some macros, which should be in fact functions to quotaops.h. This way they can be later converted to inline functions because we can now use declarations from quota.h. Also add necessary includes of quotaops.h to a few files. [akpm@linux-foundation.org: fix JFS build] [akpm@linux-foundation.org: fix UFS build] [vegard.nossum@gmail.com: fix QUOTA=n build] Signed-off-by: Jan Kara <jack@suse.cz> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Arjen Pool <arjenpool@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:35 -07:00
Jan Kara	02a55ca871	quota: cleanup loop in sync_dquots() Make loop in sync_dquots() checking whether there's something to write more readable, remove useless variable and macro info_any_dirty() which is used only in this place. Signed-off-by: Jan Kara <jack@suse.cz> Cc: "Vegard Nossum" <vegard.nossum@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:35 -07:00
Jan Kara	b85f4b87a5	quota: rename quota functions from upper case, make bigger ones non-inline Cleanup quotaops.h: Rename functions from uppercase to lowercase (and define backward compatibility macros), move larger functions to dquot.c and make them non-inline. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:35 -07:00
Joe Peterson	b271e067c8	fatfs: add UTC timestamp option Provide a new mount option ("tz=UTC") for DOS (vfat/msdos) filesystems, allowing timestamps to be in coordinated universal time (UTC) rather than local time in applications where doing this is advantageous. In particular, portable devices that use fat/vfat (such as digital cameras) can benefit from using UTC in their internal clocks, thus avoiding daylight saving time errors and general time ambiguity issues. The user of the device does not have to worry about changing the time when moving from place or when daylight saving changes. The new mount option, when set, disables the counter-adjustment that Linux currently makes to FAT timestamp info in anticipation of the normal userspace time zone correction. When used in this new mode, all daylight saving time and time zone handling is done in userspace as is normal for many other filesystems (like ext3). The default mode, which remains unchanged, is still appropriate when mounting volumes written in Windows (because of its use of local time). I originally based this patch on one submitted last year by Paul Collins, but I updated it to work with current source and changed variable/option naming. Ogawa Hirofumi (who maintains these filesystems) and I discussed this patch at length on lkml, and he suggested using the option name in the attached version of the patch. Barry Bouwsma pointed out a good addition to the patch as well. Signed-off-by: Joe Peterson <joe@skyrush.com> Signed-off-by: Paul Collins <paul@ondioline.org> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Barry Bouwsma <free_beer_for_all@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:34 -07:00
Adrian Bunk	e8938a62a8	remove unused #include <linux/dirent.h>'s Remove some unused #include <linux/dirent.h>'s. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:34 -07:00
Adrian Bunk	cf6ae8b50e	remove the in-kernel struct dirent{,64} The kernel struct dirent{,64} were different from the ones in userspace. Even worse, we exported the kernel ones to userspace. But after the fat usages are fixed we can remove the conflicting kernel versions. Reviewed-by: H. Peter Anvin <hpa@kernel.org> Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:34 -07:00
Rene Scharfe	7557bc66be	msdos fs: remove unsettable atari option It has been impossible to set the option 'atari' of the MSDOS filesystem for several years. Since nobody seems to have missed it, let's remove its remains. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:34 -07:00
OGAWA Hirofumi	4596c8aaf9	fat: fix VFAT_IOCTL_READDIR_xxx and cleanup for userland "struct dirent" is a kernel type here, but is a different type in userspace! This means both the structure and the IOCTL number is wrong! So, this adds new "struct __fat_dirent" to generate correct IOCTL number. And kernel stuff moves to under __KERNEL__. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:34 -07:00
Jeff Mahoney	90415deac7	reiserfs: convert j_commit_lock to mutex j_commit_lock is a semaphore but uses it as if it were a mutex. This patch converts it to a mutex. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Chris Mason <chris.mason@oracle.com> Cc: Edward Shishkin <edward.shishkin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Jeff Mahoney	afe7025907	reiserfs: convert j_flush_sem to mutex j_flush_sem is a semaphore but uses it as if it were a mutex. This patch converts it to a mutex. [akpm@linux-foundation.org: fix mutex_trylock retval treatment] Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Chris Mason <chris.mason@oracle.com> Cc: Edward Shishkin <edward.shishkin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Jeff Mahoney	f68215c464	reiserfs: convert j_lock to mutex j_lock is a semaphore but uses it as if it were a mutex. This patch converts it to a mutex. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Chris Mason <chris.mason@oracle.com> Cc: Edward Shishkin <edward.shishkin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Adrian Bunk	de0ca06a99	coda: remove CODA_FS_OLD_API While fixing CONFIG_ leakages to the userspace kernel headers I ran into CODA_FS_OLD_API. After five years, are there still people using the old API left? Especially considering that you have to choose at compile time which API to support in the kernel (and distributions tend to offer the new API for some time). Jan: "The old API can definitely go. Around the time the new interface went in there were some non-Coda userspace file system implementations that took a while longer to convert to the new API, but by now they all switched to the new interface or in some cases to a FUSE-based solution." Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Duane Griffin	ae76dd9a6b	ext3: handle corrupted orphan list at mount If the orphan node list includes valid, untruncatable nodes with nlink > 0 the ext3_orphan_cleanup loop which attempts to delete them will not do so, causing it to loop forever. Fix by checking for such nodes in the ext3_orphan_get function. This patch fixes the second case (image hdb.20000009.softlockup.gz) reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: printk warning fix] Signed-off-by: Duane Griffin <duaneg@dghda.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:32 -07:00
Samuel Thibault	50c33a84db	ext2: fix typo in Hurd part of include/linux/ext2_fs.h Fix typo in Hurd part of include/linux/ext2_fs.h The ';' here is redundant or can even pose problem. This is actually not used by the Linux kernel, but it is exposed in GNU/Hurd. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:31 -07:00
Eric Miao	bbcd6d543d	gpio: max732x driver This adds a driver supporting a family of I2C port expanders from Maxim, which includes the MAX7319 and MAX7320-7327 chips. [dbrownell@users.sourceforge.net: minor fixes] Signed-off-by: Jack Ren <jack.ren@marvell.com> Signed-off-by: Eric Miao <eric.miao@marvell.com> Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:30 -07:00
David Brownell	8f1cc3b10e	gpio: mcp23s08 handles multiple chips per chipselect Teach the mcp23s08 driver about a curious feature of these chips: up to four of them can share the same chipselect, with the SPI signals wired in parallel, by matching two bits in the first protocol byte against two address lines on the chip. This is handled by three software changes: * Platform data now holds an array of per-chip structs, not just one chip's address and pullup configuration. * Probe() and remove() now use another level of structure, wrapping an instance of the original structure for each mcp23s08 chip sharing that chipselect. * The HAEN bit is set, so that the hardware address bits can no longer be ignored (boot firmware may not have enabled them). The "one struct per chip" preserves the guts of the current code, but platform_data will need minor changes. OLD: /* incorrect "slave" ID may not have mattered / .slave = 3, .pullups = BIT(3) \| BIT(1) \| BIT(0), NEW: / slave address _must_ match chip's wiring */ .chip[3] = { .is_present = true, .pullups = BIT(3) \| BIT(1) \| BIT(0), }, There's no change in how things _behave_ for spi_device nodes with a single mcp23s08 chip. New multi-chip configurations assign GPIOs in sequence, without holes. The spi_device just resembles a bigger controller, but internally it has multiple gpio_chip instances. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:30 -07:00
David Brownell	d8f388d8dc	gpio: sysfs interface This adds a simple sysfs interface for GPIOs. /sys/class/gpio /export ... asks the kernel to export a GPIO to userspace /unexport ... to return a GPIO to the kernel /gpioN ... for each exported GPIO #N /value ... always readable, writes fail for input GPIOs /direction ... r/w as: in, out (default low); write high, low /gpiochipN ... for each gpiochip; #N is its first GPIO /base ... (r/o) same as N /label ... (r/o) descriptive, not necessarily unique /ngpio ... (r/o) number of GPIOs; numbered N .. N+(ngpio - 1) GPIOs claimed by kernel code may be exported by its owner using a new gpio_export() call, which should be most useful for driver debugging. Such exports may optionally be done without a "direction" attribute. Userspace may ask to take over a GPIO by writing to a sysfs control file, helping to cope with incomplete board support or other "one-off" requirements that don't merit full kernel support: echo 23 > /sys/class/gpio/export ... will gpio_request(23, "sysfs") and gpio_export(23); use /sys/class/gpio/gpio-23/direction to (re)configure it, when that GPIO can be used as both input and output. echo 23 > /sys/class/gpio/unexport ... will gpio_free(23), when it was exported as above The extra D-space footprint is a few hundred bytes, except for the sysfs resources associated with each exported GPIO. The additional I-space footprint is about two thirds of the current size of gpiolib (!). Since no /dev node creation is involved, no "udev" support is needed. Related changes: * This adds a device pointer to "struct gpio_chip". When GPIO providers initialize that, sysfs gpio class devices become children of that device instead of being "virtual" devices. * The (few) gpio_chip providers which have such a device node have been updated. * Some gpio_chip drivers also needed to update their module "owner" field ... for which missing kerneldoc was added. * Some gpio_chips don't support input GPIOs. Those GPIOs are now flagged appropriately when the chip is registered. Based on previous patches, and discussion both on and off LKML. A Documentation/ABI/testing/sysfs-gpio update is ready to submit once this merges to mainline. [akpm@linux-foundation.org: a few maintenance build fixes] Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de> Cc: Greg KH <greg@kroah.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:30 -07:00
Srinivasa D S	ef53d9c5e4	kprobes: improve kretprobe scalability with hashed locking Currently list of kretprobe instances are stored in kretprobe object (as used_instances,free_instances) and in kretprobe hash table. We have one global kretprobe lock to serialise the access to these lists. This causes only one kretprobe handler to execute at a time. Hence affects system performance, particularly on SMP systems and when return probe is set on lot of functions (like on all systemcalls). Solution proposed here gives fine-grain locks that performs better on SMP system compared to present kretprobe implementation. Solution: 1) Instead of having one global lock to protect kretprobe instances present in kretprobe object and kretprobe hash table. We will have two locks, one lock for protecting kretprobe hash table and another lock for kretporbe object. 2) We hold lock present in kretprobe object while we modify kretprobe instance in kretprobe object and we hold per-hash-list lock while modifying kretprobe instances present in that hash list. To prevent deadlock, we never grab a per-hash-list lock while holding a kretprobe lock. 3) We can remove used_instances from struct kretprobe, as we can track used instances of kretprobe instances using kretprobe hash table. Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system with return probes set on all systemcalls looks like this. cacheline non-cacheline Un-patched kernel aligned patch aligned patch =============================================================================== real 9m46.784s 9m54.412s 10m2.450s user 40m5.715s 40m7.142s 40m4.273s sys 2m57.754s 2m58.583s 3m17.430s =========================================================== Time duration for kernel compilation ("make -j 8) on the same system, when kernel is not probed. ========================= real 9m26.389s user 40m8.775s sys 2m7.283s ========================= Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com> Signed-off-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:30 -07:00
Ben Dooks	42cd2366fb	sm501: gpio I2C support Add support for adding the GPIO based I2C resources. Signed-off-by: Ben Dooks <ben-linux@fluff.org> Cc: Arnaud Patard <apatard@mandriva.com> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:30 -07:00
Arnaud Patard	60e540d617	sm501: gpio dynamic registration for PCI devices The SM501 PCI card requires a dyanmic gpio allocation as the number of cards is not known at compile time. Fixup the platform data and registration to deal with this. Acked-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Arnaud Patard <apatard@mandriva.com> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:30 -07:00
Ben Dooks	f61be273d3	sm501: add gpiolib support Add support for exporting the GPIOs on the SM501 via gpiolib. Signed-off-by: Ben Dooks <ben-linux@fluff.org> Cc: Arnaud Patard <apatard@mandriva.com> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:29 -07:00
Ben Dooks	472dba7d11	sm501: add power control callback Add callback to get or set the power control if the device has the sleep connected to some form of GPIO. Signed-off-by: Ben Dooks <ben-linux@fluff.org> Cc: Arnaud Patard <apatard@mandriva.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:29 -07:00
Dave Young	717115e1a5	printk ratelimiting rewrite All ratelimit user use same jiffies and burst params, so some messages (callbacks) will be lost. For example: a call printk_ratelimit(5 * HZ, 1) b call printk_ratelimit(5 * HZ, 1) before the 5*HZ timeout of a, then b will will be supressed. - rewrite __ratelimit, and use a ratelimit_state as parameter. Thanks for hints from andrew. - Add WARN_ON_RATELIMIT, update rcupreempt.h - remove __printk_ratelimit - use __ratelimit in net_ratelimit Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:29 -07:00
Vegard Nossum	2711b793eb	kallsyms: unify 32- and 64-bit code Use the %p format string which already accounts for the padding you need with a pointer type on a particular architecture. Also replace the macro with a static inline function to match the rest of the file. Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:29 -07:00
Arjan van de Ven	b6c6393700	Rename WARN() to WARNING() to clear the namespace We want to use WARN() as a variant of WARN_ON(), however a few drivers are using WARN() internally. This patch renames these to WARNING() to avoid the namespace clash. A few cases were defining but not using the thing, for those cases I just deleted the definition. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Greg KH <greg@kroah.com> Cc: Karsten Keil <kkeil@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:29 -07:00
Robert P. J. Day	4500d067ee	init.h: remove obsolete content Remove apparently obsolete content from init.h referring to gcc 2.9x and to "no_module_init". Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:28 -07:00
KOSAKI Motohiro	ac331d158e	call_usermodehelper(): increase reliability Presently call_usermodehelper_setup() uses GFP_ATOMIC. but it can return NULL _very_ easily. GFP_ATOMIC is needed only when we can't sleep. and, GFP_KERNEL is robust and better. thus, I add gfp_mask argument to call_usermodehelper_setup(). So, its callers pass the gfp_t as below: call_usermodehelper() and call_usermodehelper_keys(): depend on 'wait' argument. call_usermodehelper_pipe(): always GFP_KERNEL because always run under process context. orderly_poweroff(): pass to GFP_ATOMIC because may run under interrupt context. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: "Paul Menage" <menage@google.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:28 -07:00
Andrew Morton	cebbd3fb80	build-kernel-profileo-only-when-requested-cleanups Cc: Adrian Bunk <bunk@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:27 -07:00
Adrian Bunk	b03f6489f9	build kernel/profile.o only when requested Build kernel/profile.o only if CONFIG_PROFILING is enabled. This makes CONFIG_PROFILING=n kernels smaller. As a bonus, some profile_tick() calls and one branch from schedule() are now eliminated with CONFIG_PROFILING=n (but I doubt these are measurable effects). This patch changes the effects of CONFIG_PROFILING=n, but I don't think having more than two choices would be the better choice. This patch also adds the name of the first parameter to the prototypes of profile_{hits,tick}() since I anyway had to add them for the dummy functions. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:27 -07:00
Robert P. J. Day	e0ce0da9fe	lists: remove a redundant conditional definition of list_add() Remove the conditional surrounding the definition of list_add() from list.h since, if you define CONFIG_DEBUG_LIST, the definition you will subsequently pick up from lib/list_debug.c will be absolutely identical, at which point you can remove that redundant definition from list_debug.c as well. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:27 -07:00
Robert P. J. Day	b39c08cb69	Remove apparently unused fd1772.h header file. This header file has been unused for quite some time, and the corresponding source files appear to have been removed back in commit `99eb8a550d` ("Remove the arm26 port") Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: Adrian Bunk <bunk@stusta.de> Cc: Ian Molton <spyro@f2s.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:27 -07:00
Harvey Harrison	8b5ac31e27	include: use get/put_unaligned_* helpers Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: "John W. Linville" <linville@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:26 -07:00
Steven Rostedt	3f307891ce	locking: add typecheck on irqsave and friends for correct flags There haave been several areas in the kernel where an int has been used for flags in local_irq_save() and friends instead of a long. This can cause some hard to debug problems on some architectures. This patch adds a typecheck inside the irqsave and restore functions to flag these cases. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: build fix] Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:26 -07:00
Andrew Morton	e0deaff470	split the typecheck macros out of include/linux/kernel.h Needed to fix up a recursive include snafu in locking-add-typecheck-on-irqsave-and-friends-for-correct-flags.patch Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:26 -07:00
Yevgeny Petrilin	25c94d010a	mlx4_core: Add VLAN tag field to WQE control segment struct Add fields for VLAN tag and insert VLAN tag flag to the control section struct. These fields will be used for sending ethernet packets. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-25 10:30:06 -07:00
David Miller	3d6f4a20cc	endian: Always evaluate arguments. Changeset `7fa897b91a` ("ide: trivial sparse annotations") created an IDE bootup regression on big-endian systems. In drivers/ide/ide-iops.c, function ide_fixstring() we now have the loop: for (p = end ; p != s;) be16_to_cpus((u16 *)(p -= 2)); which will never terminate on big-endian because in such a configuration be16_to_cpus() evaluates to "do { } while (0)" Therefore, always evaluate the arguments to nop endian transformation operations. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 09:28:09 -07:00
Alexey Korolev	3d45955962	[MTD] [NAND] subpage read feature as a way to increase performance. This patch enables NAND subpage read functionality. If upper layer drivers are requesting to read non page aligned data NAND subpage-read functionality reads the only whose ECC regions which include requested data when original code reads whole page. This significantly improves performance in many cases. Here are some digits : UBI volume mount time No subpage reads: 5.75 seconds Subpage read patch: 2.42 seconds Open/stat time for files on JFFS2 volume: No subpage read 0m 5.36s Subpage read 0m 2.88s Signed-off-by Alexey Korolev <akorolev@infradead.org> Acked-by: Artem Bityutskiy <dedekind@infradead.org> Acked-by: Jörn Engel <joern@logfs.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2008-07-25 10:49:50 -04:00
David Woodhouse	ff877ea80e	Merge branch 'linux-next' of git://git.infradead.org/~dedekind/ubi-2.6	2008-07-25 10:40:14 -04:00
Stefan Richter	95984f62c9	firewire: fw-ohci: TSB43AB22/A dualbuffer workaround Isochronous reception in dualbuffer mode is reportedly broken with TI TSB43AB22A on x86-64. Descriptor addresses above 2G have been determined as the trigger: https://bugzilla.redhat.com/show_bug.cgi?id=435550 Two fixes are possible: - pci_set_consistent_dma_mask(pdev, DMA_31BIT_MASK); at least when IR descriptors are allocated, or - simply don't use dualbuffer. This fix implements the latter workaround. But we keep using dualbuffer on x86-32 which won't give us highmen (and thus physical addresses outside the 31bit range) in coherent DMA memory allocations. Right now we could for example also whitelist PPC32, but DMA mapping implementation details are expected to change there. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>	2008-07-25 15:41:23 +02:00
Nathan Lynch	483fad1c3f	ELF loader support for auxvec base platform string Some IBM POWER-based platforms have the ability to run in a mode which mostly appears to the OS as a different processor from the actual hardware. For example, a Power6 system may appear to be a Power5+, which makes the AT_PLATFORM value "power5+". This means that programs are restricted to the ISA supported by Power5+; Power6-specific instructions are treated as illegal. However, some applications (virtual machines, optimized libraries) can benefit from knowledge of the underlying CPU model. A new aux vector entry, AT_BASE_PLATFORM, will denote the actual hardware. For example, on a Power6 system in Power5+ compatibility mode, AT_PLATFORM will be "power5+" and AT_BASE_PLATFORM will be "power6". The idea is that AT_PLATFORM indicates the instruction set supported, while AT_BASE_PLATFORM indicates the underlying microarchitecture. If the architecture has defined ELF_BASE_PLATFORM, copy that value to the user stack in the same manner as ELF_PLATFORM. Signed-off-by: Nathan Lynch <ntl@pobox.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-07-25 15:44:39 +10:00
Linus Torvalds	832fe9c222	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: virtio: Add transport feature handling stub for virtio_ring. virtio: Rename set_features to finalize_features virtio: Formally reserve bits 28-31 to be 'transport' features. s390: use virtio_console for KVM on s390 virtio: console as a config option virtio_console: use virtqueue notification for hvc_console hvc_console: rework setup to replace irq functions with callbacks virtio_blk: check for hardsector size from host virtio: Use bus_type probe and remove methods virtio: don't always force a notification when ring is full virtio: clarify that ABI is usable by any implementations virtio: Recycle unused recv buffer pages for large skbs in net driver virtio net: Allow receiving SG packets virtio net: Add ethtool ops for SG/GSO virtio: fix virtio_net xmit of freed skb bug	2008-07-24 19:11:49 -07:00
Rusty Russell	ed9559d38a	Label kthread_create() with printf attribute tag. Obvious misc patch been in my queue (& linux-next) for over a cycle. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 19:11:15 -07:00
Rusty Russell	e34f872567	virtio: Add transport feature handling stub for virtio_ring. To prepare for virtio_ring transport feature bits, hook in a call in all the users to manipulate them. This currently just clears all the bits, since it doesn't understand any features. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-07-25 12:06:14 +10:00
Rusty Russell	c624896e48	virtio: Rename set_features to finalize_features Rather than explicitly handing the features to the lower-level, we just hand the virtio_device and have it set the features. This make it clear that it has the chance to manipulate the features of the device at this point (and that all feature negotiation is already done). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-07-25 12:06:12 +10:00
Rusty Russell	dd7c7bc462	virtio: Formally reserve bits 28-31 to be 'transport' features. We assign feature bits as required, but it makes sense to reserve some for the particular transport, rather than the particular device. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-07-25 12:06:07 +10:00
Christian Borntraeger	066f4d82a6	virtio_blk: check for hardsector size from host Currently virtio_blk assumes a 512 byte hard sector size. This can cause trouble / performance issues if the backing has a different block size (like a file on an ext3 file system formatted with 4k block size or a dasd). Lets add a feature flag that tells the guest to use a different hard sector size than 512 byte. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-07-25 12:06:05 +10:00
Rusty Russell	674bfc23c5	virtio: clarify that ABI is usable by any implementations We want others to implement and use virtio, so it makes sense to BSD license the non-__KERNEL__ parts of the headers to make this crystal clear. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Mark McLoughlin <markmc@redhat.com> Acked-by: Ryan Harper <ryanh@us.ibm.com> Acked-by: Eric Van Hensbergen <ericvh@gmail.com> Acked-by: Anthony Liguori <aliguori@us.ibm.com>	2008-07-25 12:06:04 +10:00
Linus Torvalds	b5684b83b1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (76 commits) ide: use proper printk() KERN_* levels in ide-probe.c ide: fix for EATA SCSI HBA in ATA emulating mode ide: remove stale comments from drivers/ide/Makefile ide: enable local IRQs in all handlers for TASKFILE_NO_DATA data phase ide-scsi: remove kmalloced struct request ht6560b: remove old history ht6560b: update email address ide-cd: fix oops when using growisofs gayle: release resources on ide_host_add() failure palm_bk3710: add UltraDMA/100 support ide: trivial sparse annotations ide: ide-tape.c sparse annotations and unaligned access removal ide: drop 'name' parameter from ->init_chipset method ide: prefix messages from IDE PCI host drivers by driver name it821x: remove DECLARE_ITE_DEV() macro it8213: remove DECLARE_ITE_DEV() macro ide: include PCI device name in messages from IDE PCI host drivers ide: remove <asm/ide.h> for some archs ide-generic: remove ide_default_{io_base,irq}() inlines (take 3) ide-generic: is no longer needed on ppc32 ...	2008-07-24 14:55:09 -07:00
Linus Torvalds	5042d99795	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI: fixup sparse endianness warnings in proc.c PCI PM: make more PCI PM core functionality available to drivers PCI/DMAR: don't assume presence of RMRRs PCI hotplug: fix error path in pci_slot's register_slot	2008-07-24 13:57:13 -07:00
Bartlomiej Zolnierkiewicz	a326b02b0c	ide: drop 'name' parameter from ->init_chipset method There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-24 22:53:33 +02:00
Bartlomiej Zolnierkiewicz	2a8f7450f8	ide: remove <asm/ide.h> for some archs * Remove <linux/irq.h> include from <asm-ia64.h> (<linux/ide.h> includes <linux/interrupt.h> which is enough). * Remove <asm/ide.h> for alpha/blackfin/h8300/ia64/m32r/sh/x86/xtensa (this leaves us with arm/frv/m68k/mips/mn10300/parisc/powerpc/sparc[64]). There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-24 22:53:31 +02:00
Bartlomiej Zolnierkiewicz	d83b8b85cd	ide: define MAX_HWIFS in <linux/ide.h> * Now that ide_hwif_t instances are allocated dynamically the difference between MAX_HWIFS == 2 and MAX_HWIFS == 10 is ~100 bytes (x86-32) so use MAX_HWIFS == 10 on all archs except these ones that use MAX_HWIFS == 1. * Define MAX_HWIFS in <linux/ide.h> instead of <asm/ide.h>. [ Please note that avr32/cris/v850 have no <asm/ide.h> and alpha/ia64/sh always define CONFIG_IDE_MAX_HWIFS. ] Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-24 22:53:30 +02:00
Bartlomiej Zolnierkiewicz	ef0b04276d	ide: add ide_pci_remove() helper * Add 'unsigned long host_flags' field to struct ide_host. * Set ->host_flags in ide_host_alloc_all(). * Always set PCI dev's ->driver_data in ide_pci_init_{one,two}(). * Add ide_pci_remove() helper (the default implementation for struct pci_driver's ->remove method). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-24 22:53:19 +02:00
Bartlomiej Zolnierkiewicz	08da591e14	ide: add ide_device_{get,put}() helpers * Add 'struct ide_host host' field to ide_hwif_t and set it in ide_host_alloc_all(). Add ide_device_{get,put}() helpers loosely based on SCSI's scsi_device_{get,put}() ones. * Convert IDE device drivers to use ide_device_{get,put}(). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-24 22:53:15 +02:00
Bartlomiej Zolnierkiewicz	6cdf6eb357	ide: add ->dev and ->host_priv fields to struct ide_host * Add 'struct device dev[2]' and 'void host_priv' fields to struct ide_host. * Set ->dev[] in ide_host_alloc_all()/ide_setup_pci_device[s](). * Pass 'void priv' argument to ide_setup_pci_device[s]() and use it to set ->host_priv. Set PCI dev's ->driver_data to point to the struct ide_host instance if PCI host driver wants to use ->host_priv. * Rename ide_setup_pci_device[s]() to ide_pci_init_{one,two}(). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-24 22:53:14 +02:00
Linus Torvalds	5c402355ad	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: MAINTAINERS: Remove Glenn Streiff from NetEffect entry mlx4_core: Improve error message when not enough UAR pages are available IB/mlx4: Add support for memory management extensions and local DMA L_Key IB/mthca: Keep free count for MTT buddy allocator mlx4_core: Keep free count for MTT buddy allocator mlx4_code: Add missing FW status return code IB/mlx4: Rename struct mlx4_lso_seg to mlx4_wqe_lso_seg mlx4_core: Add module parameter to enable QoS support RDMA/iwcm: Remove IB_ACCESS_LOCAL_WRITE from remote QP attributes IPoIB: Include err code in trace message for ib_sa_path_rec_get() failures IB/sa_query: Check if sm_ah is NULL in ib_sa_remove_one() IB/ehca: Release mutex in error path of alloc_small_queue_page() IB/ehca: Use default value for Local CA ACK Delay if FW returns 0 IB/ehca: Filter PATH_MIG events if QP was never armed IB/iser: Add support for RDMA_CM_EVENT_ADDR_CHANGE event RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event	2008-07-24 12:56:07 -07:00
Linus Torvalds	ecc8b655b3	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: nohz: adjust tick_nohz_stop_sched_tick() call of s390 as well nohz: prevent tick stop outside of the idle loop	2008-07-24 12:55:01 -07:00
Linus Torvalds	7540081c6b	Merge branch 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc * 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: Remove __DECLARE_SEMAPHORE_GENERIC Remove asm/semaphore.h Remove use of asm/semaphore.h Add missing semaphore.h includes Remove mention of semaphores from kernel-locking	2008-07-24 12:24:40 -07:00
Linus Torvalds	c54554d388	Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds * 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds: leds: Ensure led->trigger is set earlier leds: Add support for Philips PCA955x I2C LED drivers leds: Fix sparse warnings in leds-h1940 driver leds: mark led_classdev.default_trigger as const leds: fix unsigned value overflow in atmel pwm driver leds: Add pca9532 platform data for Thecus N2100 leds: Add pca9532 led driver	2008-07-24 12:16:02 -07:00
Steven Whitehouse	f9247273cb	UFS: add const to parser token table This patch adds a "const" to the parser token table. I've done an allmodconfig build to see if this produces any warnings/failures and the patch includes a fix for the only warning that was produced. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Alexander Viro <aviro@redhat.com> Acked-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 11:50:15 -07:00
Philippe De Muyter	5bb49fcd50	video/fb: cleanup FB_MAJOR usage Currently, linux/major.h defines a GRAPHDEV_MAJOR (29) that nobody uses, and linux/fb.h defines the real FB_MAJOR (also 29), that only fbmem.c needs. Drop GRAPHDEV_MAJOR from major.h, move FB_MAJOR definition from fb.h to major.h, and fix fbmem.c to use major.h's definition. Signed-off-by: Philippe De Muyter <phdm@macqel.be> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:41 -07:00
Hans-Christian Egtvedt	3e074058d7	fbdev: LCD backlight driver using Atmel PWM driver This patch adds a platform driver using the ATMEL PWM driver to control a backlight which requires a PWM signal and optional GPIO signal for discrete on/off signal. It has been tested on Favr-32 board from EarthLCD. The driver is configurable by supplying a struct with the platform data. See the include/linux/atmel-pwm-bl.h for details. The board code for Favr-32 will be submitted to the AVR32 kernel list. Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:41 -07:00
Ben Dooks	0c531360ed	lcd: add lcd_device to check_fb() entry in lcd_ops Add the lcd_device being checked to the check_fb entry of lcd_ops. This ensures that any driver using this to check against it's own state can do so, and also makes all the calls in lcd_ops more orthogonal in their arguments. Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:40 -07:00
Ben Dooks	206c5d69d0	sm501: add inversion controls for VBIASEN and FPEN Add flags to allow the driver to invert the sense of both VBIASEN and FPEN signals comming from the SM501. Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:40 -07:00
Krzysztof Helt	01a2d9ed85	tridentfb: acceleration constants change This patch replaces deprecated constant FB_ACCELF_TEXT with FBINFO_HWACCEL_DISABLED and adds constants for Trident families of accelerators. The FBINFO_HWACCEL_DISABLED is correctly used so noaccel parameter works now. Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:36 -07:00
David Brownell	d3de851a44	rtc: BCD codeshrink This updates <linux/bcd.h> to define the key routines as constant functions, which the macros will then call. Newer code can now call bcd2bin() instead of SCREAMING BCD2BIN() TO THE FOUR WINDS. This lets each driver shrink their codespace by using N function calls to a single (global) copy of those routines, instead of N inlined copies of these functions per driver. These routines aren't used in speed-critical code. Almost all callers are in the RTC framework. Typical per-driver savings is near 300 bytes. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Adrian Bunk <bunk@kernel.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:33 -07:00
David Brownell	53e84b672c	rtc: ds1305/ds1306 driver Support the Dallas/Maxim DS1305 and DS1306 RTC chips. These use SPI, and support alarms, NVRAM, and a trickle charger for use when their backup power supply is a supercap or rechargeable cell. This basic driver doesn't yet support suspend/resume or wakealarms. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:33 -07:00
David Brownell	5ad31a5751	rtc: remove BKL for ioctl() Remove implicit use of BKL in ioctl() from the RTC framework. Instead, the rtc->ops_lock is used. That's the same lock that already protects the RTC operations when they're issued through the exported rtc_*() calls in drivers/rtc/interface.c ... making this a bugfix, not just a cleanup, since both ioctl calls and set_alarm() need to update IRQ enable flags and that implies a common lock (which RTC drivers as a rule do not provide on their own). A new comment at the declaration of "struct rtc_class_ops" summarizes current locking rules. It's not clear to me that the exceptions listed there should exist ... if not, those are pre-existing problems which can be fixed in a patch that doesn't relate to BKL removal. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Jonathan Corbet <corbet@lwn.net> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:33 -07:00
Ian Kent	aa55ddf340	autofs4: remove unused ioctls The ioctls AUTOFS_IOC_TOGGLEREGHOST and AUTOFS_IOC_ASKREGHOST were added several years ago but what they were intended for has never been implemented (as far as I'm aware noone uses them) so remove them. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:33 -07:00
Grant Likely	102eb97564	spi: make spi_board_info.modalias a char array Currently, 'modalias' in the spi_device structure is a 'const char '. The spi_new_device() function fills in the modalias value from a passed in spi_board_info data block. Since it is a pointer copy, the new spi_device remains dependent on the spi_board_info structure after the new spi_device is registered (no other fields in spi_device directly depend on the spi_board_info structure; all of the other data is copied). This causes a problem when dynamically propulating the list of attached SPI devices. For example, in arch/powerpc, the list of SPI devices can be populated from data in the device tree. With the current code, the device tree adapter must kmalloc() a new spi_board_info structure for each new SPI device it finds in the device tree, and there is no simple mechanism in place for keeping track of these allocations. This patch changes modalias from a 'const char ' to a fixed char array. By copying the modalias string instead of referencing it, the dependency on the spi_board_info structure is eliminated and an outside caller does not need to maintain a separate spi_board_info allocation for each device. If searched through the code to the best of my ability for any references to modalias which may be affected by this change and haven't found anything. It has been tested with the lite5200b platform in arch/powerpc. [dbrownell@users.sourceforge.net: cope with linux-next changes: KOBJ_NAME_LEN obliterated, etc] Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:30 -07:00
Ulrich Drepper	9fe5ad9c8c	flag parameters add-on: remove epoll_create size param Remove the size parameter from the new epoll_create syscall and renames the syscall itself. The updated test program follows. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <time.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_epoll_create2 # ifdef __x86_64__ # define __NR_epoll_create2 291 # elif defined __i386__ # define __NR_epoll_create2 329 # else # error "need __NR_epoll_create2" # endif #endif #define EPOLL_CLOEXEC O_CLOEXEC int main (void) { int fd = syscall (__NR_epoll_create2, 0); if (fd == -1) { puts ("epoll_create2(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("epoll_create2(0) set close-on-exec flag"); return 1; } close (fd); fd = syscall (__NR_epoll_create2, EPOLL_CLOEXEC); if (fd == -1) { puts ("epoll_create2(EPOLL_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("epoll_create2(EPOLL_CLOEXEC) set close-on-exec flag"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:29 -07:00
Ulrich Drepper	510df2dd48	flag parameters: NONBLOCK in inotify_init This patch adds non-blocking support for inotify_init1. The additional changes needed are minimal. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_inotify_init1 # ifdef __x86_64__ # define __NR_inotify_init1 294 # elif defined __i386__ # define __NR_inotify_init1 332 # else # error "need __NR_inotify_init1" # endif #endif #define IN_NONBLOCK O_NONBLOCK int main (void) { int fd = syscall (__NR_inotify_init1, 0); if (fd == -1) { puts ("inotify_init1(0) failed"); return 1; } int fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { puts ("inotify_init1(0) set non-blocking mode"); return 1; } close (fd); fd = syscall (__NR_inotify_init1, IN_NONBLOCK); if (fd == -1) { puts ("inotify_init1(IN_NONBLOCK) failed"); return 1; } fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { puts ("inotify_init1(IN_NONBLOCK) set non-blocking mode"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:29 -07:00
Ulrich Drepper	be61a86d72	flag parameters: NONBLOCK in pipe This patch adds O_NONBLOCK support to pipe2. It is minimally more involved than the patches for eventfd et.al but still trivial. The interfaces of the create_write_pipe and create_read_pipe helper functions were changed and the one other caller as well. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_pipe2 # ifdef __x86_64__ # define __NR_pipe2 293 # elif defined __i386__ # define __NR_pipe2 331 # else # error "need __NR_pipe2" # endif #endif int main (void) { int fds[2]; if (syscall (__NR_pipe2, fds, 0) == -1) { puts ("pipe2(0) failed"); return 1; } for (int i = 0; i < 2; ++i) { int fl = fcntl (fds[i], F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { printf ("pipe2(0) set non-blocking mode for fds[%d]\n", i); return 1; } close (fds[i]); } if (syscall (__NR_pipe2, fds, O_NONBLOCK) == -1) { puts ("pipe2(O_NONBLOCK) failed"); return 1; } for (int i = 0; i < 2; ++i) { int fl = fcntl (fds[i], F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { printf ("pipe2(O_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i); return 1; } close (fds[i]); } puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:29 -07:00
Ulrich Drepper	6b1ef0e60d	flag parameters: NONBLOCK in timerfd_create This patch adds support for the TFD_NONBLOCK flag to timerfd_create. The additional changes needed are minimal. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <time.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_timerfd_create # ifdef __x86_64__ # define __NR_timerfd_create 283 # elif defined __i386__ # define __NR_timerfd_create 322 # else # error "need __NR_timerfd_create" # endif #endif #define TFD_NONBLOCK O_NONBLOCK int main (void) { int fd = syscall (__NR_timerfd_create, CLOCK_REALTIME, 0); if (fd == -1) { puts ("timerfd_create(0) failed"); return 1; } int fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { puts ("timerfd_create(0) set non-blocking mode"); return 1; } close (fd); fd = syscall (__NR_timerfd_create, CLOCK_REALTIME, TFD_NONBLOCK); if (fd == -1) { puts ("timerfd_create(TFD_NONBLOCK) failed"); return 1; } fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { puts ("timerfd_create(TFD_NONBLOCK) set non-blocking mode"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:29 -07:00
Ulrich Drepper	e7d476dfdf	flag parameters: NONBLOCK in eventfd This patch adds support for the EFD_NONBLOCK flag to eventfd2. The additional changes needed are minimal. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_eventfd2 # ifdef __x86_64__ # define __NR_eventfd2 290 # elif defined __i386__ # define __NR_eventfd2 328 # else # error "need __NR_eventfd2" # endif #endif #define EFD_NONBLOCK O_NONBLOCK int main (void) { int fd = syscall (__NR_eventfd2, 1, 0); if (fd == -1) { puts ("eventfd2(0) failed"); return 1; } int fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { puts ("eventfd2(0) sets non-blocking mode"); return 1; } close (fd); fd = syscall (__NR_eventfd2, 1, EFD_NONBLOCK); if (fd == -1) { puts ("eventfd2(EFD_NONBLOCK) failed"); return 1; } fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { puts ("eventfd2(EFD_NONBLOCK) does not set non-blocking mode"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:29 -07:00
Ulrich Drepper	5fb5e04926	flag parameters: NONBLOCK in signalfd This patch adds support for the SFD_NONBLOCK flag to signalfd4. The additional changes needed are minimal. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <signal.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_signalfd4 # ifdef __x86_64__ # define __NR_signalfd4 289 # elif defined __i386__ # define __NR_signalfd4 327 # else # error "need __NR_signalfd4" # endif #endif #define SFD_NONBLOCK O_NONBLOCK int main (void) { sigset_t ss; sigemptyset (&ss); sigaddset (&ss, SIGUSR1); int fd = syscall (__NR_signalfd4, -1, &ss, 8, 0); if (fd == -1) { puts ("signalfd4(0) failed"); return 1; } int fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { puts ("signalfd4(0) set non-blocking mode"); return 1; } close (fd); fd = syscall (__NR_signalfd4, -1, &ss, 8, SFD_NONBLOCK); if (fd == -1) { puts ("signalfd4(SFD_NONBLOCK) failed"); return 1; } fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { puts ("signalfd4(SFD_NONBLOCK) does not set non-blocking mode"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:29 -07:00
Ulrich Drepper	77d2720059	flag parameters: NONBLOCK in socket and socketpair This patch introduces support for the SOCK_NONBLOCK flag in socket, socketpair, and paccept. To do this the internal function sock_attach_fd gets an additional parameter which it uses to set the appropriate flag for the file descriptor. Given that in modern, scalable programs almost all socket connections are non-blocking and the minimal additional cost for the new functionality I see no reason not to add this code. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <pthread.h> #include <stdio.h> #include <unistd.h> #include <netinet/in.h> #include <sys/socket.h> #include <sys/syscall.h> #ifndef __NR_paccept # ifdef __x86_64__ # define __NR_paccept 288 # elif defined __i386__ # define SYS_PACCEPT 18 # define USE_SOCKETCALL 1 # else # error "need __NR_paccept" # endif #endif #ifdef USE_SOCKETCALL # define paccept(fd, addr, addrlen, mask, flags) \ ({ long args[6] = { \ (long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \ syscall (__NR_socketcall, SYS_PACCEPT, args); }) #else # define paccept(fd, addr, addrlen, mask, flags) \ syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags) #endif #define PORT 57392 #define SOCK_NONBLOCK O_NONBLOCK static pthread_barrier_t b; static void * tf (void arg) { pthread_barrier_wait (&b); int s = socket (AF_INET, SOCK_STREAM, 0); struct sockaddr_in sin; sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr ) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); pthread_barrier_wait (&b); s = socket (AF_INET, SOCK_STREAM, 0); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr ) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); return NULL; } int main (void) { int fd; fd = socket (PF_INET, SOCK_STREAM, 0); if (fd == -1) { puts ("socket(0) failed"); return 1; } int fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { puts ("socket(0) set non-blocking mode"); return 1; } close (fd); fd = socket (PF_INET, SOCK_STREAM\|SOCK_NONBLOCK, 0); if (fd == -1) { puts ("socket(SOCK_NONBLOCK) failed"); return 1; } fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { puts ("socket(SOCK_NONBLOCK) does not set non-blocking mode"); return 1; } close (fd); int fds[2]; if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1) { puts ("socketpair(0) failed"); return 1; } for (int i = 0; i < 2; ++i) { fl = fcntl (fds[i], F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { printf ("socketpair(0) set non-blocking mode for fds[%d]\n", i); return 1; } close (fds[i]); } if (socketpair (PF_UNIX, SOCK_STREAM\|SOCK_NONBLOCK, 0, fds) == -1) { puts ("socketpair(SOCK_NONBLOCK) failed"); return 1; } for (int i = 0; i < 2; ++i) { fl = fcntl (fds[i], F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { printf ("socketpair(SOCK_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i); return 1; } close (fds[i]); } pthread_barrier_init (&b, NULL, 2); struct sockaddr_in sin; pthread_t th; if (pthread_create (&th, NULL, tf, NULL) != 0) { puts ("pthread_create failed"); return 1; } int s = socket (AF_INET, SOCK_STREAM, 0); int reuse = 1; setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse)); sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); bind (s, (struct sockaddr ) &sin, sizeof (sin)); listen (s, SOMAXCONN); pthread_barrier_wait (&b); int s2 = paccept (s, NULL, 0, NULL, 0); if (s2 < 0) { puts ("paccept(0) failed"); return 1; } fl = fcntl (s2, F_GETFL); if (fl & O_NONBLOCK) { puts ("paccept(0) set non-blocking mode"); return 1; } close (s2); close (s); pthread_barrier_wait (&b); s = socket (AF_INET, SOCK_STREAM, 0); sin.sin_port = htons (PORT); setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse)); bind (s, (struct sockaddr *) &sin, sizeof (sin)); listen (s, SOMAXCONN); pthread_barrier_wait (&b); s2 = paccept (s, NULL, 0, NULL, SOCK_NONBLOCK); if (s2 < 0) { puts ("paccept(SOCK_NONBLOCK) failed"); return 1; } fl = fcntl (s2, F_GETFL); if ((fl & O_NONBLOCK) == 0) { puts ("paccept(SOCK_NONBLOCK) does not set non-blocking mode"); return 1; } close (s2); close (s); pthread_barrier_wait (&b); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:29 -07:00
Ulrich Drepper	4006553b06	flag parameters: inotify_init This patch introduces the new syscall inotify_init1 (note: the 1 stands for the one parameter the syscall takes, as opposed to no parameter before). The values accepted for this parameter are function-specific and defined in the inotify.h header. Here the values must match the O_* flags, though. In this patch CLOEXEC support is introduced. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_inotify_init1 # ifdef __x86_64__ # define __NR_inotify_init1 294 # elif defined __i386__ # define __NR_inotify_init1 332 # else # error "need __NR_inotify_init1" # endif #endif #define IN_CLOEXEC O_CLOEXEC int main (void) { int fd; fd = syscall (__NR_inotify_init1, 0); if (fd == -1) { puts ("inotify_init1(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("inotify_init1(0) set close-on-exit"); return 1; } close (fd); fd = syscall (__NR_inotify_init1, IN_CLOEXEC); if (fd == -1) { puts ("inotify_init1(IN_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("inotify_init1(O_CLOEXEC) does not set close-on-exit"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [akpm@linux-foundation.org: add sys_ni stub] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:28 -07:00
Ulrich Drepper	ed8cae8ba0	flag parameters: pipe This patch introduces the new syscall pipe2 which is like pipe but it also takes an additional parameter which takes a flag value. This patch implements the handling of O_CLOEXEC for the flag. I did not add support for the new syscall for the architectures which have a special sys_pipe implementation. I think the maintainers of those archs have the chance to go with the unified implementation but that's up to them. The implementation introduces do_pipe_flags. I did that instead of changing all callers of do_pipe because some of the callers are written in assembler. I would probably screw up changing the assembly code. To avoid breaking code do_pipe is now a small wrapper around do_pipe_flags. Once all callers are changed over to do_pipe_flags the old do_pipe function can be removed. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_pipe2 # ifdef __x86_64__ # define __NR_pipe2 293 # elif defined __i386__ # define __NR_pipe2 331 # else # error "need __NR_pipe2" # endif #endif int main (void) { int fd[2]; if (syscall (__NR_pipe2, fd, 0) != 0) { puts ("pipe2(0) failed"); return 1; } for (int i = 0; i < 2; ++i) { int coe = fcntl (fd[i], F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { printf ("pipe2(0) set close-on-exit for fd[%d]\n", i); return 1; } } close (fd[0]); close (fd[1]); if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0) { puts ("pipe2(O_CLOEXEC) failed"); return 1; } for (int i = 0; i < 2; ++i) { int coe = fcntl (fd[i], F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { printf ("pipe2(O_CLOEXEC) does not set close-on-exit for fd[%d]\n", i); return 1; } } close (fd[0]); close (fd[1]); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:28 -07:00
Ulrich Drepper	336dd1f70f	flag parameters: dup2 This patch adds the new dup3 syscall. It extends the old dup2 syscall by one parameter which is meant to hold a flag value. Support for the O_CLOEXEC flag is added in this patch. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <time.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_dup3 # ifdef __x86_64__ # define __NR_dup3 292 # elif defined __i386__ # define __NR_dup3 330 # else # error "need __NR_dup3" # endif #endif int main (void) { int fd = syscall (__NR_dup3, 1, 4, 0); if (fd == -1) { puts ("dup3(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("dup3(0) set close-on-exec flag"); return 1; } close (fd); fd = syscall (__NR_dup3, 1, 4, O_CLOEXEC); if (fd == -1) { puts ("dup3(O_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("dup3(O_CLOEXEC) set close-on-exec flag"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:28 -07:00
Ulrich Drepper	a0998b50c3	flag parameters: epoll_create This patch adds the new epoll_create2 syscall. It extends the old epoll_create syscall by one parameter which is meant to hold a flag value. In this patch the only flag support is EPOLL_CLOEXEC which causes the close-on-exec flag for the returned file descriptor to be set. A new name EPOLL_CLOEXEC is introduced which in this implementation must have the same value as O_CLOEXEC. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <time.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_epoll_create2 # ifdef __x86_64__ # define __NR_epoll_create2 291 # elif defined __i386__ # define __NR_epoll_create2 329 # else # error "need __NR_epoll_create2" # endif #endif #define EPOLL_CLOEXEC O_CLOEXEC int main (void) { int fd = syscall (__NR_epoll_create2, 1, 0); if (fd == -1) { puts ("epoll_create2(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("epoll_create2(0) set close-on-exec flag"); return 1; } close (fd); fd = syscall (__NR_epoll_create2, 1, EPOLL_CLOEXEC); if (fd == -1) { puts ("epoll_create2(EPOLL_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("epoll_create2(EPOLL_CLOEXEC) set close-on-exec flag"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:28 -07:00
Ulrich Drepper	11fcb6c146	flag parameters: timerfd_create The timerfd_create syscall already has a flags parameter. It just is unused so far. This patch changes this by introducing the TFD_CLOEXEC flag to set the close-on-exec flag for the returned file descriptor. A new name TFD_CLOEXEC is introduced which in this implementation must have the same value as O_CLOEXEC. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <time.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_timerfd_create # ifdef __x86_64__ # define __NR_timerfd_create 283 # elif defined __i386__ # define __NR_timerfd_create 322 # else # error "need __NR_timerfd_create" # endif #endif #define TFD_CLOEXEC O_CLOEXEC int main (void) { int fd = syscall (__NR_timerfd_create, CLOCK_REALTIME, 0); if (fd == -1) { puts ("timerfd_create(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("timerfd_create(0) set close-on-exec flag"); return 1; } close (fd); fd = syscall (__NR_timerfd_create, CLOCK_REALTIME, TFD_CLOEXEC); if (fd == -1) { puts ("timerfd_create(TFD_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("timerfd_create(TFD_CLOEXEC) set close-on-exec flag"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:27 -07:00
Ulrich Drepper	b087498eb5	flag parameters: eventfd This patch adds the new eventfd2 syscall. It extends the old eventfd syscall by one parameter which is meant to hold a flag value. In this patch the only flag support is EFD_CLOEXEC which causes the close-on-exec flag for the returned file descriptor to be set. A new name EFD_CLOEXEC is introduced which in this implementation must have the same value as O_CLOEXEC. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_eventfd2 # ifdef __x86_64__ # define __NR_eventfd2 290 # elif defined __i386__ # define __NR_eventfd2 328 # else # error "need __NR_eventfd2" # endif #endif #define EFD_CLOEXEC O_CLOEXEC int main (void) { int fd = syscall (__NR_eventfd2, 1, 0); if (fd == -1) { puts ("eventfd2(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("eventfd2(0) sets close-on-exec flag"); return 1; } close (fd); fd = syscall (__NR_eventfd2, 1, EFD_CLOEXEC); if (fd == -1) { puts ("eventfd2(EFD_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("eventfd2(EFD_CLOEXEC) does not set close-on-exec flag"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [akpm@linux-foundation.org: add sys_ni stub] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:27 -07:00
Ulrich Drepper	9deb27baed	flag parameters: signalfd This patch adds the new signalfd4 syscall. It extends the old signalfd syscall by one parameter which is meant to hold a flag value. In this patch the only flag support is SFD_CLOEXEC which causes the close-on-exec flag for the returned file descriptor to be set. A new name SFD_CLOEXEC is introduced which in this implementation must have the same value as O_CLOEXEC. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <signal.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_signalfd4 # ifdef __x86_64__ # define __NR_signalfd4 289 # elif defined __i386__ # define __NR_signalfd4 327 # else # error "need __NR_signalfd4" # endif #endif #define SFD_CLOEXEC O_CLOEXEC int main (void) { sigset_t ss; sigemptyset (&ss); sigaddset (&ss, SIGUSR1); int fd = syscall (__NR_signalfd4, -1, &ss, 8, 0); if (fd == -1) { puts ("signalfd4(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("signalfd4(0) set close-on-exec flag"); return 1; } close (fd); fd = syscall (__NR_signalfd4, -1, &ss, 8, SFD_CLOEXEC); if (fd == -1) { puts ("signalfd4(SFD_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("signalfd4(SFD_CLOEXEC) does not set close-on-exec flag"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [akpm@linux-foundation.org: add sys_ni stub] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:27 -07:00
Ulrich Drepper	7d9dbca342	flag parameters: anon_inode_getfd extension This patch just extends the anon_inode_getfd interface to take an additional parameter with a flag value. The flag value is passed on to get_unused_fd_flags in anticipation for a use with the O_CLOEXEC flag. No actual semantic changes here, the changed callers all pass 0 for now. [akpm@linux-foundation.org: KVM fix] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:27 -07:00
Ulrich Drepper	c019bbc612	flag parameters: paccept w/out set_restore_sigmask Some platforms do not have support to restore the signal mask in the return path from a syscall. For those platforms syscalls like pselect are not defined at all. This is, I think, not a good choice for paccept() since paccept() adds more value on top of accept() than just the signal mask handling. Therefore this patch defines a scaled down version of the sys_paccept function for those platforms. It returns -EINVAL in case the signal mask is non-NULL but behaves the same otherwise. Note that I explicitly included <linux/thread_info.h>. I saw that it is currently included but indirectly two levels down. There is too much risk in relying on this. The header might change and then suddenly the function definition would change without anyone immediately noticing. Signed-off-by: Ulrich Drepper <drepper@redhat.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:27 -07:00
Ulrich Drepper	aaca0bdca5	flag parameters: paccept This patch is by far the most complex in the series. It adds a new syscall paccept. This syscall differs from accept in that it adds (at the userlevel) two additional parameters: - a signal mask - a flags value The flags parameter can be used to set flag like SOCK_CLOEXEC. This is imlpemented here as well. Some people argued that this is a property which should be inherited from the file desriptor for the server but this is against POSIX. Additionally, we really want the signal mask parameter as well (similar to pselect, ppoll, etc). So an interface change in inevitable. The flag value is the same as for socket and socketpair. I think diverging here will only create confusion. Similar to the filesystem interfaces where the use of the O_* constants differs, it is acceptable here. The signal mask is handled as for pselect etc. The mask is temporarily installed for the thread and removed before the call returns. I modeled the code after pselect. If there is a problem it's likely also in pselect. For architectures which use socketcall I maintained this interface instead of adding a system call. The symmetry shouldn't be broken. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <errno.h> #include <fcntl.h> #include <pthread.h> #include <signal.h> #include <stdio.h> #include <unistd.h> #include <netinet/in.h> #include <sys/socket.h> #include <sys/syscall.h> #ifndef __NR_paccept # ifdef __x86_64__ # define __NR_paccept 288 # elif defined __i386__ # define SYS_PACCEPT 18 # define USE_SOCKETCALL 1 # else # error "need __NR_paccept" # endif #endif #ifdef USE_SOCKETCALL # define paccept(fd, addr, addrlen, mask, flags) \ ({ long args[6] = { \ (long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \ syscall (__NR_socketcall, SYS_PACCEPT, args); }) #else # define paccept(fd, addr, addrlen, mask, flags) \ syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags) #endif #define PORT 57392 #define SOCK_CLOEXEC O_CLOEXEC static pthread_barrier_t b; static void * tf (void arg) { pthread_barrier_wait (&b); int s = socket (AF_INET, SOCK_STREAM, 0); struct sockaddr_in sin; sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr ) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); s = socket (AF_INET, SOCK_STREAM, 0); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr ) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); pthread_barrier_wait (&b); sleep (2); pthread_kill ((pthread_t) arg, SIGUSR1); return NULL; } static void handler (int s) { } int main (void) { pthread_barrier_init (&b, NULL, 2); struct sockaddr_in sin; pthread_t th; if (pthread_create (&th, NULL, tf, (void ) pthread_self ()) != 0) { puts ("pthread_create failed"); return 1; } int s = socket (AF_INET, SOCK_STREAM, 0); int reuse = 1; setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse)); sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); bind (s, (struct sockaddr *) &sin, sizeof (sin)); listen (s, SOMAXCONN); pthread_barrier_wait (&b); int s2 = paccept (s, NULL, 0, NULL, 0); if (s2 < 0) { puts ("paccept(0) failed"); return 1; } int coe = fcntl (s2, F_GETFD); if (coe & FD_CLOEXEC) { puts ("paccept(0) set close-on-exec-flag"); return 1; } close (s2); pthread_barrier_wait (&b); s2 = paccept (s, NULL, 0, NULL, SOCK_CLOEXEC); if (s2 < 0) { puts ("paccept(SOCK_CLOEXEC) failed"); return 1; } coe = fcntl (s2, F_GETFD); if ((coe & FD_CLOEXEC) == 0) { puts ("paccept(SOCK_CLOEXEC) does not set close-on-exec flag"); return 1; } close (s2); pthread_barrier_wait (&b); struct sigaction sa; sa.sa_handler = handler; sa.sa_flags = 0; sigemptyset (&sa.sa_mask); sigaction (SIGUSR1, &sa, NULL); sigset_t ss; pthread_sigmask (SIG_SETMASK, NULL, &ss); sigaddset (&ss, SIGUSR1); pthread_sigmask (SIG_SETMASK, &ss, NULL); sigdelset (&ss, SIGUSR1); alarm (4); pthread_barrier_wait (&b); errno = 0 ; s2 = paccept (s, NULL, 0, &ss, 0); if (s2 != -1 \|\| errno != EINTR) { puts ("paccept did not fail with EINTR"); return 1; } close (s); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [akpm@linux-foundation.org: make it compile] [akpm@linux-foundation.org: add sys_ni stub] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Roland McGrath <roland@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:27 -07:00
Ulrich Drepper	a677a039be	flag parameters: socket and socketpair This patch adds support for flag values which are ORed to the type passwd to socket and socketpair. The additional code is minimal. The flag values in this implementation can and must match the O_* flags. This avoids overhead in the conversion. The internal functions sock_alloc_fd and sock_map_fd get a new parameters and all callers are changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <netinet/in.h> #include <sys/socket.h> #define PORT 57392 /* For Linux these must be the same. */ #define SOCK_CLOEXEC O_CLOEXEC int main (void) { int fd; fd = socket (PF_INET, SOCK_STREAM, 0); if (fd == -1) { puts ("socket(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("socket(0) set close-on-exec flag"); return 1; } close (fd); fd = socket (PF_INET, SOCK_STREAM\|SOCK_CLOEXEC, 0); if (fd == -1) { puts ("socket(SOCK_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("socket(SOCK_CLOEXEC) does not set close-on-exec flag"); return 1; } close (fd); int fds[2]; if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1) { puts ("socketpair(0) failed"); return 1; } for (int i = 0; i < 2; ++i) { coe = fcntl (fds[i], F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { printf ("socketpair(0) set close-on-exec flag for fds[%d]\n", i); return 1; } close (fds[i]); } if (socketpair (PF_UNIX, SOCK_STREAM\|SOCK_CLOEXEC, 0, fds) == -1) { puts ("socketpair(SOCK_CLOEXEC) failed"); return 1; } for (int i = 0; i < 2; ++i) { coe = fcntl (fds[i], F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { printf ("socketpair(SOCK_CLOEXEC) does not set close-on-exec flag for fds[%d]\n", i); return 1; } close (fds[i]); } puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:27 -07:00
Adrian Bunk	f606ddf42f	remove the v850 port Trying to compile the v850 port brings many compile errors, one of them exists since at least kernel 2.6.19. There also seems to be noone willing to bring this port back into a usable state. This patch therefore removes the v850 port. If anyone ever decides to revive the v850 port the code will still be available from older kernels, and it wouldn't be impossible for the port to reenter the kernel if it would become actively maintained again. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:24 -07:00
Shaohua Li	bdfe6b7c68	pm: acpi hibernation: utilize hardware signature ACPI defines a hardware signature. BIOS calculates the signature according to hardware configure and if hardware changes while hibernated, the signature will change. In that case, S4 resume should fail. Still, there may be systems on which this mechanism does not work correctly, so it is better to provide a workaround for them. For this reason, add a new switch to the acpi_sleep= command line argument allowing one to disable hardware signature checking. [shaohua.li@intel.com: build fix] Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Andi Kleen <andi@firstfloor.org> Cc: Len Brown <lenb@kernel.org> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: <Valdis.Kletnieks@vt.edu> Cc: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:24 -07:00
Zhang Rui	c1a220e7ac	pm: introduce new interfaces schedule_work_on() and queue_work_on() This interface allows adding a job on a specific cpu. Although a work struct on a cpu will be scheduled to other cpu if the cpu dies, there is a recursion if a work task tries to offline the cpu it's running on. we need to schedule the task to a specific cpu in this case. http://bugzilla.kernel.org/show_bug.cgi?id=10897 [oleg@tv-sign.ru: cleanups] Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Rus <harbour@sfinx.od.ua> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:23 -07:00
Alan Stern	8111d1b552	pm: add new PM_EVENT codes for runtime power transitions This patch (as1112) adds some new PM_EVENT_* codes for use by kernel subsystems. They describe runtime power-state transitions of the sort already implemented by the USB subsystem. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:23 -07:00
Rafael J. Wysocki	8c363265d5	pm: drop unnecessary includes from pm.h Drop unnecessary includes from include/linux/pm.h . Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:23 -07:00
Rafael J. Wysocki	e7ecb331e1	pm: remove remaining obsolete definitions from pm.h Remove the remaining obsolete definitions from include/linux/pm.h and move the definitions of PM_SUSPEND and PM_RESUME to the header of h3600 which is the only user of them. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:22 -07:00
Rafael J. Wysocki	558481f038	pm: remove definition of struct pm_dev Remove the definition of 'struct pm_dev', which is not used any more, along with some related stuff from include/linux/pm.h . Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:22 -07:00
Adrian Bunk	d75f65fd24	remove include/linux/pm_legacy.h Remove the obsolete and no longer used include/linux/pm_legacy.h Reviewed-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Pavel Machek <pavel@suse.cz> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:22 -07:00
Hugh Dickins	9b3e43a747	security: remove unused forwards Why would linux/security.h need forward declarations for nfsctl_arg and swap_info_struct? It's hard to imagine: remove them. Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:22 -07:00
Andrew G. Morgan	5459c164f0	security: protect legacy applications from executing with insufficient privilege When cap_bset suppresses some of the forced (fP) capabilities of a file, it is generally only safe to execute the program if it understands how to recognize it doesn't have enough privilege to work correctly. For legacy applications (fE!=0), which have no non-destructive way to determine that they are missing privilege, we fail to execute (EPERM) any executable that requires fP capabilities, but would otherwise get pP' < fP. This is a fail-safe permission check. For some discussion of why it is problematic for (legacy) privileged applications to run with less than the set of capabilities requested for them, see: http://userweb.kernel.org/~morgan/sendmail-capabilities-war-story.html With this iteration of this support, we do not include setuid-0 based privilege protection from the bounding set. That is, the admin can still (ab)use the bounding set to suppress the privileges of a setuid-0 program. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: cleanup] Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:22 -07:00
Gerald Schaefer	83d1674a94	mm: make CONFIG_MIGRATION available w/o CONFIG_NUMA We'd like to support CONFIG_MEMORY_HOTREMOVE on s390, which depends on CONFIG_MIGRATION. So far, CONFIG_MIGRATION is only available with NUMA support. This patch makes CONFIG_MIGRATION selectable for architectures that define ARCH_ENABLE_MEMORY_HOTREMOVE. When MIGRATION is enabled w/o NUMA, the kernel won't compile because migrate_vmas() does not know about vm_ops->migrate() and vma_migratable() does not know about policy_zone. To fix this, those two functions can be restricted to '#ifdef CONFIG_NUMA' because they are not being used w/o NUMA. vma_migratable() is moved over from migrate.h to mempolicy.h. [kosaki.motohiro@jp.fujitsu.com: build fix] Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: KOSAKI Motorhiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:21 -07:00
Milton Miller	9ca908f47b	kcalloc: remove runtime division While in all cases in the kernel we know the size of the elements to be created, we don't always know the count of elements. By commuting the size and count in the overflow check, the compiler can reduce the runtime division of size_t with a compare to a (unique) constant in these cases. Signed-off-by: Milton Miller <miltonm@bga.com> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:21 -07:00
Badari Pulavarty	5c755e9fd8	memory-hotplug: add sysfs removable attribute for hotplug memory remove Memory may be hot-removed on a per-memory-block basis, particularly on POWER where the SPARSEMEM section size often matches the memory-block size. A user-level agent must be able to identify which sections of memory are likely to be removable before attempting the potentially expensive operation. This patch adds a file called "removable" to the memory directory in sysfs to help such an agent. In this patch, a memory block is considered removable if; o It contains only MOVABLE pageblocks o It contains only pageblocks with free pages regardless of pageblock type On the other hand, a memory block starting with a PageReserved() page will never be considered removable. Without this patch, the user-agent is forced to choose a memory block to remove randomly. Sample output of the sysfs files: ./memory/memory0/removable: 0 ./memory/memory1/removable: 0 ./memory/memory2/removable: 0 ./memory/memory3/removable: 0 ./memory/memory4/removable: 0 ./memory/memory5/removable: 0 ./memory/memory6/removable: 0 ./memory/memory7/removable: 1 ./memory/memory8/removable: 0 ./memory/memory9/removable: 0 ./memory/memory10/removable: 0 ./memory/memory11/removable: 0 ./memory/memory12/removable: 0 ./memory/memory13/removable: 0 ./memory/memory14/removable: 0 ./memory/memory15/removable: 0 ./memory/memory16/removable: 0 ./memory/memory17/removable: 1 ./memory/memory18/removable: 1 ./memory/memory19/removable: 1 ./memory/memory20/removable: 1 ./memory/memory21/removable: 1 ./memory/memory22/removable: 1 Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:21 -07:00
Yasunori Goto	af370fb8cb	memory hotplug: small fixes to bootmem freeing for memory hotremove - Change some naming * Magic -> types * MIX_INFO -> MIX_SECTION_INFO * Change definition of bootmem type from direct hex value - __free_pages_bootmem() becomes __meminit. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:21 -07:00
Andrea Righi	27ac792ca0	PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit boundary. For example: u64 val = PAGE_ALIGN(size); always returns a value < 4GB even if size is greater than 4GB. The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for example): #define PAGE_SHIFT 12 #define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT) #define PAGE_MASK (~(PAGE_SIZE-1)) ... #define PAGE_ALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK) The "~" is performed on a 32-bit value, so everything in "and" with PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary. Using the ALIGN() macro seems to be the right way, because it uses typeof(addr) for the mask. Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in include/linux/mm.h. See also lkml discussion: http://lkml.org/lkml/2008/6/11/237 [akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c] [akpm@linux-foundation.org: fix v850] [akpm@linux-foundation.org: fix powerpc] [akpm@linux-foundation.org: fix arm] [akpm@linux-foundation.org: fix mips] [akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c] [akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c] [akpm@linux-foundation.org: fix powerpc] Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:21 -07:00
Timur Tabi	2be0ffe2b2	mm: add alloc_pages_exact() and free_pages_exact() alloc_pages_exact() is similar to alloc_pages(), except that it allocates the minimum number of pages to fulfill the request. This is useful if you want to allocate a very large buffer that is slightly larger than an even power-of-two number of pages. In that case, alloc_pages() will waste a lot of memory. I have a video driver that wants to allocate a 5MB buffer. alloc_pages() wiill waste 3MB of physically-contiguous memory. Signed-off-by: Timur Tabi <timur@freescale.com> Cc: Andi Kleen <andi@firstfloor.org> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:20 -07:00
Johannes Weiner	3560e249ab	bootmem: replace node_boot_start in struct bootmem_data Almost all users of this field need a PFN instead of a physical address, so replace node_boot_start with node_min_pfn. [Lee.Schermerhorn@hp.com: fix spurious BUG_ON() in mark_bootmem()] Signed-off-by: Johannes Weiner <hannes@saeureba.de> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:20 -07:00
Johannes Weiner	5f2809e69c	bootmem: clean up alloc_bootmem_core alloc_bootmem_core has become quite nasty to read over time. This is a clean rewrite that keeps the semantics. bdata->last_pos has been dropped. bdata->last_success has been renamed to hint_idx and it is now an index relative to the node's range. Since further block searching might start at this index, it is now set to the end of a succeeded allocation rather than its beginning. bdata->last_offset has been renamed to last_end_off to be more clear that it represents the ending address of the last allocation relative to the node. [y-goto@jp.fujitsu.com: fix new alloc_bootmem_core()] Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:20 -07:00
Johannes Weiner	223e8dc924	bootmem: reorder code to match new bootmem structure This only reorders functions so that further patches will be easier to read. No code changed. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:19 -07:00
Jon Tollefson	53ba51d21d	hugetlb: allow arch overridden hugepage allocation Allow alloc_bootmem_huge_page() to be overridden by architectures that can't always use bootmem. This requires huge_boot_pages to be available for use by this function. This is required for powerpc 16G pages, which have to be reserved prior to boot-time. The location of these pages are indicated in the device tree. Acked-by: Adam Litke <agl@us.ibm.com> Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:19 -07:00
Andi Kleen	ceb8687961	hugetlb: introduce pud_huge Straight forward extensions for huge pages located in the PUD instead of PMDs. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:18 -07:00
Andi Kleen	b54bbf7b81	mm: introduce non panic alloc_bootmem Straight forward variant of the existing __alloc_bootmem_node, only subsequent patch when allocating giant hugepages at boot -- don't want to panic if we can't allocate as many as the user asked for. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:17 -07:00
Nishanth Aravamudan	a343787016	hugetlb: new sysfs interface Provide new hugepages user APIs that are more suited to multiple hstates in sysfs. There is a new directory, /sys/kernel/hugepages. Underneath that directory there will be a directory per-supported hugepage size, e.g.: /sys/kernel/hugepages/hugepages-64kB /sys/kernel/hugepages/hugepages-16384kB /sys/kernel/hugepages/hugepages-16777216kB corresponding to 64k, 16m and 16g respectively. Within each hugepages-size directory there are a number of files, corresponding to the tracked counters in the hstate, e.g.: /sys/kernel/hugepages/hugepages-64/nr_hugepages /sys/kernel/hugepages/hugepages-64/nr_overcommit_hugepages /sys/kernel/hugepages/hugepages-64/free_hugepages /sys/kernel/hugepages/hugepages-64/resv_hugepages /sys/kernel/hugepages/hugepages-64/surplus_hugepages Of these files, the first two are read-write and the latter three are read-only. The size of the hugepage being manipulated is trivially deducible from the enclosing directory and is always expressed in kB (to match meminfo). [dave@linux.vnet.ibm.com: fix build] [nacc@us.ibm.com: hugetlb: hang off of /sys/kernel/mm rather than /sys/kernel] [nacc@us.ibm.com: hugetlb: remove CONFIG_SYSFS dependency] Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:17 -07:00
Andi Kleen	a137e1cc6d	hugetlbfs: per mount huge page sizes Add the ability to configure the hugetlb hstate used on a per mount basis. - Add a new pagesize= option to the hugetlbfs mount that allows setting the page size - This option causes the mount code to find the hstate corresponding to the specified size, and sets up a pointer to the hstate in the mount's superblock. - Change the hstate accessors to use this information rather than the global_hstate they were using (requires a slight change in mm/memory.c so we don't NULL deref in the error-unmap path -- see comments). [np: take hstate out of hugetlbfs inode and vma->vm_private_data] Acked-by: Adam Litke <agl@us.ibm.com> Acked-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:17 -07:00
Andi Kleen	e5ff215941	hugetlb: multiple hstates for multiple page sizes Add basic support for more than one hstate in hugetlbfs. This is the key to supporting multiple hugetlbfs page sizes at once. - Rather than a single hstate, we now have an array, with an iterator - default_hstate continues to be the struct hstate which we use by default - Add functions for architectures to register new hstates [akpm@linux-foundation.org: coding-style fixes] Acked-by: Adam Litke <agl@us.ibm.com> Acked-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:17 -07:00
Andi Kleen	a551643895	hugetlb: modular state for hugetlb page size The goal of this patchset is to support multiple hugetlb page sizes. This is achieved by introducing a new struct hstate structure, which encapsulates the important hugetlb state and constants (eg. huge page size, number of huge pages currently allocated, etc). The hstate structure is then passed around the code which requires these fields, they will do the right thing regardless of the exact hstate they are operating on. This patch adds the hstate structure, with a single global instance of it (default_hstate), and does the basic work of converting hugetlb to use the hstate. Future patches will add more hstate structures to allow for different hugetlbfs mounts to have different page sizes. [akpm@linux-foundation.org: coding-style fixes] Acked-by: Adam Litke <agl@us.ibm.com> Acked-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:17 -07:00
Nishanth Aravamudan	ff7ea79cf7	mm: create /sys/kernel/mm Add a kobject to create /sys/kernel/mm when sysfs is mounted. The kobject will exist regardless. This will allow for the hugepage related sysfs directories to exist under the mm "subsystem" directory. Add an ABI file appropriately. [kosaki.motohiro@jp.fujitsu.com: fix build] Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:17 -07:00
Andy Whitcroft	cdfd4325c0	mm: record MAP_NORESERVE status on vmas and fix small page mprotect reservations With Mel's hugetlb private reservation support patches applied, strict overcommit semantics are applied to both shared and private huge page mappings. This can be a problem if an application relied on unlimited overcommit semantics for private mappings. An example of this would be an application which maps a huge area with the intention of using it very sparsely. These application would benefit from being able to opt-out of the strict overcommit. It should be noted that prior to hugetlb supporting demand faulting all mappings were fully populated and so applications of this type should be rare. This patch stack implements the MAP_NORESERVE mmap() flag for huge page mappings. This flag has the same meaning as for small page mappings, suppressing reservations for that mapping. Thanks to Mel Gorman for reviewing a number of early versions of these patches. This patch: When a small page mapping is created with mmap() reservations are created by default for any memory pages required. When the region is read/write the reservation is increased for every page, no reservation is needed for read-only regions (as they implicitly share the zero page). Reservations are tracked via the VM_ACCOUNT vma flag which is present when the region has reservation backing it. When we convert a region from read-only to read-write new reservations are aquired and VM_ACCOUNT is set. However, when a read-only map is created with MAP_NORESERVE it is indistinguishable from a normal mapping. When we then convert that to read/write we are forced to incorrectly create reservations for it as we have no record of the original MAP_NORESERVE. This patch introduces a new vma flag VM_NORESERVE which records the presence of the original MAP_NORESERVE flag. This allows us to distinguish these two circumstances and correctly account the reserve. As well as fixing this FIXME in the code, this makes it much easier to introduce MAP_NORESERVE support for huge pages as this flag is available consistantly for the life of the mapping. VM_ACCOUNT on the other hand is heavily used at the generic level in association with small pages. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Adam Litke <agl@us.ibm.com> Cc: Johannes Weiner <hannes@saeurebad.de> Cc: Andy Whitcroft <apw@shadowen.org> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:16 -07:00
Mel Gorman	04f2cbe356	hugetlb: guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed After patch 2 in this series, a process that successfully calls mmap() for a MAP_PRIVATE mapping will be guaranteed to successfully fault until a process calls fork(). At that point, the next write fault from the parent could fail due to COW if the child still has a reference. We only reserve pages for the parent but a copy must be made to avoid leaking data from the parent to the child after fork(). Reserves could be taken for both parent and child at fork time to guarantee faults but if the mapping is large it is highly likely we will not have sufficient pages for the reservation, and it is common to fork only to exec() immediatly after. A failure here would be very undesirable. Note that the current behaviour of mainline with MAP_PRIVATE pages is pretty bad. The following situation is allowed to occur today. 1. Process calls mmap(MAP_PRIVATE) 2. Process calls mlock() to fault all pages and makes sure it succeeds 3. Process forks() 4. Process writes to MAP_PRIVATE mapping while child still exists 5. If the COW fails at this point, the process gets SIGKILLed even though it had taken care to ensure the pages existed This patch improves the situation by guaranteeing the reliability of the process that successfully calls mmap(). When the parent performs COW, it will try to satisfy the allocation without using reserves. If that fails the parent will steal the page leaving any children without a page. Faults from the child after that point will result in failure. If the child COW happens first, an attempt will be made to allocate the page without reserves and the child will get SIGKILLed on failure. To summarise the new behaviour: 1. If the original mapper performs COW on a private mapping with multiple references, it will attempt to allocate a hugepage from the pool or the buddy allocator without using the existing reserves. On fail, VMAs mapping the same area are traversed and the page being COW'd is unmapped where found. It will then steal the original page as the last mapper in the normal way. 2. The VMAs the pages were unmapped from are flagged to note that pages with data no longer exist. Future no-page faults on those VMAs will terminate the process as otherwise it would appear that data was corrupted. A warning is printed to the console that this situation occured. 2. If the child performs COW first, it will attempt to satisfy the COW from the pool if there are enough pages or via the buddy allocator if overcommit is allowed and the buddy allocator can satisfy the request. If it fails, the child will be killed. If the pool is large enough, existing applications will not notice that the reserves were a factor. Existing applications depending on the no-reserves been set are unlikely to exist as for much of the history of hugetlbfs, pages were prefaulted at mmap(), allocating the pages at that point or failing the mmap(). [npiggin@suse.de: fix CONFIG_HUGETLB=n build] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:16 -07:00
Mel Gorman	a1e78772d7	hugetlb: reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings until fork() This patch reserves huge pages at mmap() time for MAP_PRIVATE mappings in a similar manner to the reservations taken for MAP_SHARED mappings. The reserve count is accounted both globally and on a per-VMA basis for private mappings. This guarantees that a process that successfully calls mmap() will successfully fault all pages in the future unless fork() is called. The characteristics of private mappings of hugetlbfs files behaviour after this patch are; 1. The process calling mmap() is guaranteed to succeed all future faults until it forks(). 2. On fork(), the parent may die due to SIGKILL on writes to the private mapping if enough pages are not available for the COW. For reasonably reliable behaviour in the face of a small huge page pool, children of hugepage-aware processes should not reference the mappings; such as might occur when fork()ing to exec(). 3. On fork(), the child VMAs inherit no reserves. Reads on pages already faulted by the parent will succeed. Successful writes will depend on enough huge pages being free in the pool. 4. Quotas of the hugetlbfs mount are checked at reserve time for the mapper and at fault time otherwise. Before this patch, all reads or writes in the child potentially needs page allocations that can later lead to the death of the parent. This applies to reads and writes of uninstantiated pages as well as COW. After the patch it is only a write to an instantiated page that causes problems. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:16 -07:00
Johannes Weiner	9109fb7b35	mm: drop unneeded pgdat argument from free_area_init_node() free_area_init_node() gets passed in the node id as well as the node descriptor. This is redundant as the function can trivially get the node descriptor itself by means of NODE_DATA() and the node's id. I checked all the users and NODE_DATA() seems to be usable everywhere from where this function is called. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:16 -07:00
Andrew Morton	2185e69f68	mapping_set_error: add unlikely() This is called on a per-page basis and in the vast majority of cases `error' is zero. Cc: Guillaume Chazarain <guichaz@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:15 -07:00
Andy Whitcroft	9023cb7e85	slob: record page flag overlays explicitly SLOB reuses two page bits for internal purposes, it overlays PG_active and PG_private. This is hidden away in slob.c. Document these overlays explicitly in the main page-flags enum along with all the others. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:15 -07:00
Andy Whitcroft	8a38082d21	slub: record page flag overlays explicitly SLUB reuses two page bits for internal purposes, it overlays PG_active and PG_error. This is hidden away in slub.c. Document these overlays explicitly in the main page-flags enum along with all the others. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:15 -07:00
Andy Whitcroft	0cad47cf13	page-flags: record page flag overlays explicitly With the recent page flag reorganisation we have a single enum which defines the valid page flags and their values, nice and clear. However there are a number of bits which are overloaded by different subsystems. Firstly there is PG_owner_priv_1 which is used by filesystems and by XEN. Secondly both SLOB and SLUB use a couple of extra page bits to manage internal state for pages they own; both overlay other bits. All of these "aliases" are scattered about the source making it very hard for a reader to know if the bits are safe to rely on in all contexts; confusion here is bad. As we now have a single place where the bits are clearly assigned it makes sense to clarify the reuse of bits by making the aliases explicit and visible with the original bit assignments. This patch creates explicit aliases within the enum itself for the overloaded bits, creates standard bit accessors PageFoo etc. and uses those throughout. This version pulls the bit manipulation out to standard named page bit accessors as suggested by Christoph, it retains the explicit mapping to the overlayed bits. A fusion of both ideas. This has been SLUB and SLOB have been compile tested on x86_64 only, and SLUB boot tested. If people feel this is worth doing then I can run a fuller set of testing. This patch: Some page flags are used for more than one purpose, for example PG_owner_priv_1. Currently there are individual accessors for each user, each built using the common flag name far away from the bit definitions. This makes it hard to see all possible uses of these bits. Now that we have a single enum to generate the bit orders it makes sense to express overlays in the same place. So create per use aliases for this bit in the main page-flags enum and use those in the accessors. [akpm@linux-foundation.org: fix xen] Signed-off-by: Andy Whitcroft <apw@shadowen.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:15 -07:00
Kentaro Makita	da3bbdd463	fix soft lock up at NFS mount via per-SB LRU-list of unused dentries [Summary] Split LRU-list of unused dentries to one per superblock to avoid soft lock up during NFS mounts and remounting of any filesystem. Previously I posted here: http://lkml.org/lkml/2008/3/5/590 [Descriptions] - background dentry_unused is a list of dentries which are not referenced. dentry_unused grows up when references on directories or files are released. This list can be very long if there is huge free memory. - the problem When shrink_dcache_sb() is called, it scans all dentry_unused linearly under spin_lock(), and if dentry->d_sb is differnt from given superblock, scan next dentry. This scan costs very much if there are many entries, and very ineffective if there are many superblocks. IOW, When we need to shrink unused dentries on one dentry, but scans unused dentries on all superblocks in the system. For example, we scan 500 dentries to unmount a filesystem, but scans 1,000,000 or more unused dentries on other superblocks. In our case , At mounting NFS, shrink_dcache_sb() is called to shrink unused dentries on NFS, but scans 100,000,000 unused dentries on superblocks in the system such as local ext3 filesystems. I hear NFS mounting took 1 min on some system in use. : NFS uses virtual filesystem in rpc layer, so NFS is affected by this problem. 100,000,000 is possible number on large systems. Per-superblock LRU of unused dentried can reduce the cost in reasonable manner. - How to fix I found this problem is solved by David Chinner's "Per-superblock unused dentry LRU lists V3"(1), so I rebase it and add some fix to reclaim with fairness, which is in Andrew Morton's comments(2). 1) http://lkml.org/lkml/2006/5/25/318 2) http://lkml.org/lkml/2006/5/25/320 Split LRU-list of unused dentries to each superblocks. Then, NFS mounting will check dentries under a superblock instead of all. But this spliting will break LRU of dentry-unused. So, I've attempted to make reclaim unused dentrins with fairness by calculate number of dentries to scan on this sb based on following way number of dentries to scan on this sb = count * (number of dentries on this sb / number of dentries in the machine) - ToDo - I have to measuring performance number and do stress tests. - When unmount occurs during prune_dcache(), scanning on same superblock, It is unable to reach next superblock because it is gone away. We restart scannig superblock from first one, it causes unfairness of reclaim unused dentries on first superblock. But I think this happens very rarely. - Test Results Result on 6GB boxes with excessive unused dentries. Without patch: $ cat /proc/sys/fs/dentry-state 10181835 10180203 45 0 0 0 # mount -t nfs 10.124.60.70:/work/kernel-src nfs real 0m1.830s user 0m0.001s sys 0m1.653s With this patch: $ cat /proc/sys/fs/dentry-state 10236610 10234751 45 0 0 0 # mount -t nfs 10.124.60.70:/work/kernel-src nfs real 0m0.106s user 0m0.002s sys 0m0.032s [akpm@linux-foundation.org: fix comments] Signed-off-by: Kentaro Makita <k-makita@np.css.fujitsu.com> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: David Chinner <dgc@sgi.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:15 -07:00
Jan Beulich	42b7772812	mm: remove double indirection on tlb parameter to free_pgd_range() & Co The double indirection here is not needed anywhere and hence (at least) confusing. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:15 -07:00
Rik van Riel	28b2ee20c7	access_process_vm device memory infrastructure In order to be able to debug things like the X server and programs using the PPC Cell SPUs, the debugger needs to be able to access device memory through ptrace and /proc/pid/mem. This patch: Add the generic_access_phys access function and put the hooks in place to allow access_process_vm to access device or PPC Cell SPU memory. [riel@redhat.com: Add documentation for the vm_ops->access function] Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Benjamin Herrensmidt <benh@kernel.crashing.org> Cc: Dave Airlie <airlied@linux.ie> Cc: Hugh Dickins <hugh@veritas.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:15 -07:00
Nick Piggin	0d71d10a42	mm: remove nopfn There are no users of nopfn in the tree. Remove it. [hugh@veritas.com: fix build error] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:15 -07:00
Adrian Bunk	c748e1340e	mm/vmstat.c: proper externs This patch adds proper extern declarations for five variables in include/linux/vmstat.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:14 -07:00
KOSAKI Motohiro	e4048e5dc4	page allocator: inline some __alloc_pages() wrappers Two zonelist patch series rewrote __page_alloc() largely. Now, it is just a wrapper function. Inlining them will save a function call. [akpm@linux-foundation.org: export __alloc_pages_internal] Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:14 -07:00
Johannes Weiner	ffc6421f07	mm: unexport __alloc_bootmem_core() This function has no external callers, so unexport it. Also fix its naming inconsistency. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:14 -07:00
Johannes Weiner	b61bfa3c46	mm: move bootmem descriptors definition to a single place There are a lot of places that define either a single bootmem descriptor or an array of them. Use only one central array with MAX_NUMNODES items instead. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kyle McMartin <kyle@parisc-linux.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David S. Miller <davem@davemloft.net> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:14 -07:00
FUJITA Tomonori	8b05c7e6e1	add a helper function to test if an object is on the stack lib/debugobjects.c has a function to test if an object is on the stack. The block layer and ide needs it (they need to avoid DMA from/to stack buffers). This patch moves the function to include/linux/sched.h so that everyone can use it. lib/debugobjects.c uses current->stack but this patch uses a task_stack_page() accessor, which is a preferable way to access the stack. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:14 -07:00
Akinobu Mita	e108526e77	move memory_read_from_buffer() from fs.h to string.h James Bottomley warns that inclusion of linux/fs.h in a low level driver was always a danger signal. This patch moves memory_read_from_buffer() from fs.h to string.h and fixes includes in existing memory_read_from_buffer() users. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Bob Moore <robert.moore@intel.com> Cc: Thomas Renninger <trenn@suse.de> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:13 -07:00
Matthew Wilcox	b552068999	Remove __DECLARE_SEMAPHORE_GENERIC There are no users of __DECLARE_SEMAPHORE_GENERIC in the kernel Signed-off-by: Matthew Wilcox <willy@linux.intel.com>	2008-07-24 08:31:21 -04:00
Artem Bityutskiy	85c6e6e282	UBI: amend commentaries Hch asked not to use "unit" for sub-systems, let it be so. Also some other commentaries modifications. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2008-07-24 13:32:56 +03:00
Artem Bityutskiy	a5bf619041	UBI: add ubi_sync() interface To flush MTD device caches. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2008-07-24 13:32:56 +03:00
Linus Torvalds	7f9dce3837	Merge branch 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: hrtick_enabled() should use cpu_active() sched, x86: clean up hrtick implementation sched: fix build error, provide partition_sched_domains() unconditionally sched: fix warning in inc_rt_tasks() to not declare variable 'rq' if it's not needed cpu hotplug: Make cpu_active_map synchronization dependency clear cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2) sched: rework of "prioritize non-migratable tasks over migratable ones" sched: reduce stack size in isolated_cpu_setup() Revert parts of "ftrace: do not trace scheduler functions" Fixed up conflicts in include/asm-x86/thread_info.h (due to the TIF_SINGLESTEP unification vs TIF_HRTICK_RESCHED removal) and kernel/sched_fair.c (due to cpu_active_map vs for_each_cpu_mask_nr() introduction).	2008-07-23 19:36:53 -07:00
Linus Torvalds	26dcce0fab	Merge branch 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits) NR_CPUS: Replace NR_CPUS in speedstep-centrino.c cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP NR_CPUS: Replace NR_CPUS in cpufreq userspace routines NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target cpumask: Provide a generic set of CPUMASK_ALLOC macros cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr Revert "cpumask: introduce new APIs" cpumask: make for_each_cpu_mask a bit smaller net: Pass reference to cpumask variable in net/sunrpc/svc.c ... Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually	2008-07-23 18:37:44 -07:00
Linus Torvalds	d7b6de14a0	Merge branch 'core/softlockup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core/softlockup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: softlockup: fix invalid proc_handler for softlockup_panic softlockup: fix watchdog task wakeup frequency softlockup: fix watchdog task wakeup frequency softlockup: show irqtrace softlockup: print a module list on being stuck softlockup: fix NMI hangs due to lock race - 2.6.26-rc regression softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds softlockup: fix softlockup_thresh fix softlockup: fix softlockup_thresh unaligned access and disable detection at runtime softlockup: allow panic on lockup	2008-07-23 18:34:13 -07:00
Linus Torvalds	30d38542ec	Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm * 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (85 commits) [ARM] pxa: add base support for PXA930 Handheld Platform (aka SAAR) [ARM] pxa: add base support for PXA930 Evaluation Board (aka TavorEVB) [ARM] pxa: add base support for PXA930 (aka Tavor-P) [ARM] Update mach-types [ARM] pxa: make littleton to use the new smc91x platform data [ARM] pxa: make zylonite to use the new smc91x platform data [ARM] pxa: make mainstone to use the new smc91x platform data [ARM] pxa: make lubbock to use new smc91x platform data [NET] smc91x: prepare SMC_USE_PXA_DMA to be specified in platform data [NET] smc91x: prepare for SMC_IO_SHIFT to be a platform configurable variable [NET] smc91x: add SMC91X_NOWAIT flag to platform data [NET] smc91x: favor the use of SMC91X_USE_* instead of SMC_CAN_USE_* [NET] smc91x: remove "irq_flags" from "struct smc91x_platdata" [ARM] 5146/1: pxa2xx: convert all boards to call pxa2xx_transceiver_mode helper Support for LCD on e740 e750 e400 and e800 e-series PDAs E-series UDC support PXA UDC - allow use of inverted GPIO for pullup Add e350 support Fix broken e-series build E-series GPIO / IRQ definitions. ...	2008-07-23 18:24:08 -07:00
Dan Williams	d8e64406a0	md: delay notification of 'active_idle' to the recovery thread sysfs_notify might sleep, so do not call it from md_safemode_timeout. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-07-23 13:09:48 -07:00
Linus Torvalds	20b7997e8a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc: sdhci: highmem capable PIO routines sg: reimplement sg mapping iterator mmc_test: print message when attaching to card mmc: Remove Russell as primecell mci maintainer mmc_block: bounce buffer highmem support sdhci: fix bad warning from commit `c8b3e02` sdhci: add warnings for bad buffers in ADMA path mmc_test: test oversized sg lists mmc_test: highmem tests s3cmci: ensure host stopped on machine shutdown au1xmmc: suspend/resume implementation s3cmci: fixes for section mismatch warnings pxamci: trivial fix of DMA alignment register bit clearing	2008-07-23 12:04:34 -07:00
Linus Torvalds	5554b35933	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (24 commits) I/OAT: I/OAT version 3.0 support I/OAT: tcp_dma_copybreak default value dependent on I/OAT version I/OAT: Add watchdog/reset functionality to ioatdma iop_adma: cleanup iop_chan_xor_slot_count iop_adma: document how to calculate the minimum descriptor pool size iop_adma: directly reclaim descriptors on allocation failure async_tx: make async_tx_test_ack a boolean routine async_tx: remove depend_tx from async_tx_sync_epilog async_tx: export async_tx_quiesce async_tx: fix handling of the "out of descriptor" condition in async_xor async_tx: ensure the xor destination buffer remains dma-mapped async_tx: list_for_each_entry_rcu() cleanup dmaengine: Driver for the Synopsys DesignWare DMA controller dmaengine: Add slave DMA interface dmaengine: add DMA_COMPL_SKIP_{SRC,DEST}_UNMAP flags to control dma unmap dmaengine: Add dma_client parameter to device_alloc_chan_resources dmatest: Simple DMA memcpy test client dmaengine: DMA engine driver for Marvell XOR engine iop-adma: fix platform driver hotplug/coldplug dmaengine: track the number of clients using a channel ... Fixed up conflict in drivers/dca/dca-sysfs.c manually	2008-07-23 12:03:18 -07:00
Linus Torvalds	e669e8179d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (60 commits) ide: small whitespace fixes ide: ide-cd_ioctl.c fix sparse integer as NULL pointer warnings ide: ide-cd.c fix sparse endianness warnings ide-cd: convert to using the new atapi_flags ide: remove unused PC_FLAG_DRQ_INTERRUPT ide-scsi: convert to using the new atapi_flags ide-tape: convert to using the new atapi_flags ide-floppy: convert to using the new atapi_flags (take 2) ide: add per-device flags ide: use rq->cmd instead of pc->c in atapi common code ide-scsi: pass packet command in rq->cmd ide-tape: pass packet command in rq->cmd ide-tape: make room for packet command ids in rq->cmd ide-floppy: pass packet command in rq->cmd ide: remove pc->callback member from ide_atapi_pc ide-scsi: use drive->pc_callback instead of pc->callback ide-tape: use drive->pc_callback instead of pc->callback ide-floppy: use drive->pc_callback instead of pc->callback ide: push pc callback pointer into the ide_drive_t structure drivers/ide/ide-tape.c: remove double kfree ...	2008-07-23 11:59:09 -07:00
Dmitry Torokhov	a822bea796	Input: serio - mark serio_register_driver() __must_check Also remove extra declaration of serio_register_driver(). Signed-off-by: Dmitry Torokhov <dtor@mail.ru>	2008-07-23 14:01:49 -04:00
Borislav Petkov	ac77ef8b03	ide: remove unused PC_FLAG_DRQ_INTERRUPT There should be no functionality change resulting from this patch. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:56:01 +02:00
Borislav Petkov	ea68d270ff	ide-floppy: convert to using the new atapi_flags (take 2) while at it, remove PC_FLAG_ZIP_DRIVE from the packed command flags altogether and query the drive type through drive->atapi_flags. v2: ide-floppy fix. There should be no functionality change resulting from this patch. [bart: IDE_FLAG_* -> IDE_AFLAG_*, dev_flags -> atapi_flags] Signed-off-by: Borislav Petkov <petkovbb@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:56:01 +02:00
Borislav Petkov	3b8ac5398c	ide: add per-device flags Push device flags up into ide_drive_t. There should be no functionality change resulting from this patch. [bart: IDE_FLAG_* -> IDE_AFLAG_*, dev_flags -> atapi_flags] Signed-off-by: Borislav Petkov <petkovbb@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:56:01 +02:00
Borislav Petkov	8bcda3bc49	ide: remove pc->callback member from ide_atapi_pc There should be no functionality change resulting from this patch. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:56:00 +02:00
Borislav Petkov	d7c26ebb5b	ide: push pc callback pointer into the ide_drive_t structure Refrain from carrying the callback ptr with every packet command since the callback function is only one anyways. ide_drive_t is probably not the most suitable place for it right now but is the more sane solution. Besides, these structs are going to be reorganized anyways during the generic ide rewrite. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:59 +02:00
Bartlomiej Zolnierkiewicz	8a69580e1e	ide: add ide_host_free() helper (take 2) * Add ide_host_free() helper and convert ide_host_remove() to use it. * Fix handling of ide_host_register() failure in ide_host_add(), icside.c, ide-generic.c, falconide.c and sgiioc4.c. While at it: * Fix handling of ide_host_alloc_all() failure in ide-generic.c. * Fix handling of ide_host_alloc() failure in falconide.c (also return the correct error value if no device is found). v2: * falconide build fix. (From Stephen Rothwell) Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:59 +02:00
Bartlomiej Zolnierkiewicz	6f904d0152	ide: add ide_host_add() helper Add ide_host_add() helper which does ide_host_alloc()+ide_host_register(), then convert ide_setup_pci_device[s](), ide_legacy_device_add() and some host drivers to use it. While at it: * Fix ide_setup_pci_device[s](), ide_arm.c, gayle.c, ide-4drives.c, macide.c, q40ide.c, cmd640.c and cs5520.c to return correct error value. * -ENOENT -> -ENOMEM in rapide.c, ide-h8300.c, ide-generic.c, au1xxx-ide.c and pmac.c * -ENODEV -> -ENOMEM in palm_bk3710.c, ide_platform.c and delkin_cb.c * -1 -> -ENOMEM in ide-pnp.c Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:57 +02:00
Bartlomiej Zolnierkiewicz	48c3c10726	ide: add struct ide_host (take 3) * Add struct ide_host which keeps pointers to host's ports. * Add ide_host_alloc[_all]() and ide_host_remove() helpers. * Pass 'struct ide_host host' instead of 'u8 idx' to ide_device_add[_all]() and rename it to ide_host_register[_all](). * Convert host drivers and core code to use struct ide_host. * Remove no longer needed ide_find_port(). * Make ide_find_port_slot() static. * Unexport ide_unregister(). v2: * Add missing 'struct ide_host host' to macide.c. v3: Fix build problem in pmac.c (s/ide_alloc_host/ide_host_alloc/) (Noticed by Stephen Rothwell). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:57 +02:00
Bartlomiej Zolnierkiewicz	374e042c3e	ide: add struct ide_tp_ops (take 2) * Add struct ide_tp_ops for transport methods. * Add 'const struct ide_tp_ops tp_ops' to struct ide_port_info and ide_hwif_t. Set the default hwif->tp_ops in ide_init_port_data(). * Set host driver specific hwif->tp_ops in ide_init_port(). * Export ide_exec_command(), ide_read_status(), ide_read_altstatus(), ide_read_sff_dma_status(), ide_set_irq(), ide_tf_{load,read}() and ata_{in,out}put_data(). * Convert host drivers and core code to use struct ide_tp_ops. * Remove no longer needed default_hwif_transport(). * Cleanup ide_hwif_t from methods that are now in struct ide_tp_ops. While at it: * Use struct ide_port_info in falconide.c and q40ide.c. * Rename ata_{in,out}put_data() to ide_{in,out}put_data(). v2: * Fix missing convertion in ns87415.c. There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:56 +02:00
Bartlomiej Zolnierkiewicz	d6276b5f5c	ide: add 'config' field to hw_regs_t Add 'config' field to hw_regs_t and use it to set hwif->config_data in ide_init_port_hw(), then convert ide_legacy_init_one() to use hw->config. There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:56 +02:00
Bartlomiej Zolnierkiewicz	3b2a5c7149	ide: filter out "default" transfer mode values in set_xfer_rate() * Filter out "default" transfer mode values (0x00 - default PIO mode, 0x01 - default PIO mode w/ IORDY disabled) in write handler for obsoleted /proc/ide/hd?/settings:current_speed setting. Allowing "default" transfer mode values is a dangerous thing to do as we don't support programming controller to the "default" transfer mode and devices often use different values for the default and maximum PIO mode (i.e. PIO2 default and PIO4 maximum) so the controller will stay programmed for higher PIO mode while device will use the lower PIO mode. There is no functionality loss as by using special IOCTLs device can still be programmed to "default" transfer modes (it is only useful for debugging/testing purposes anyway). * Remove no longer needed IDE_HFLAG_ABUSE_SET_DMA_MODE host flag, it was previously used by few host drivers to program the controller to PIO0 timings for "default" transfer mode == 0x01 (although some host drivers would program invalid PIO timings instead). * Cleanup ide_set_xfer_rate() and add BUG_ON(). Suggested-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:56 +02:00
Bartlomiej Zolnierkiewicz	ba4b2e607e	ide: remove dead Virtual DMA support Lets remove dead Virtual DMA support for now so it doesn't clutter core IDE code (it can be bring back when there is a need for it): * Remove IDE_HFLAG_VDMA host flag. * Remove ide_drive_t.vdma flag. * cs5520.c: remove stale FIXMEs, cs5520_dma_host_set() and cs5520_dma_ops (also there is no longer a need to set IDE_HFLAG_NO_ATAPI_DMA). There should be no functional changes caused by this patch. Cc: TAKADA Yoshihito <takada@mbf.nifty.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:55 +02:00
Bartlomiej Zolnierkiewicz	761052e676	ide: remove ->INB, ->OUTB and ->OUTBSYNC methods * Remove no longer needed ->INB, ->OUTB and ->OUTBSYNC methods. Then: * Remove no longer used default_hwif_[mm]iops() and ide_[mm_]outbsync(). * Cleanup SuperIO handling in ns87415.c. There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:54 +02:00
Bartlomiej Zolnierkiewicz	1823649b5a	ide: add ide_read_bcount_and_ireason() helper Add ide_read_bcount_and_ireason() helper and use it instead of ->INB in {cdrom_newpc,ide_pc}_intr(). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:54 +02:00
Bartlomiej Zolnierkiewicz	92eb43800a	ide: use ->tf_read in ide_read_error() * Add IDE_TFLAG_IN_FEATURE taskfile flag for reading Feature register and handle it in ->tf_read. * Convert ide_read_error() to use ->tf_read instead of ->INB, then uninline and export it. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:53 +02:00
Bartlomiej Zolnierkiewicz	6e6afb3b74	ide: add ->set_irq method Add ->set_irq method for setting nIEN bit of ATA Device Control register and use it instead of ide_set_irq(). While at it: * Use ->set_irq in init_irq() and do_reset1(). * Don't use HWIF() macro in ide_check_pm_state(). There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:52 +02:00
Bartlomiej Zolnierkiewicz	1f6d8a0fd8	ide: add ->read_altstatus method * Remove ide_read_altstatus() inline helper. * Add ->read_altstatus method for reading ATA Alternate Status register and use it instead of ->INB. There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:52 +02:00
Bartlomiej Zolnierkiewicz	b73c7ee25d	ide: add ->read_status method * Remove ide_read_status() inline helper. * Add ->read_status method for reading ATA Status register and use it instead of ->INB. While at it: * Don't use HWGROUP() macro. There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:52 +02:00
Bartlomiej Zolnierkiewicz	c6dfa867bb	ide: add ->exec_command method Add ->exec_command method for writing ATA Command register and use it instead of ->OUTBSYNC. There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:51 +02:00
Bartlomiej Zolnierkiewicz	ebb00fb55d	ide: factor out simplex handling from ide_pci_dma_base() * Factor out simplex handling from ide_pci_dma_base() to ide_pci_check_simplex(). * Set hwif->dma_base early in ->init_dma method / ide_hwif_setup_dma() and reset it in ide_init_port() if DMA initialization fails. * Use ->read_sff_dma_status instead of ->INB in ide_pci_dma_base(). There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:51 +02:00
Bartlomiej Zolnierkiewicz	81e8d5a34f	ide: remove ide_setup_dma() Export sff_dma_ops and then remove ide_setup_dma(). There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:51 +02:00
Bartlomiej Zolnierkiewicz	cab7f8eda4	ide: remove ->dma_{status,command} fields from ide_hwif_t * Use ->dma_base + offset instead of ->dma_{status,command} and remove no longer needed ->dma_{status,command}. While at it: * Use ATA_DMA_* defines. There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:51 +02:00
Bartlomiej Zolnierkiewicz	b2f951aabc	ide: add ->read_sff_dma_status method Add ->read_sff_dma_status method for reading DMA Status register and use it instead of ->INB. While at it: * Use inb() directly in ns87415.c::ns87415_dma_end(). There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:50 +02:00
Bartlomiej Zolnierkiewicz	c97c6aca75	ide: pass hw_regs_t-s to ide_device_add[_all]() (take 3) * Add 'hw_regs_t *hws' argument to ide_device_add[_all]() and convert host drivers + ide_legacy_init_one() + ide_setup_pci_device[s]() to use it instead of calling ide_init_port_hw() directly. [ However if host has > 1 port we must still set hwif->chipset to hint consecutive ide_find_port() call that the previous slot is occupied. ] Unexport ide_init_port_hw(). v2: * Use defines instead of hard-coded values in buddha.c, gayle.c and q40ide.c. (Suggested by Geert Uytterhoeven) * Better patch description. v3: * Fix build problem in ide-cs.c. (Noticed by Stephen Rothwell) There should be no functional changes caused by this patch. Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-23 19:55:50 +02:00

... 5 6 7 8 9 ...

12086 Commits