It only really needs to define a few constants and include <asm/mman.h>
when it's used by userspace. Move the rest within #ifdef __KERNEL__
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
It was unconditionally including a whole bunch of headers which aren't
user-visible, and also exposing a lot of private internal stuff of its
own. Also fix some legacy character set to UTF-8 while we're at it.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
It doesn't need to include i2c.h, because a forward declaration of
struct i2c_adapter is perfectly sufficient. And it can be inside
#ifdef __KERNEL__ along with the kernel-internal structure definition.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Some (?) non-x86 architectures require 8byte alignment for u_int64_t
even when compiled for 32bit, using u_int32_t in compat_xt_counters
breaks on these architectures, use u_int64_t for everything but x86.
Reported by Andreas Schwab <schwab@suse.de>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't have to #if guard prototypes.
This also fixes a bug observed by Randy Dunlap due to a misspelled
option in the #if.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add some sanity checking. truesize should be at least sizeof(struct
sk_buff) plus the current packet length. If not, then truesize is
seriously mangled and deserves a kernel log message.
Currently we'll do the check for release of stream socket buffers.
But we can add checks to more spots over time.
Incorporating ideas from Herbert Xu.
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.linux-nfs.org/pub/linux/nfs-2.6:
SUNRPC: Dead code in net/sunrpc/auth_gss/auth_gss.c
NFS: remove needless check in nfs_opendir()
NFS: nfs_show_stats; for_each_possible_cpu(), not NR_CPUS
NFS: make 2 functions static
NFS,SUNRPC: Fix compiler warnings if CONFIG_PROC_FS & CONFIG_SYSCTL are unset
NFS: fix PROC_FS=n compile error
VFS: Fix another open intent Oops
RPCSEC_GSS: fix leak in krb5 code caused by superfluous kmalloc
fs/built-in.o: In function `nfs_show_stats':inode.c:(.text+0x15481a): undefined reference to `rpc_print_iostats'
net/built-in.o: In function `rpc_destroy_client': undefined reference to `rpc_free_iostats'
net/built-in.o: In function `rpc_clone_client': undefined reference to `rpc_alloc_iostats'
net/built-in.o: In function `rpc_new_client': undefined reference to `rpc_alloc_iostats'
net/built-in.o: In function `xprt_release': undefined reference to `rpc_count_iostats'
make: *** [.tmp_vmlinux1] Error 1
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Noted by Sergei Shtylylov <sshtylyov@ru.mvista.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for the IDE device on ATI SB600
Signed-off-by: Felix Kuehling <fkuehlin@ati.com>
Acked-by: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While we can currently walk through thread groups, process groups, and
sessions with just the rcu_read_lock, this opens the door to walking the
entire task list.
We already have all of the other RCU guarantees so there is no cost in
doing this, this should be enough so that proc can stop taking the
tasklist lock during readdir.
prev_task was killed because it has no users, and using it will miss new
tasks when doing an rcu traversal.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Somehow in the midst of dotting i's and crossing t's during
the merge up to rc1 we wound up keeping __put_task_struct_cb
when it should have been killed as it no longer has any users.
Sorry I probably should have caught this while it was
still in the -mm tree.
Having the old code there gets confusing when reading
through the code and trying to understand what is
happening.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: (679 commits)
commit 7676f83aeb
Author: James Bottomley <James.Bottomley@steeleye.com>
Date: Fri Apr 14 09:47:59 2006 -0500
[SCSI] scsi_transport_sas: don't scan a non-existent end device
Any end device that can't support any of the scanning protocols
shouldn't be scanned, so set its id to -1 to prevent
scsi_scan_target() being called for it.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
commit 3c0c25b97c
Author: Moore, Eric <Eric.Moore@lsil.com>
Date: Thu Apr 13 16:08:17 2006 -0600
[SCSI] mptfusion - fix panic in mptsas_slave_configure
Driver panic when RAID logical volume was present when driver
loaded, or when a RAID logical volume was created on the fly.
...
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (169 commits)
commit 78a596b449
Author: Adrian Bunk <bunk@stusta.de>
Date: Fri Mar 31 01:38:12 2006 -0800
[PATCH] remove kernel/power/pm.c:pm_unregister()
Since the last user is removed in -mm, we can now remove this long deprecated
function.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 21440d3133
Author: David Brownell <david-b@pacbell.net>
Date: Sat Apr 1 10:21:52 2006 -0800
[PATCH] dma doc updates
...
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (158 commits)
commit 4f705ae3e9
Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
Date: Mon Apr 3 17:09:22 2006 -0700
[PATCH] DMI: move dmi_scan.c from arch/i386 to drivers/firmware/
dmi_scan.c is arch-independent and is used by i386, x86_64, and ia64.
Currently all three arches compile it from arch/i386, which means that ia64
and x86_64 depend on things in arch/i386 that they wouldn't otherwise care
about.
This is simply "mv arch/i386/kernel/dmi_scan.c drivers/firmware/" (removing
trailing whitespace) and the associated Makefile changes. All three
architectures already set CONFIG_DMI in their top-level Kconfig files.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andrey Panin <pazke@orbita1.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
...
Since the last user is removed in -mm, we can now remove this long deprecated
function.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Sparse warns about casting to a __bitwise type. However, it's correct
to do when defining the enum for pci_bus_flags_t, so add a __force to
quiet the warnings. This will fix getting
include/linux/pci.h💯26: warning: cast to restricted type
from sparse all over the build.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The naming of the constant defined for PCI ID 1022:7450 does not seem
to match the information at http://pciids.sourceforge.net/:
http://pci-ids.ucw.cz/iii/?i=1022
There 1022:7450 is listed as "AMD-8131 PCI-X Bridge" while 1022:7451
is listed as "AMD-8131 PCI-X IOAPIC". Yet, the current definition for
0x7450 is PCI_DEVICE_ID_AMD_8131_APIC. It seems to me like that name
should map to 0x7451, while a name like PCI_DEVICE_ID_AMD_8131_BRIDGE
should map to 0x7450.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Print more diagnostic info to help identify the source of power management
suspend failures.
Example:
usb_hcd_pci_suspend(): pci_set_power_state+0x0/0x1af() returns -22
pci_device_suspend(): usb_hcd_pci_suspend+0x0/0x11b() returns -22
suspend_device(): pci_device_suspend+0x0/0x34() returns -22
Work-in-progress. It needs lots more suspend_report_result() calls sprinkled
everywhere.
Cc: Patrick Mochel <mochel@digitalimplant.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@suspend2.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[BLOCK] delay all uevents until partition table is scanned
Here we delay the annoucement of all block device events until the
disk's partition table is scanned and all partition devices are already
created and sysfs is populated.
We have a bunch of old bugs for removable storage handling where we
probe successfully for a filesystem on the raw disk, but at the
same time the kernel recognizes a partition table and creates partition
devices.
Currently there is no sane way to tell if partitions will show up or not
at the time the disk device is announced to userspace. With the delayed
events we can simply skip any probe for a filesystem on the raw disk when
we find already present partitions.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It works like this:
Open the file
Read all the contents.
Call poll requesting POLLERR or POLLPRI (so select/exceptfds works)
When poll returns,
close the file and go to top of loop.
or lseek to start of file and go back to the 'read'.
Events are signaled by an object manager calling
sysfs_notify(kobj, dir, attr);
If the dir is non-NULL, it is used to find a subdirectory which
contains the attribute (presumably created by sysfs_create_group).
This has a cost of one int per attribute, one wait_queuehead per kobject,
one int per open file.
The name "sysfs_notify" may be confused with the inotify
functionality. Maybe it would be nice to support inotify for sysfs
attributes as well?
This patch also uses sysfs_notify to allow /sys/block/md*/md/sync_action
to be pollable
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Move common definitions for NET2280 to <linux/usb/net2280.h>, so that I can
use them in prism54usb (it is not merged yet, but I plan to do it soon).
Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'tee' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] splice: add support for sys_tee()
[PATCH] splice: pass offset around for ->splice_read() and ->splice_write()
We currently have two implementations of this obsolete ioctl, one in
the block layer and one in the scsi code. Both of them have drawbacks.
This patch kills the scsi layer version after updating the block version
with the missing bits:
- argument checking
- use scatterlist I/O
- set number of retries based on the submitted command
This is the last user of non-S/G I/O except for the gdth driver, so
getting this in ASAP and through the scsi tree would be nie to kill
the non-S/G I/O path. Jens, what do you think about adding a check
for non-S/G I/O in the midlayer?
Thanks to Or Gerlitz for testing this patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The pen down IRQ will toggle during each X,Y,Z measurement cycle.
Even though the IRQ is disabled it will be latched and delivered
when after enable_irq. Thus in the IRQ handler we must avoid
starting a new measurement cycle when such an "unwanted" IRQ happens.
Add a get_pendown_state platform function, which will probably
determine this by reading the current GPIO level of the pen IRQ pin.
Move the IRQ reenabling from the SPI RX function to the timer. After
the last power down message the pen IRQ pin is still active for a
while and get_pendown_state would report incorrectly a pen down state.
When suspending we should check the ts->pending flag instead of
ts->pendown, since the timer can be pending regardless of ts->pendown.
Also if ts->pending is set we can be sure that the timer is running,
so no need to rearm it. Similarly if ts->pending is not set we can
be sure that the IRQ is enabled (and the timer is not).
Signed-off-by: Imre Deak <imre.deak@nokia.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Some touchscreens seem to oscillate heavily for a while after touching
the screen. Implement support for sampling the screen until we get two
consecutive values that are close enough.
Signed-off-by: Imre Deak <imre.deak@nokia.com>
Signed-off-by: Juha Yrjola <juha.yrjola@nokia.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
The in-kernel Sangoma drivers are both not compiling and marked as BROKEN
since at least kernel 2.6.0.
Sangoma offers out-of-tree drivers, and David Mandelstam told me Sangoma
does no longer maintain the in-kernel drivers and prefers to provide them
as a separate installation package.
This patch therefore removes these drivers.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Basically an in-kernel implementation of tee, which uses splice and the
pipe buffers as an intelligent way to pass data around by reference.
Where the user space tee consumes the input and produces a stdout and
file output, this syscall merely duplicates the data inside a pipe to
another pipe. No data is copied, the output just grabs a reference to the
input pipe data.
Signed-off-by: Jens Axboe <axboe@suse.de>
We need not use ->f_pos as the offset for the file input/output. If the
user passed an offset pointer in through sys_splice(), just use that and
leave ->f_pos alone.
Signed-off-by: Jens Axboe <axboe@suse.de>
In vsyscall function do_vgettimeofday(), some functions are declared as
inlined, which is a hint for gcc to compile the function inlined but it
not forced. Sometimes compiler does not compile the function as
inlined, so here inline is replaced by __always_inline prefix.
It does not happen in gcc compiler actually, but it possibly happens.
Signed-off-by: bibo mao <bibo.mao@intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] vfs: add splice_write and splice_read to documentation
[PATCH] Remove sys_ prefix of new syscalls from __NR_sys_*
[PATCH] splice: warning fix
[PATCH] another round of fs/pipe.c cleanups
[PATCH] splice: comment styles
[PATCH] splice: add Ingo as addition copyright holder
[PATCH] splice: unlikely() optimizations
[PATCH] splice: speedups and optimizations
[PATCH] pipe.c/fifo.c code cleanups
[PATCH] get rid of the PIPE_*() macros
[PATCH] splice: speedup __generic_file_splice_read
[PATCH] splice: add direct fd <-> fd splicing support
[PATCH] splice: add optional input and output offsets
[PATCH] introduce a "kernel-internal pipe object" abstraction
[PATCH] splice: be smarter about calling do_page_cache_readahead()
[PATCH] splice: optimize the splice buffer mapping
[PATCH] splice: cleanup __generic_file_splice_read()
[PATCH] splice: only call wake_up_interruptible() when we really have to
[PATCH] splice: potential !page dereference
[PATCH] splice: mark the io page as accessed
Bugzilla Bug 6299:
A pixel size of 8 bits produces wrong logo colors in x86_64.
The driver has 2 methods for setting the color map, using the protected
mode interface provided by the video BIOS and directly writing to the VGA
registers. The former is not supported in x86_64 and the latter is enabled
only in i386.
Fix by enabling the latter method in x86_64 only if supported by the BIOS.
If both methods are unsupported, change the visual of vesafb to
STATIC_PSEUDOCOLOR.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Every caller of svc_take_page ignores its return value and assumes it
succeeded. So just WARN() instead of returning an ignored error. This would
have saved some time debugging a recent nfsd4 problem.
If there are still failure cases here, then the result is probably that we
overwrite an earlier part of the reply while xdr-encoding.
While the corrupted reply is a nasty bug, it would be worse to panic here and
create the possibility of a remote DOS; hence WARN() instead of BUG().
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Ingo Oeser <ioe-lkml@rameria.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
An UML user reported (against 2.6.13.3/UML) he got kernel Oopses when
trying to rmmod (on a kernel with module unloading enabled) a module
compiled with module unloading disabled. As crashing is a very correct
thing to do in that case, a solution is altering the vermagic string to
include this too.
Possibly, however, the code should not crash in this case, even if the
module didn't support unloading - it should simply abort the module
removal. In this case, fixing that bug would be a better solution. I've
not investigated though.
(akpm: a bit marginal - root screwed up and shot himself in the foot).
Cc: Hayim Shaul <hayim@post.tau.ac.il>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These are the last conversions of pci_set_dma_mask(),
pci_set_consistent_dma_mask() and pci_dma_supported() to use DMA_xBIT_MASK
constants from linux/dma-mapping.h
Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A couple of /proc/vmcore data structures overflow with 32bit systems having
memory more than 4G. This patch fixes those.
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Before commit 47e65328a7, next_thread() took
a const task_t. Reinstate the const qualifier, getting the next thread
never changes the current thread.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We changed the wrong symbol. It's tty_insert_flip_string_flags() which is
called from the previously-non-GPL'ed now-inlined tty_insert_flip_char().
Fix that up, and uninline tty_schedule_flip() while we're there.
Cc: Tobias Powalowski <t.powa@gmx.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lay out the structure definitions in include/linux/leds.h to be aligned as
much as possible. Also minor updates to the comments to make them more
concise.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Acked-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement the scheduled unexport of panic_timeout.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ulrich suggested that the `flags' arg to sync_file_range() become unsigned.
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some string functions were safely overrideable in lib/string.c, but their
corresponding declarations in linux/string.h were not. Correct this, and
make strcspn overrideable.
Odds of someone wanting to do optimized assembly of these are small, but
for the sake of cleanliness, might as well bring them into line with the
rest of the file.
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Current implementations define NODES_SHIFT in include/asm-xxx/numnodes.h for
each arch. Its definition is sometimes configurable. Indeed, ia64 defines 5
NODES_SHIFT values in the current git tree. But it looks a bit messy.
SGI-SN2(ia64) system requires 1024 nodes, and the number of nodes already has
been changeable by config. Suitable node's number may be changed in the
future even if it is other architecture. So, I wrote configurable node's
number.
This patch set defines just default value for each arch which needs multi
nodes except ia64. But, it is easy to change to configurable if necessary.
On ia64 the number of nodes can be already configured in generic ia64 and SN2
config. But, NODES_SHIFT is defined for DIG64 and HP'S machine too. So, I
changed it so that all platforms can be configured via CONFIG_NODES_SHIFT. It
would be simpler.
See also: http://marc.theaimsgroup.com/?l=linux-kernel&m=114358010523896&w=2
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce GFP_NOWAIT, as an alias for GFP_ATOMIC & ~__GFP_HIGH.
This also changes XFS, which is the only in-tree user of this idiom that I
could find. The XFS piece is compile-tested only.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Acked-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add some documentation regarding the utilisation of the flags field in
struct page. This field is overloaded for per page bits and to hold node,
zone and SPARSEMEM information. Make it clear which areas are used for
what and how many bits are in each area.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These patches are an enhancement of OVERCOMMIT_GUESS algorithm in
__vm_enough_memory().
- why the kernel needed patching
When the kernel can't allocate anonymous pages in practice, currnet
OVERCOMMIT_GUESS could return success. This implementation might be
the cause of oom kill in memory pressure situation.
If the Linux runs with page reservation features like
/proc/sys/vm/lowmem_reserve_ratio and without swap region, I think
the oom kill occurs easily.
- the overall design approach in the patch
When the OVERCOMMET_GUESS algorithm calculates number of free pages,
the reserved free pages are regarded as non-free pages.
This change helps to avoid the pitfall that the number of free pages
become less than the number which the kernel tries to keep free.
- testing results
I tested the patches using my test kernel module.
If the patches aren't applied to the kernel, __vm_enough_memory()
returns success in the situation but autual page allocation is
failed.
On the other hand, if the patches are applied to the kernel, memory
allocation failure is avoided since __vm_enough_memory() returns
failure in the situation.
I checked that on i386 SMP 16GB memory machine. I haven't tested on
nommu environment currently.
This patch adds totalreserve_pages for __vm_enough_memory().
Calculate_totalreserve_pages() checks maximum lowmem_reserve pages and
pages_high in each zone. Finally, the function stores the sum of each
zone to totalreserve_pages.
The totalreserve_pages is calculated when the VM is initilized.
And the variable is updated when /proc/sys/vm/lowmem_reserve_raito
or /proc/sys/vm/min_free_kbytes are changed.
Signed-off-by: Hideo Aoki <haoki@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
reshape_position is a 64bit field that was not 64bit aligned. So swap with
new_level.
NOTE: this is a user-visible change. However:
- The bad code has not appeared in a released kernel
- This code is still marked 'experimental'
- This only affects version-1 superblock, which are not in wide use
- These field are only used (rather than simply reported) by user-space
tools in extemely rare circumstances : after a reshape crashes in the
first second of the reshape process.
So I believe that, at this stage, the change is safe. Especially if people
heed the 'help' message on use mdadm-2.4.1.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: Andrew Morton <akpm@osdl.org>
net/socket.c:148: warning: initialization from incompatible pointer type
extern declarations in .c files! Bad boy.
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
It's more efficient for sendfile() emulation. Basically we cache an
internal private pipe and just use that as the intermediate area for
pages. Direct splicing is not available from sys_splice(), it is only
meant to be used for sendfile() emulation.
Additional patch from Ingo Molnar to avoid the PIPE_BUFFERS loop at
exit for the normal fast path.
Signed-off-by: Jens Axboe <axboe@suse.de>
Oleg Nesterov spotted two interesting bugs with the current de_thread
code. The simplest is a long standing double decrement of
__get_cpu_var(process_counts) in __unhash_process. Caused by
two processes exiting when only one was created.
The other is that since we no longer detach from the thread_group list
it is possible for do_each_thread when run under the tasklist_lock to
see the same task_struct twice. Once on the task list as a
thread_group_leader, and once on the thread list of another
thread.
The double appearance in do_each_thread can cause a double increment
of mm_core_waiters in zap_threads resulting in problems later on in
coredump_wait.
To remedy those two problems this patch takes the simple approach
of changing the old thread group leader into a child thread.
The only routine in release_task that cares is __unhash_process,
and it can be trivially seen that we handle cleaning up a
thread group leader properly.
Since de_thread doesn't change the pid of the exiting leader process
and instead shares it with the new leader process. I change
thread_group_leader to recognize group leadership based on the
group_leader field and not based on pids. This should also be
slightly cheaper then the existing thread_group_leader macro.
I performed a quick audit and I couldn't see any user of
thread_group_leader that cared about the difference.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Overriding the whole EH code is a per-transport, not per-host thing.
Move ->eh_strategy_handler to the transport class, same as
->eh_timed_out.
Downside is that scsi_host_alloc can't check for the total lack of EH
anymore, but the transition period from old EH where we needed it is
long gone already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Rohit found an obscure bug causing buddy list corruption.
page_is_buddy is using a non-atomic test (PagePrivate && page_count == 0)
to determine whether or not a free page's buddy is itself free and in the
buddy lists.
Each of the conjuncts may be true at different times due to unrelated
conditions, so the non-atomic page_is_buddy test may find each conjunct to
be true even if they were not both true at the same time (ie. the page was
not on the buddy lists).
Signed-off-by: Martin Bligh <mbligh@google.com>
Signed-off-by: Rohit Seth <rohitseth@google.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
add optional input and output offsets to sys_splice(), for seekable file
descriptors:
asmlinkage long sys_splice(int fd_in, loff_t __user *off_in,
int fd_out, loff_t __user *off_out,
size_t len, unsigned int flags);
semantics are straightforward: f_pos will be updated with the offset
provided by user-space, before the splice transfer is about to begin.
Providing a NULL offset pointer means the existing f_pos will be used
(and updated in situ). Providing an offset for a pipe results in
-ESPIPE. Providing an invalid offset pointer results in -EFAULT.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jens Axboe <axboe@suse.de>
separate out the 'internal pipe object' abstraction, and make it
usable to splice. This cleans up and fixes several aspects of the
internal splice APIs and the pipe code:
- pipes: the allocation and freeing of pipe_inode_info is now more symmetric
and more streamlined with existing kernel practices.
- splice: small micro-optimization: less pointer dereferencing in splice
methods
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Update XFS for the ->splice_read/->splice_write changes.
Signed-off-by: Jens Axboe <axboe@suse.de>
Add checksum operation which takes care of verifying the checksum and
dealing with HW checksum errors and avoids multiple checksum
operations by setting ip_summed to CHECKSUM_UNNECESSARY after
successful verification.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the queue rerouter intrastructure to a generic usable
infrastructure for address family specific operations as a base for
some cleanups.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move prototypes of NAT callbacks to ip_conntrack_h323.h. Because the
use of typedefs as arguments, some header files need to be moved as
well.
Signed-off-by: Jing Min Zhao <zhaojingmin@users.sourceforge.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the HPET timer is enabled, the clock can drift by ~3 seconds a day.
This is due to the HPET timer not being initialized with the correct
setting (still using PIT count).
If HZ changes, this drift can become even more pronounced.
HPET patch initializes tick_nsec with correct tick_nsec settings for
HPET timer.
Vojtech comments:
"It's not entirely correct (it assumes the HPET ticks totally
exactly), but it's significantly better than assuming the PIT error
there."
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The node setup code would try to allocate the node metadata in the node
itself, but that fails if there is no memory in there.
This can happen with memory hotplug when the hotplug area defines an so
far empty node.
Now use bootmem to try to allocate the mem_map in other nodes.
And if it fails don't panic, but just ignore the node.
To make this work I added a new __alloc_bootmem_nopanic function that
does what its name implies.
TBD should try to use nearby nodes here. Currently we just use any.
It's hard to do it better because bootmem doesn't have proper fallback
lists yet.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Memory hotadd doesn't need SPARSEMEM, but can be handled by just preallocating
mem_maps. This only needs some untangling of ifdefs to enable the necessary
code even without SPARSEMEM.
Originally from Keith Mannthey, hacked by AK.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Originally from Nick Piggin, just adapted to the newer branch.
You can't check PageLRU without holding zone->lru_lock. The page
release code can get away with it only because the page refcount is 0 at
that point. Also, you can't reliably remove pages from the LRU unless
the refcount is 0. Ever.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Jens Axboe <axboe@suse.de>
By cleaning up the writeback logic (killing write_one_page() and the manual
set_page_dirty()), we can get rid of ->stolen inside the pipe_buffer and
just keep it local in pipe_to_file().
This also adds dirty page balancing logic and O_SYNC handling.
Signed-off-by: Jens Axboe <axboe@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (48 commits)
Documentation: fix minor kernel-doc warnings
BUG_ON() Conversion in drivers/net/
BUG_ON() Conversion in drivers/s390/net/lcs.c
BUG_ON() Conversion in mm/slab.c
BUG_ON() Conversion in mm/highmem.c
BUG_ON() Conversion in kernel/signal.c
BUG_ON() Conversion in kernel/signal.c
BUG_ON() Conversion in kernel/ptrace.c
BUG_ON() Conversion in ipc/shm.c
BUG_ON() Conversion in fs/freevxfs/
BUG_ON() Conversion in fs/udf/
BUG_ON() Conversion in fs/sysv/
BUG_ON() Conversion in fs/inode.c
BUG_ON() Conversion in fs/fcntl.c
BUG_ON() Conversion in fs/dquot.c
BUG_ON() Conversion in md/raid10.c
BUG_ON() Conversion in md/raid6main.c
BUG_ON() Conversion in md/raid5.c
Fix minor documentation typo
BFP->BPF in Documentation/networking/tuntap.txt
...
It doesn't make the splice itself necessarily nonblocking (because the
actual file descriptors that are spliced from/to may block unless they
have the O_NONBLOCK flag set), but it makes the splice pipe operations
nonblocking.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch updates the comments to match the actual code.
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
The sliced VBI defines added in videodev2.h are removed since requires
more discussion.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
- Add KEY_BRL_* input keys and K_BRL_* keycodes;
- Add emulation of how braille keyboards usually combine braille dots
to the console keyboard driver;
- Add handling of unicode U+28xy diacritics.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
This patch extends current iptables compatibility layer in order to get
32bit iptables to work on 64bit kernel. Current layer is insufficient due
to alignment checks both in kernel and user space tools.
Patch is for current net-2.6.17 with addition of move of ipt_entry_{match|
target} definitions to xt_entry_{match|target}.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This unifies ipt_multiport and ip6t_multiport to xt_multiport.
As a result, this addes support for inversion and port range match
to IPv6 packets.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This unifies ipt_esp and ip6t_esp to xt_esp. Please note that now
a user program needs to specify IPPROTO_ESP as protocol to use esp match
with IPv6. This means that ip6tables requires '-p esp' like iptables.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I was grepping through the code and some `grep ganularity -R .` didn't
catch what I thought. Then looking closer I saw the term "granuality"
used in only four places (in comments) and granularity in many more
places describing the same idea. Some other facts:
dictionary.com does not know such a word
define:granuality on google is not found (and pages for granuality are
mostly related to patches to the kernel)
it has not been discussed as a term on LKML, AFAICS (=Can Search)
To be consistent, I think granularity should be used everywhere.
Signed-off-by: Kalin KOZHUHAROV <kalin@thinrope.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
As announced, lookup_hash() can now become static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The monochrome->color expansion routine that handles bitmaps which have
(widths % 8) != 0 (slow_imageblit) produces corrupt characters in big-endian.
This is caused by a bogus bit test in slow_imageblit().
Fix.
This patch may deserve to go to the stable tree. The code has already been
well tested in little-endian machines. It's only in big-endian where there is
uncertainty and Herbert confirmed that this is the correct way to go.
It should not introduce regressions.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Acked-by: Herbert Poetzl <herbert@13thfloor.at>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Backlight class attributes are currently easy to implement incorrectly.
Moving certain handling into the backlight core prevents this whilst at the
same time makes the drivers simpler and consistent. The following changes are
included:
The brightness attribute only sets and reads the brightness variable in the
backlight_properties structure.
The power attribute only sets and reads the power variable in the
backlight_properties structure.
Any framebuffer blanking events change a variable fb_blank in the
backlight_properties structure.
The backlight driver has only two functions to implement. One function is
called when any of the above properties change (to update the backlight
brightness), the second is called to return the current backlight brightness
value. A new attribute "actual_brightness" is added to return this brightness
as determined by the driver having combined all the above factors (and any
driver/device specific factors).
Additionally, the backlight core takes care of checking the maximum brightness
is not exceeded and of turning off the backlight before device removal.
The corgi backlight driver is updated to reflect these changes.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is very common to hash a dentry and then to call lookup. If we take fs
specific hash functions into account the full hash logic can get ugly.
Further full_name_hash as an inline function is almost 100 bytes on x86 so
having a non-inline choice in some cases can measurably decrease code size.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Simplifies the code, reduces the need for 4 pid hash tables, and makes the
code more capable.
In the discussions I had with Oleg it was felt that to a large extent the
cleanup itself justified the work. With struct pid being dynamically
allocated meant we could create the hash table entry when the pid was
allocated and free the hash table entry when the pid was freed. Instead of
playing with the hash lists when ever a process would attach or detach to a
process.
For myself the fact that it gave what my previous task_ref patch gave for free
with simpler code was a big win. The problem is that if you hold a reference
to struct task_struct you lock in 10K of low memory. If you do that in a user
controllable way like /proc does, with an unprivileged but hostile user space
application with typical resource limits of 1000 fds and 100 processes I can
trigger the OOM killer by consuming all of low memory with task structs, on a
machine wight 1GB of low memory.
If I instead hold a reference to struct pid which holds a pointer to my
task_struct, I don't suffer from that problem because struct pid is 2 orders
of magnitude smaller. In fact struct pid is small enough that most other
kernel data structures dwarf it, so simply limiting the number of referring
data structures is enough to prevent exhaustion of low memory.
This splits the current struct pid into two structures, struct pid and struct
pid_link, and reduces our number of hash tables from PIDTYPE_MAX to just one.
struct pid_link is the per process linkage into the hash tables and lives in
struct task_struct. struct pid is given an indepedent lifetime, and holds
pointers to each of the pid types.
The independent life of struct pid simplifies attach_pid, and detach_pid,
because we are always manipulating the list of pids and not the hash table.
In addition in giving struct pid an indpendent life it makes the concept much
more powerful.
Kernel data structures can now embed a struct pid * instead of a pid_t and
not suffer from pid wrap around problems or from keeping unnecessarily
large amounts of memory allocated.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A big problem with rcu protected data structures that are also reference
counted is that you must jump through several hoops to increase the reference
count. I think someone finally implemented atomic_inc_not_zero(&count) to
automate the common case. Unfortunately this means you must special case the
rcu access case.
When data structures are only visible via rcu in a manner that is not
determined by the reference count on the object (i.e. tasks are visible until
their zombies are reaped) there is a much simpler technique we can employ.
Simply delaying the decrement of the reference count until the rcu interval is
over.
What that means is that the proc code that looks up a task and later
wants to sleep can now do:
rcu_read_lock();
task = find_task_by_pid(some_pid);
if (task) {
get_task_struct(task);
}
rcu_read_unlock();
The effect on the rest of the kernel is that put_task_struct becomes cheaper
and immediate, and in the case where the task has been reaped it frees the
task immediate instead of unnecessarily waiting an until the rcu interval is
over.
Cleanup of task_struct does not happen when its reference count drops to
zero, instead cleanup happens when release_task is called. Tasks can only
be looked up via rcu before release_task is called. All rcu protected
members of task_struct are freed by release_task.
Therefore we can move call_rcu from put_task_struct into release_task. And
we can modify release_task to not immediately release the reference count
but instead have it call put_task_struct from the function it gives to
call_rcu.
The end result:
- get_task_struct is safe in an rcu context where we have just looked
up the task.
- put_task_struct() simplifies into its old pre rcu self.
This reorganization also makes put_task_struct uncallable from modules as
it is not exported but it does not appear to be called from any modules so
this should not be an issue, and is trivially fixed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This just got nuked in mainline. Bring it back because Eric's patches use it.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To increase the strength of SCHED_BATCH as a scheduling hint we can
activate batch tasks on the expired array since by definition they are
latency insensitive tasks.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>