Unless we're doing O_APPEND writes, we really don't care about revalidating
the file length. Just make sure that we catch any page cache invalidations.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Instead of looking at whether or not the file is open for writes before
we accept to update the length using the server value, we should rather
be looking at whether or not we are currently caching any writes.
Failure to do so means in particular that we're not updating the file
length correctly after obtaining a POSIX or BSD lock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv3 currently returns the unsigned 64-bit cookie directly to
userspace. The following patch causes the kernel to generate
loff_t offsets for the benefit of userland.
The current server-generated READDIR cookie is cached in the
nfs_open_context instead of in filp->f_pos, so we still end up work
correctly under directory insertions/deletion.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Attach acls to inodes in the icache to avoid unnecessary GETACL RPC
round-trips. As long as the client doesn't retrieve any acls itself, only the
default acls of exiting directories and the default and access acls of new
directories will end up in the cache, which preserves some memory compared to
always caching the access and default acl of all files.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv3 has no concept of a umask on the server side: The client applies
the umask locally, and sends the effective permissions to the server.
This behavior is wrong when files are created in a directory that has a
default ACL. In this case, the umask is supposed to be ignored, and
only the default ACL determines the file's effective permissions.
Usually its the server's task to conditionally apply the umask. But
since the server knows nothing about the umask, we have to do it on the
client side. This patch tries to fetch the parent directory's default
ACL before creating a new file, computes the appropriate create mode to
send to the server, and finally sets the new file's access and default
acl appropriately.
Many thanks to Buck Huppmann <buchk@pobox.com> for sending the initial
version of this patch, as well as for arguing why we need this change.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This adds acl support fo nfs clients via the NFSACL protocol extension, by
implementing the getxattr, listxattr, setxattr, and removexattr iops for the
system.posix_acl_access and system.posix_acl_default attributes. This patch
implements a dumb version that uses no caching (and thus adds some overhead).
(Another patch in this patchset adds caching as well.)
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This adds functions for encoding and decoding POSIX ACLs for the NFSACL
protocol extension, and the GETACL and SETACL RPCs. The implementation is
compatible with NFSACL in Solaris.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The NFS and NFSACL programs run on the same RPC transport. This patch adds
support for this by converting svc_program into a chained list of programs
(server-side).
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add nfs4_acl field to the nfs_inode, and use it to cache acls. Only cache
acls of size up to a page. Also prepare for up to a page of acl data even
when the user doesn't pass in a buffer, as when they want to get the acl
length to decide what size buffer to allocate.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Client-side support for NFSv4 acls: xdr encoding and decoding routines for
writing acls
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Client-side support for NFSv4 acls: xdr encoding and decoding routines for
reading acls
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ACL support will require supporting additional inode operations in v4
(getxattr, setxattr, listxattr). This patch allows different protocol versions
to support different inode operations by adding a file_inode_ops to the
nfs_rpc_ops (to match the existing dir_inode_ops).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure that we don't create an RPC client without checking that the server
does indeed support the RPC program + version that we are trying to set up.
This enables us to immediately return an error to "mount" if it turns out
that the server is only supporting NFSv2, when we requested NFSv3 or NFSv4.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The patch just changes the order of structure members.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for Maxim/Dallas DS1374 Real-Time Clock Chip
This change adds support for the Maxim/Dallas DS1374 RTC chip. This chip
is an I2C-based RTC that maintains a simple 32-bit binary seconds count
with battery backup support.
Signed-off-by: Randy Vinson <rvinson@mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch renames the new linux/i2c-sysfs.h header file to
linux/hwmon-sysfs.h. This names seems to be more appropriate since this
file defines macros and structures not related to i2c but to hardware
monitoring drivers. The patch also updates the five hardware monitoring
driver which include that header file already.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Adds conversion from VID (mV) to register value. Used by the atxp1 I2C module.
Removed uneeded switch case.
Signed-off-by: Sebastian Witt <se.witt@gmx.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Some months ago, you killed the address ranges mechanism from all
sensors i2c chip drivers (both the module parameters and the in-code
address lists). I think it was a very good move, as the ranges can
easily be replaced by individual addresses, and this allowed for
significant cleanups in the i2c core (let alone the impressive size
shrink for all these drivers).
Unfortunately you did not do the same for non-sensors i2c chip drivers.
These need the address ranges even less, so we could get rid of the
ranges here as well for another significant i2c core cleanup. Here comes
a patch which does just that. Since the process is exactly the same as
what you did for the other drivers set already, I did not split this one
in parts.
A documentation update is included.
The change saves 308 bytes in the i2c core, and an average 1382 bytes
for chip drivers which use I2C_CLIENT_INSMOD, 126 bytes for those which
do not.
This change is required if we want to merge the sensors and non-sensors
i2c code (and we want to do this).
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Index: gregkh-2.6/Documentation/i2c/writing-clients
===================================================================
1/ Must typecast int to (sector_t) before inverting or we
might not invert enough bits.
2/ When "bitmap_offset" was added to mdp_superblock_1, we didn't increase
the count of words-used (96 to 100).
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
currently, md updates all superblocks (one on each device) in series. It
waits for one write to complete before starting the next. This isn't a big
problem as superblock updates don't happen that often.
However it is neater to do it in parallel, and if the drives in the array have
gone to "sleep" after a period of idleness, then waking them is parallel is
faster (and someone else should be worrying about power drain).
Futher, we will need parallel superblock updates for a future patch which
keeps the intent-logging bitmap near the superblock.
Also remove the silly code that retired superblock updates 100 times. This
simply never made sense.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This provides an alternate to storing the bitmap in a separate file. The
bitmap can be stored at a given offset from the superblock. Obviously the
creator of the array must make sure this doesn't intersect with data....
After is good for version-0.90 superblocks.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Before completing a 'write' the md superblock might need to be updated.
This is best done by the md_thread.
The current code schedules this up and queues the write request for later
handling by the md_thread.
However some personalities (Raid5/raid6) will deadlock if the md_thread
tries to submit requests to its own array.
So this patch changes things so the processes submitting the request waits
for the superblock to be written and then submits the request itself.
This fixes a recently-created deadlock in raid5/raid6
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When an array is degraded, bit in the intent-bitmap are never cleared. So if
a recently failed drive is re-added, we only need to reconstruct the block
that are still reflected in the bitmap.
This patch adds support for this re-adding.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently we don't wait for updates to the bitmap to be flushed to disk
properly. The infrastructure all there, but it isn't being used....
A separate kernel thread (bitmap_writeback_daemon) is needed to wait for each
page as we cannot get callbacks when a page write completes.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With this patch, the intent to write to some block in the array can be logged
to a bitmap file. Each bit represents some number of sectors and is set
before any update happens, and only cleared when all writes relating to all
sectors are complete.
After an unclean shutdown, information in this bitmap can be used to optimise
resync - only sectors which could be out-of-sync need to be updated.
Also if a drive is removed and then added back into an array, the recovery can
make use of the bitmap to optimise reconstruction. This is not implemented in
this patch.
Currently the bitmap is stored in a file which must (obviously) be stored on a
separate device.
The patch only provided infrastructure. It does not update any personalities
to bitmap intent logging.
Md arrays can still be used with no bitmap file. This patch has minimal
impact on such arrays.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1/ change the return value (which is number-of-sectors synced)
from 'int' to 'sector_t'.
The number of sectors is usually easily small enough to fit
in an int, but if resync needs to abort, it may want to return
the total number of remaining sectors, which could be large.
Also errors cannot be returned as negative numbers now, so use
0 instead
2/ Add a 'skipped' return parameter to allow the array to report
that it skipped the sectors. This allows md to take this into account
in the speed calculations.
Currently there is no important skipping, but the bitmap-based-resync
that is coming will use this.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When md marks the superblock dirty before a write, it calls
generic_make_request (to write the superblock) from within
generic_make_request (to write the first dirty block), which could cause
problems later.
With this patch, the superblock write is always done by the helper thread, and
write request are delayed until that write completes.
Also, the locking around marking the array dirty and writing the superblock is
improved to avoid possible races.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Shrink the stack when calling the drawing alignment functions.
Signed-off-by: James Simmons <jsimmons@www.infradead.org>
Cc: "Antonino A. Daplas" <adaplas@hotpop.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Improve the fonts for use with the framebuffer.
I've added all the characters marked 'FIXME' in the sun12x22 font and
created a 10x18 font (based on the sun12x22 font) and a 7x14 font (based
on the vga8x16 font).
This patch is non-intrusive, no options are enabled by default so most
users won't notice a thing.
I am placing my changes under the GPL, however, I've not seen any copyright
notices on the sun12x22 font and the vga8x16 font which I derived my new
fonts from so I don't know what the copyright status is.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch adds pci ID for CT 69000 chipset.
Signed-off-by: James Simmons <jsimmons@www.infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for the Arc monochrome LCD board.
The board uses KS108 controllers to drive individual 64x64 LCD matrices.
The board can be paneled in a variety of setups such as 2x1=128x64,
4x4=256x256 and so on. The board/host interface is through GPIO.
Signed-off-by: Jaya Kumar <jayalk@intworks.biz>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: <linux-fbdev-devel@lists.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since no one is using the inbuf, outbuf of struct fb_pixmap I removed their
use in the framebuffer console. The idea is instead move the pixmap
functionality below the accelerated functions intead of on top as the way
it is now. If there is no objection please apply. This is against Linus
latestr GIT tree. Thank you.
Signed-off-by: James Simmons <jsimmons@www.infradead.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With Chris Wedgwood <cw@f00f.org>
As suggested by Chris, we can make the "just added" method ->release
conditional to UML only (better: to archs requesting it, i.e. only UML
currently), so that other archs don't get this unneeded crud, and if UML
won't need it any more we can kill this.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With Chris Wedgwood <cw@f00f.org>
Currently UML must explicitly call the UML-specific
free_irq_by_irq_and_dev() for each free_irq call it's done.
This is needed because ->shutdown and/or ->disable are only called when the
last "action" for that irq is removed.
Instead, for UML shared IRQs (UML IRQs are very often, if not always,
shared), for each dev_id some setup is done, which must be cleared on the
release of that fd. For instance, for each open console a new instance
(i.e. new dev_id) of the same IRQ is requested().
Exactly, a fd is stored in an array (pollfds), which is after read by a
host thread and passed to poll(). Each event registered by poll() triggers
an interrupt. So, for each free_irq() we must remove the corresponding
host fd from the table, which we do via this -release() method.
In this patch we add an appropriate hook for this, and remove all uses of
it by pointing the hook to the said procedure; this is safe to do since the
said procedure.
Also some cosmetic improvements are included.
This is heavily based on some work by Chris Wedgwood, which however didn't
get the patch merged for something I'd call a "misunderstanding" (the need
for this patch wasn't cleanly explained, thus adding the generic hook was
felt as undesirable).
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Several hardware features of SGI's IOC4 I/O controller chip require
timing-related driver calculations dependent upon the PCI bus speed. This
patch enables the core IOC4 driver code to detect the actual bus speed and
store a value that can later be used by the IOC4 subdrivers as needed.
Signed-off-by: Brent Casavant <bcasavan@sgi.com>
Acked-by: Pat Gefre <pfg@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This series of patches reworks the configuration and internal structure
of the SGI IOC4 I/O controller device drivers.
These changes are motivated by several factors:
- The IOC4 chip PCI resources are of mixed use between functions (i.e.
multiple functions are handled in the same address range, sometimes
within the same register), muddling resource ownership and initialization
issues. Centralizing this ownership in a core driver is desirable.
- The IOC4 chip implements multiple functions (serial, IDE, others not
yet implemented in the mainline kernel) but is not a multifunction
PCI device. In order to properly handle device addition and removal
as well as module insertion and deletion, an intermediary IOC4-specific
driver layer is needed to handle these operations cleanly.
- All IOC4 drivers are currently enabled by a single CONFIG value. As
not all systems need all IOC4 functions, it is desireable to enable
these drivers independently.
- The current IOC4 core driver will trigger loading of all function-level
drivers, as it makes direct calls to them. This situation should be
reversed (i.e. function-level drivers cause loading of core driver)
in order to maintain a clear and least-surprise driver loading model.
- IOC4 hardware design necessitates some driver-level dependency on
the PCI bus clock speed. Current code assumes a 66MHz bus, but the
speed should be autodetected and appropriate compensation taken.
This patch series effects the above changes by a newly and better designed
IOC4 core driver with which the function-level drivers can register and
deregister themselves upon module insertion/removal. By tracking these
modules, device addition/removal is also handled properly. PCI resource
management and ownership issues are centralized in this core driver, and
IOC4-wide configuration actions such as bus speed detection are also
handled in this core driver.
This patch:
The SGI IOC4 I/O controller chip implements multiple functions, though it is
not a multi-function PCI device. Additionally, various PCI resources of the
IOC4 are shared by multiple hardware functions, and thus resource ownership by
driver is not clearly delineated. Due to the current driver design, all core
and subordinate drivers must be loaded, or none, which is undesirable if not
all IOC4 hardware features are being used.
This patch reorganizes the IOC4 drivers so that the core driver provides a
subdriver registration service. Through appropriate callbacks the subdrivers
can now handle device addition and removal, as well as module insertion and
deletion (though the IOC4 IDE driver requires further work before module
deletion will work). The core driver now takes care of allocating PCI
resources and data which must be shared between subdrivers, to clearly
delineate module ownership of these items.
Signed-off-by: Brent Casavant <bcasavan@sgi.com>
Acked-by: Pat Gefre <pfg@sgi.com
Acked-by: Jeremy Higdon <jeremy@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Added descriptions of the new MPC8548 family processors, e500 core and
peripherals.
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the ia64 uncached page allocator and the generic
allocator (genalloc). The uncached allocator was formerly part of the SN2
mspec driver but there are several other users of it so it has been split
off from the driver.
The generic allocator can be used by device driver to manage special memory
etc. The generic allocator is based on the allocator from the sym53c8xx_2
driver.
Various users on ia64 needs uncached memory. The SGI SN architecture requires
it for inter-partition communication between partitions within a large NUMA
cluster. The specific user for this is the XPC code. Another application is
large MPI style applications which use it for synchronization, on SN this can
be done using special 'fetchop' operations but it also benefits non SN
hardware which may use regular uncached memory for this purpose. Performance
of doing this through uncached vs cached memory is pretty substantial. This
is handled by the mspec driver which I will push out in a seperate patch.
Rather than creating a specific allocator for just uncached memory I came up
with genalloc which is a generic purpose allocator that can be used by device
drivers and other subsystems as they please. For instance to handle onboard
device memory. It was derived from the sym53c7xx_2 driver's allocator which
is also an example of a potential user (I am refraining from modifying sym2
right now as it seems to have been under fairly heavy development recently).
On ia64 memory has various properties within a granule, ie. it isn't safe to
access memory as uncached within the same granule as currently has memory
accessed in cached mode. The regular system therefore doesn't utilize memory
in the lower granules which is mixed in with device PAL code etc. The
uncached driver walks the EFI memmap and pulls out the spill uncached pages
and sticks them into the uncached pool. Only after these chunks have been
utilized, will it start converting regular cached memory into uncached memory.
Hence the reason for the EFI related code additions.
Signed-off-by: Jes Sorensen <jes@wildopensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The pageset array can potentially acquire a huge amount of memory on large
NUMA systems. F.e. on a system with 512 processors and 256 nodes there
will be 256*512 pagesets. If each pageset only holds 5 pages then we are
talking about 655360 pages.With a 16K page size on IA64 this results in
potentially 10 Gigabytes of memory being trapped in pagesets. The typical
cases are much less for smaller systems but there is still the potential of
memory being trapped in off node pagesets. Off node memory may be rarely
used if local memory is available and so we may potentially have memory in
seldom used pagesets without this patch.
The slab allocator flushes its per cpu caches every 2 seconds. The
following patch flushes the off node pageset caches in the same way by
tying into the slab flush.
The patch also changes /proc/zoneinfo to include the number of pages
currently in each pageset.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
By making the offset argument of __read_page_state an unsigned long instead of
unsigned, we can avoid forcing the compiler to sign extend a usually constant
argument. This saves 1 instruction on x86-64.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
By making the offset argument of __mod_page_state an unsigned long instead
of unsigned, we can avoid forcing the compiler to sign extend a usually
constant argument. This saves 1 instruction on x86-64.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
try_to_free_pages accepts a third argument, order, but hasn't used it since
before 2.6.0. The following patch removes the argument and updates all the
calls to try_to_free_pages.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove PG_highmem, to save a page flag. Use is_highmem() instead. It'll
generate a little more code, but we don't use PageHigheMem() in many places.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ingo recently introduced a great speedup for allocating new mmaps using the
free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
causes huge performance increases in thread creation.
The downside of this patch is that it does lead to fragmentation in the
mmap-ed areas (visible via /proc/self/maps), such that some applications
that work fine under 2.4 kernels quickly run out of memory on any 2.6
kernel.
The problem is twofold:
1) the free_area_cache is used to continue a search for memory where
the last search ended. Before the change new areas were always
searched from the base address on.
So now new small areas are cluttering holes of all sizes
throughout the whole mmap-able region whereas before small holes
tended to close holes near the base leaving holes far from the base
large and available for larger requests.
2) the free_area_cache also is set to the location of the last
munmap-ed area so in scenarios where we allocate e.g. five regions of
1K each, then free regions 4 2 3 in this order the next request for 1K
will be placed in the position of the old region 3, whereas before we
appended it to the still active region 1, placing it at the location
of the old region 2. Before we had 1 free region of 2K, now we only
get two free regions of 1K -> fragmentation.
The patch addresses thes issues by introducing yet another cache descriptor
cached_hole_size that contains the largest known hole size below the
current free_area_cache. If a new request comes in the size is compared
against the cached_hole_size and if the request can be filled with a hole
below free_area_cache the search is started from the base instead.
The results look promising: Whereas 2.6.12-rc4 fragments quickly and my
(earlier posted) leakme.c test program terminates after 50000+ iterations
with 96 distinct and fragmented maps in /proc/self/maps it performs nicely
(as expected) with thread creation, Ingo's test_str02 with 20000 threads
requires 0.7s system time.
Taking out Ingo's patch (un-patch available per request) by basically
deleting all mentions of free_area_cache from the kernel and starting the
search for new memory always at the respective bases we observe: leakme
terminates successfully with 11 distinctive hardly fragmented areas in
/proc/self/maps but thread creating is gringdingly slow: 30+s(!) system
time for Ingo's test_str02 with 20000 threads.
Now - drumroll ;-) the appended patch works fine with leakme: it ends with
only 7 distinct areas in /proc/self/maps and also thread creation seems
sufficiently fast with 0.71s for 20000 threads.
Signed-off-by: Wolfgang Wander <wwc@rentec.com>
Credit-to: "Richard Purdie" <rpurdie@rpsys.net>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu> (partly)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch modifies the way pagesets in struct zone are managed.
Each zone has a per-cpu array of pagesets. So any particular CPU has some
memory in each zone structure which belongs to itself. Even if that CPU is
not local to that zone.
So the patch relocates the pagesets for each cpu to the node that is nearest
to the cpu instead of allocating the pagesets in the (possibly remote) target
zone. This means that the operations to manage pages on remote zone can be
done with information available locally.
We play a macro trick so that non-NUMA pmachines avoid the additional
pointer chase on the page allocator fastpath.
AIM7 benchmark on a 32 CPU SGI Altix
w/o patches:
Tasks jobs/min jti jobs/min/task real cpu
1 484.68 100 484.6769 12.01 1.97 Fri Mar 25 11:01:42 2005
100 27140.46 89 271.4046 21.44 148.71 Fri Mar 25 11:02:04 2005
200 30792.02 82 153.9601 37.80 296.72 Fri Mar 25 11:02:42 2005
300 32209.27 81 107.3642 54.21 451.34 Fri Mar 25 11:03:37 2005
400 34962.83 78 87.4071 66.59 588.97 Fri Mar 25 11:04:44 2005
500 31676.92 75 63.3538 91.87 742.71 Fri Mar 25 11:06:16 2005
600 36032.69 73 60.0545 96.91 885.44 Fri Mar 25 11:07:54 2005
700 35540.43 77 50.7720 114.63 1024.28 Fri Mar 25 11:09:49 2005
800 33906.70 74 42.3834 137.32 1181.65 Fri Mar 25 11:12:06 2005
900 34120.67 73 37.9119 153.51 1325.26 Fri Mar 25 11:14:41 2005
1000 34802.37 74 34.8024 167.23 1465.26 Fri Mar 25 11:17:28 2005
with slab API changes and pageset patch:
Tasks jobs/min jti jobs/min/task real cpu
1 485.00 100 485.0000 12.00 1.96 Fri Mar 25 11:46:18 2005
100 28000.96 89 280.0096 20.79 150.45 Fri Mar 25 11:46:39 2005
200 32285.80 79 161.4290 36.05 293.37 Fri Mar 25 11:47:16 2005
300 40424.15 84 134.7472 43.19 438.42 Fri Mar 25 11:47:59 2005
400 39155.01 79 97.8875 59.46 590.05 Fri Mar 25 11:48:59 2005
500 37881.25 82 75.7625 76.82 730.19 Fri Mar 25 11:50:16 2005
600 39083.14 78 65.1386 89.35 872.79 Fri Mar 25 11:51:46 2005
700 38627.83 77 55.1826 105.47 1022.46 Fri Mar 25 11:53:32 2005
800 39631.94 78 49.5399 117.48 1169.94 Fri Mar 25 11:55:30 2005
900 36903.70 79 41.0041 141.94 1310.78 Fri Mar 25 11:57:53 2005
1000 36201.23 77 36.2012 160.77 1458.31 Fri Mar 25 12:00:34 2005
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
Signed-off-by: Shai Fultheim <Shai@Scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A lot of the code in arch/*/mm/hugetlbpage.c is quite similar. This patch
attempts to consolidate a lot of the code across the arch's, putting the
combined version in mm/hugetlb.c. There are a couple of uglyish hacks in
order to covert all the hugepage archs, but the result is a very large
reduction in the total amount of code. It also means things like hugepage
lazy allocation could be implemented in one place, instead of six.
Tested, at least a little, on ppc64, i386 and x86_64.
Notes:
- this patch changes the meaning of set_huge_pte() to be more
analagous to set_pte()
- does SH4 need s special huge_ptep_get_and_clear()??
Acked-by: William Lee Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When early zone reclaim is turned on the LRU is scanned more frequently when a
zone is low on memory. This limits when the zone reclaim can be called by
skipping the scan if another thread (either via kswapd or sync reclaim) is
already reclaiming from the zone.
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When using the early zone reclaim, it was noticed that allocating new pages
that should be spread across the whole system caused eviction of local pages.
This adds a new GFP flag to prevent early reclaim from happening during
certain allocation attempts. The example that is implemented here is for page
cache pages. We want page cache pages to be spread across the whole system,
and we don't want page cache pages to evict other pages to get local memory.
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is the core of the (much simplified) early reclaim. The goal of this
patch is to reclaim some easily-freed pages from a zone before falling back
onto another zone.
One of the major uses of this is NUMA machines. With the default allocator
behavior the allocator would look for memory in another zone, which might be
off-node, before trying to reclaim from the current zone.
This adds a zone tuneable to enable early zone reclaim. It is selected on a
per-zone basis and is turned on/off via syscall.
Adding some extra throttling on the reclaim was also required (patch
4/4). Without the machine would grind to a crawl when doing a "make -j"
kernel build. Even with this patch the System Time is higher on
average, but it seems tolerable. Here are some numbers for kernbench
runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run:
wall user sys %cpu ctx sw. sleeps
---- ---- --- ---- ------ ------
No patch 1009 1384 847 258 298170 504402
w/patch, no reclaim 880 1376 667 288 254064 396745
w/patch & reclaim 1079 1385 926 252 291625 548873
These numbers are the average of 2 runs of 3 "make -j" runs done right
after system boot. Run-to-run variability for "make -j" is huge, so
these numbers aren't terribly useful except to seee that with reclaim
the benchmark still finishes in a reasonable amount of time.
I also looked at the NUMA hit/miss stats for the "make -j" runs and the
reclaim doesn't make any difference when the machine is thrashing away.
Doing a "make -j8" on a single node that is filled with page cache pages
takes 700 seconds with reclaim turned on and 735 seconds without reclaim
(due to remote memory accesses).
The simple zone_reclaim syscall program is at
http://www.bork.org/~mort/sgi/zone_reclaim.c
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch implements a number of smp_processor_id() cleanup ideas that
Arjan van de Ven and I came up with.
The previous __smp_processor_id/_smp_processor_id/smp_processor_id API
spaghetti was hard to follow both on the implementational and on the
usage side.
Some of the complexity arose from picking wrong names, some of the
complexity comes from the fact that not all architectures defined
__smp_processor_id.
In the new code, there are two externally visible symbols:
- smp_processor_id(): debug variant.
- raw_smp_processor_id(): nondebug variant. Replaces all existing
uses of _smp_processor_id() and __smp_processor_id(). Defined
by every SMP architecture in include/asm-*/smp.h.
There is one new internal symbol, dependent on DEBUG_PREEMPT:
- debug_smp_processor_id(): internal debug variant, mapped to
smp_processor_id().
Also, i moved debug_smp_processor_id() from lib/kernel_lock.c into a new
lib/smp_processor_id.c file. All related comments got updated and/or
clarified.
I have build/boot tested the following 8 .config combinations on x86:
{SMP,UP} x {PREEMPT,!PREEMPT} x {DEBUG_PREEMPT,!DEBUG_PREEMPT}
I have also build/boot tested x64 on UP/PREEMPT/DEBUG_PREEMPT. (Other
architectures are untested, but should work just fine.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
o This adds ->i_op->setattr VFS method for sysfs inodes. The changed
attribues are saved in the persistent sysfs_dirent structure as a pointer
to struct iattr. The struct iattr is allocated only for those sysfs_dirent's
for which default attributes are getting changed. Thanks to Jon Smirl for
this suggestion.
Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch creates a new header with a potential standard i2c sensor
attribute type (which simply includes an int representing the sensor
number/index) and the associated macros, SENSOR_DEVICE_ATTR to define
a static attribute and to_sensor_dev_attr to get a
sensor_device_attribute reference from an embedded device_attribute
reference.
Signed-off-by: Yani Ioannou <yani.ioannou@gmail.com>
This patch adds the device_attribute paramerter to the
device_attribute store and show sysfs callback functions, and passes a
reference to the attribute when the callbacks are called.
Signed-off-by: Yani Ioannou <yani.ioannou@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Based on the discussion about spufs attributes, this is my suggestion
for a more generic attribute file support that can be used by both
debugfs and spufs.
Simple attribute files behave similarly to sequential files from
a kernel programmers perspective in that a standard set of file
operations is provided and only an open operation needs to
be written that registers file specific get() and set() functions.
These operations are defined as
void foo_set(void *data, u64 val); and
u64 foo_get(void *data);
where data is the inode->u.generic_ip pointer of the file and the
operations just need to make send of that pointer. The infrastructure
makes sure this works correctly with concurrent access and partial
read calls.
A macro named DEFINE_SIMPLE_ATTRIBUTE is provided to further simplify
using the attributes.
This patch already contains the changes for debugfs to use attributes
for its internal file operations.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This adds a generic function 'unregister_node()'.
It is used to remove objects of a node going away
for hotplug. All the devices on the node must be
unregistered before calling this function.
Signed-off-by: Keiichiro Tokunaga <tokunaga.keiich@jp.fujitsu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
diff -puN drivers/base/node.c~numa_hp_base drivers/base/node.c
There's no check to see if the device is already bound to a driver, which
could do bad things. The first thing to go wrong is that it will try to match
a driver with a device already bound to one. In some cases (it appears with
USB with drivers/usb/core/usb.c::usb_match_id()), some drivers will match a
device based on the class type, so it would be common (especially for HID
devices) to match a device that is already bound.
The fun comes when ->probe() is called, it fails, then
driver_probe_device() does this:
dev->driver = NULL;
Later on, that pointer could be be dereferenced without checking and cause
hell to break loose.
This problem could be nasty. It's very hardware dependent, since some
devices could have a different set of matching qualifiers than others.
Now, I don't quite see exactly where/how you were getting that crash.
You're dereferencing bad memory, but I'm not sure which pointer was bad
and where it came from, but it could have come from a couple of different
places.
The patch below will hopefully fix it all up for you. It's against
2.6.12-rc2-mm1, and does the following:
- Move logic to driver_probe_device() and comments uncommon returns:
1 - If device is bound
0 - If device not bound, and no error
error - If there was an error.
- Move locking to caller of that function, since we want to lock a
device for the entire time we're trying to bind it to a driver (to
prevent against a driver being loaded at the same time).
- Update __device_attach() and __driver_attach() to do that locking.
- Check if device is already bound in __driver_attach()
- Update the converse device_release_driver() so it locks the device
around all of the operations.
- Mark driver_probe_device() as static and remove export. It's an
internal function, it should stay that way, and there are no other
callers. If there is ever a need to export it, we can audit it as
necessary.
Signed-off-by: Andrew Morton <akpm@osdl.org>
- Use klist iterator in device_for_each_child(), making it safe to use for
removing devices.
- Remove unused list_to_dev() function.
- Kills all usage of devices_subsys.rwsem.
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
- Use it in driver_for_each_device() instead of the regular list_head and stop using
the bus's rwsem for protection.
- Use driver_for_each_device() in driver_detach() so we don't deadlock on the
bus's rwsem.
- Remove ->devices.
- Move klist access and sysfs link access out from under device's semaphore, since
they're synchronized through other means.
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
- Use it in bus_for_each_drv().
- Use the klist spinlock instead of the bus rwsem.
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
- Use it for bus_for_each_dev().
- Use the klist spinlock instead of the bus rwsem.
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This klist interface provides a couple of structures that wrap around
struct list_head to provide explicit list "head" (struct klist) and
list "node" (struct klist_node) objects. For struct klist, a spinlock
is included that protects access to the actual list itself. struct
klist_node provides a pointer to the klist that owns it and a kref
reference count that indicates the number of current users of that node
in the list.
The entire point is to provide an interface for iterating over a list
that is safe and allows for modification of the list during the
iteration (e.g. insertion and removal), including modification of the
current node on the list.
It works using a 3rd object type - struct klist_iter - that is declared
and initialized before an iteration. klist_next() is used to acquire the
next element in the list. It returns NULL if there are no more items.
This klist interface provides a couple of structures that wrap around
struct list_head to provide explicit list "head" (struct klist) and
list "node" (struct klist_node) objects. For struct klist, a spinlock
is included that protects access to the actual list itself. struct
klist_node provides a pointer to the klist that owns it and a kref
reference count that indicates the number of current users of that node
in the list.
The entire point is to provide an interface for iterating over a list
that is safe and allows for modification of the list during the
iteration (e.g. insertion and removal), including modification of the
current node on the list.
It works using a 3rd object type - struct klist_iter - that is declared
and initialized before an iteration. klist_next() is used to acquire the
next element in the list. It returns NULL if there are no more items.
Internally, that routine takes the klist's lock, decrements the reference
count of the previous klist_node and increments the count of the next
klist_node. It then drops the lock and returns.
There are primitives for adding and removing nodes to/from a klist.
When deleting, klist_del() will simply decrement the reference count.
Only when the count goes to 0 is the node removed from the list.
klist_remove() will try to delete the node from the list and block
until it is actually removed. This is useful for objects (like devices)
that have been removed from the system and must be freed (but must wait
until all accessors have finished).
Internally, that routine takes the klist's lock, decrements the reference
count of the previous klist_node and increments the count of the next
klist_node. It then drops the lock and returns.
There are primitives for adding and removing nodes to/from a klist.
When deleting, klist_del() will simply decrement the reference count.
Only when the count goes to 0 is the node removed from the list.
klist_remove() will try to delete the node from the list and block
until it is actually removed. This is useful for objects (like devices)
that have been removed from the system and must be freed (but must wait
until all accessors have finished).
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
diff -Nru a/include/linux/klist.h b/include/linux/klist.h
Now there's an iterator for accessing each device bound to a driver.
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Index: linux-2.6.12-rc2/drivers/base/driver.c
===================================================================
This adds a per-device semaphore that is taken before every call from the core to a
driver method. This prevents e.g. simultaneous calls to the ->suspend() or ->resume()
and ->probe() or ->release(), potentially saving a whole lot of headaches.
It also moves us a step closer to removing the bus rwsem, since it protects the fields
in struct device that are modified by the core.
Signed-off-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
One step on improving the class api so that it can not be used incorrectly.
This also fixes the module owner issue with the dev files that happened when
the devt logic moved to the class core.
Based on a patch originally written by Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Driver core:
change driver's, bus's, class's and platform device's names
to be const char * so one can use
const char *drv_name = "asdfg";
when initializing structures.
Also kill couple of whitespaces.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
kobject: make kobject's name const char * since users should not
attempt to change it (except by calling kobject_rename).
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Below is a more generic patch to do fib_lookup via netlink. For others
we should say that we discussed this as a way to verify route selection.
It's also possible there are others uses for this.
In short the fist half of struct fib_result_nl is filled in by caller
and netlink call fills in the other half and returns it.
In case anyone is interested there is a corresponding user app to compare
the full routing table this was used to test implementation of the LC-trie.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the flag XFRM_STATE_NOPMTUDISC for xfrm states. It is
similar to the nopmtudisc on IPIP/GRE tunnels. It only has an effect
on IPv4 tunnel mode states. For these states, it will ensure that the
DF flag is always cleared.
This is primarily useful to work around ICMP blackholes.
In future this flag could also allow a larger MTU to be set within the
tunnel just like IPIP/GRE tunnels. This could be useful for short haul
tunnels where temporary fragmentation outside the tunnel is desired over
smaller fragments inside the tunnel.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jmorris@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the format of the XFRM_MSG_DELSA and
XFRM_MSG_DELPOLICY notification so that the main message
sent is of the same format as that received by the kernel
if the original message was via netlink. This also means
that we won't lose the byid information carried in km_event.
Since this user interface is introduced by Jamal's patch
we can still afford to change it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduces a new macro NLMSG_NEW which extends NLMSG_PUT but takes
a flags argument. NLMSG_PUT stays there for compatibility but now
calls NLMSG_NEW with flags == 0. NLMSG_PUT_ANSWER is renamed to
NLMSG_NEW_ANSWER which now also takes a flags argument.
Also converts the users of NLMSG_PUT_ANSWER to use NLMSG_NEW_ANSWER
and fixes the two direct users of __nlmsg_put to either provide
the flags or use NLMSG_NEW(_ANSWER).
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
To retrieve the neighbour tables send RTM_GETNEIGHTBL with the
NLM_F_DUMP flag set. Every neighbour table configuration is
spread over multiple messages to avoid running into message
size limits on systems with many interfaces. The first message
in the sequence transports all not device specific data such as
statistics, configuration, and the default parameter set.
This message is followed by 0..n messages carrying device
specific parameter sets.
Although the ordering should be sufficient, NDTA_NAME can be
used to identify sequences. The initial message can be identified
by checking for NDTA_CONFIG. The device specific messages do
not contain this TLV but have NDTPA_IFINDEX set to the
corresponding interface index.
To change neighbour table attributes, send RTM_SETNEIGHTBL
with NDTA_NAME set. Changeable attribute include NDTA_THRESH[1-3],
NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked
otherwise. Device specific parameter sets can be changed by
setting NDTPA_IFINDEX to the interface index of the corresponding
device.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
RTA_GET_U(32|64)(tlv)
Assumes TLV is a u32/u64 field and returns its value.
RTA_GET_[M]SECS(tlv)
Assumes TLV is a u64 and transports jiffies converted
to seconds or milliseconds and returns its value.
RTA_PUT_U(32|64)(skb, type, value)
Appends %value as fixed u32/u64 to %skb as TLV %type.
RTA_PUT_[M]SECS(skb, type, jiffies)
Converts %jiffies to secs/msecs and appends it as u64
to %skb as TLV %type.
RTA_PUT_STRING(skb, type, string)
Appends %NUL terminated %string to %skb as TLV %type.
RTA_NEST(skb, type)
Starts a nested TLV %type and returns the nesting handle.
RTA_NEST_END(skb, nesting_handle)
Finishes the nested TLV %nesting_handle, must be called
symmetric to RTA_NEST(). Returns skb->len
RTA_NEST_CANCEL(skb, nesting_handle)
Cancel the nested TLV %nesting_handle and trim nested TLV
from skb again, returns -1.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
NLMSG_PUT_ANSWER(skb, nlcb, type, length)
Start a new netlink message as answer to a request,
returns the message header.
NLMSG_END(skb, nlh)
End a netlink message, fixes total message length,
returns skb->len.
NLMSG_CANCEL(skb, nlh)
Cancel the building process and trim whole message
from skb again, returns -1.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This chunks out the accept_queue and tcp_listen_opt code and moves
them to net/core/request_sock.c and include/net/request_sock.h, to
make it useful for other transport protocols, DCCP being the first one
to use it.
Next patches will rename tcp_listen_opt to accept_sock and remove the
inline tcp functions that just call a reqsk_queue_ function.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ok, this one just renames some stuff to have a better namespace and to
dissassociate it from TCP:
struct open_request -> struct request_sock
tcp_openreq_alloc -> reqsk_alloc
tcp_openreq_free -> reqsk_free
tcp_openreq_fastfree -> __reqsk_free
With this most of the infrastructure closely resembles a struct
sock methods subset.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kept this first changeset minimal, without changing existing names to
ease peer review.
Basicaly tcp_openreq_alloc now receives the or_calltable, that in turn
has two new members:
->slab, that replaces tcp_openreq_cachep
->obj_size, to inform the size of the openreq descendant for
a specific protocol
The protocol specific fields in struct open_request were moved to a
class hierarchy, with the things that are common to all connection
oriented PF_INET protocols in struct inet_request_sock, the TCP ones
in tcp_request_sock, that is an inet_request_sock, that is an
open_request.
I.e. this uses the same approach used for the struct sock class
hierarchy, with sk_prot indicating if the protocol wants to use the
open_request infrastructure by filling in sk_prot->rsk_prot with an
or_calltable.
Results? Performance is improved and TCP v4 now uses only 64 bytes per
open request minisock, down from 96 without this patch :-)
Next changeset will rename some of the structs, fields and functions
mentioned above, struct or_calltable is way unclear, better name it
struct request_sock_ops, s/struct open_request/struct request_sock/g,
etc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is for use with slab users that pass a dynamically allocated slab name in
kmem_cache_create, so that before destroying the slab one can retrieve the name
and free its memory.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heres the final patch.
What this patch provides
- netlink xfrm events
- ability to have events generated by netlink propagated to pfkey
and vice versa.
- fixes the acquire lets-be-happy-with-one-success issue
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This is a fixed-up version of the broken "upstream-2.6.13" branch, where
I re-did the manual merge of drivers/net/r8169.c by hand, and made sure
the history is all good.
warning when building with gcc -W :
include/linux/efi.h: In function `efi_range_is_wc':
include/linux/efi.h:320: warning: comparison between signed and unsigned
It looks to me like a significantly large 'len' passed in could cause the
loop to never end. Isn't it safer to make 'i' an unsigned long as well?
Like this little patch below (which of course also kills the warning) :
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch alows you to change the source address of icmp error
messages. It applies cleanly to 2.6.11.11 and retains the default
behaviour.
In the old (default) behaviour icmp error messages are sent with the ip
of the exiting interface.
The new behaviour (when the sysctl variable is toggled on), it will send
the message with the ip of the interface that received the packet that
caused the icmp error. This is the behaviour network administrators will
expect from a router. It makes debugging complicated network layouts
much easier. Also, all 'vendor routers' I know of have the later
behaviour.
Signed-off-by: David S. Miller <davem@davemloft.net>
<linux/if_tr.h> uses __be16, but does not directly include
<asm/byteorder.h>. Add this in, so that dhcp/net-tools token ring code
can compile again.
Signed-off-by: Tom Rini <trini@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now m68k no longer sets HAVE_ARCH_GET_SIGNAL_TO_DELIVER, can it be removed
completely? Or may ARM26 still need it? Note that its usage was removed from
kernel/signal.c about 2 months ago.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adds meta collectors for all socket attributes that make sense
to be filtered upon. Some of them are only useful for debugging
but having them doesn't hurt.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix 5700/5701 DMA write corruption on Apple G4 by detecting the Apple
UniNorth PCI 1.5 chipset and adjusting the DMA write boundary to 16. DMA
test fails to detect the problem with this chipset.
Thanks to Manuel Perez Ayala for reporting the problem and helping to
debug it.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Attached is a small patch for i945G support against 2.6.11.11.
From: Alan Hourihane <alanh@fairlite.demon.co.uk>
Signed-off-by: Dave Jones <davej@redhat.com>
I'm not sure why this issue is suddenly showing, but without this
patchlet, the zx1 config won't compile anymore (e.g., to see the
compilation-error, look for "***" in [1]).
[1] http://www.gelato.unsw.edu.au/kerncomp/results//2005-06-06-17-00/zx1_defconfig-log.html
Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On Wed, May 04, 2005 at 01:37:30PM -0700, David Brownell wrote:
> On Wednesday 04 May 2005 12:19 pm, Roman Kagan wrote:
> > struct urb {
> > /* private, usb core and host controller only fields in the urb */
> > ...
> > struct list_head urb_list; /* list pointer to all active urbs */
> > ...
> > };
> >
> > Is it safe to use it for driver's purposes when the driver owns the urb,
> > that is, starting from the completion routine until the urb is submitted
> > with usb_submit_urb()?
>
> Right now, it should be.
Great! FWIW I've briefly tested a modified version of usbatm using
the list head in struct urb instead of creating a wrapper struct, and I
haven't seen any failures yet. So I tend to believe that your "should
be" actually means "is" :)
> > If it is, can it be guaranteed in future, e.g.
> > by moving the list head into the public section of struct urb?
>
> In fact I'm not sure why it ever got called "private" to usbcore/hcds.
> I thought the idea was that it should be like urb->status, reserved for
> whoever controls the URB.
OK then how about the following (essentially documentation) patch?
Signed-off-by: Roman Kagan <rkagan@mail.ru>
Acked-by: David Brownell <david-b@pacbell.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the hardware header size is a multiple of HH_DATA_MOD, HH_DATA_OFF()
incorrectly returns HH_DATA_MOD (instead of 0). This affects ieee80211 layer
as 802.11 header is 32 bytes long.
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
o use a semaphore instead of an opencoded and racy lock
o move locking out of shaper_kick and into the callers - most just
released the lock before calling shaper_kick
o remove in_interrupt() tests. from ->close we can always block, from
->hard_start_xmit and timer context never
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
given number of bytes from device. Change ps2_command to
allow using 0 as command ID and actually pass it to the
device instead of working as a drain.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Fix up comment in cpufreq.h stating transition latency should be passed
in microseconds -- it was decided long ago to switch to nanoseconds.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
This patch adds is_multicast_ether_addr() to go along with
is_valid_ether_addr() and friends. It then changes
is_valid_ether_addr() to use the new macro, and fixes up the comment
on that function to move implementation details out of the API doco.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an option to make secondary IP addresses get promoted
when primary IP addresses are removed from the device.
It defaults to off to preserve existing behavior.
Signed-off-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The features field in netdevice is really a bitmask, and bitmask's should
be unsigned.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Resend of earlier patch (no changes) from Catalin used to provide
device feature change notification.
Signed-off-by: Catalin BOIE <catab at umbrella.ro>
Acked-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
serialize open and close calls and ensure that device's
open and close methods are only called when first user
opens it or last user closes it.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
I've tested it with a Logitech WingMan Rumblepad on an x86-64
machine, and on an ia32 machine to make sure I didn't break
anything.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
After porting this fixlet to UML:
http://linux.bkbits.net:8080/linux-2.5/cset@41791ab52lfMuF2i3V-eTIGRBbDYKQ
, I've also added a warning which should refuse compilation with insane values
for PREEMPT_ACTIVE... maybe we should simply move PREEMPT_ACTIVE out of
architectures using GENERIC_IRQS.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds dummy gameport_register_port, gameport_unregister_port
and gameport_set_phys functions to gameport.h for the case when a driver
can't use gameport.
This fixes the compilation of some OSS drivers with GAMEPORT=n without
the need to #if inside every single driver.
This patch also removes the non-working and now obsolete SOUND_GAMEPORT.
This patch is also an alternative solution for ALSA drivers with similar
problems (but #if's inside the drivers might have the advantage of
saving some more bytes of gameport is not available).
The only user-visible change is that for GAMEPORT=m the affected OSS
drivers are now allowed to be built statically (but they won't have
gameport support).
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Delete quirk_via_bridge(), restore quirk_via_irqpic() -- but now
improved to be invoked upon device ENABLE, and now only for VIA devices
-- not all devices behind VIA bridges.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jens Axboe pointed out that the iounmap() call in libata was occurring
too early, and some drivers (ahci, probably others) were using ioremap'd
memory after it had been unmapped.
The patch should address that problem by way of improving the libata
driver API:
* move ->host_stop() call after all ->port_stop() calls have occurred.
* create default helper function ata_host_stop(), and move iounmap()
call there.
* add ->host_stop_prewalk() hook, use it in sata_qstor.c (hi Mark).
sata_qstor appears to require the host-stop-before-port-stop ordering
that existed prior to applying the attached patch.
A new driver bnx2 for Broadcom bcm5706 is available.
The patch also includes new 1000BASE-X advertisement bit definitions in
mii.h
Thanks to David Miller and Jeff Garzik for reviewing and their valuable
feedback.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is a fixed up version of the reorder feature of netem.
It is the same as the earlier patch plus with the bugfix from Julio merged in.
Has expected backwards compatibility behaviour.
Go ahead and merge this one, the TCP strangeness I was seeing was due
to the reordering bug, and previous version of TSO patch.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* add ide_bus_match() and export ide_bus_type
* split ide_remove_driver_from_hwgroup() out of ide_unregister()
* move device cleanup from ide_unregister() to drive_release_dev()
* convert ide_driver_t->name to driver->name
* convert ide_driver_t->{attach,cleanup} to driver->{probe,remove}
* remove ide_driver_t->busy as ide_bus_type->subsys.rwsem
protects against concurrent ->{probe,remove} calls
* make ide_{un}register_driver() void as it cannot fail now
* use driver_{un}register() directly, remove ide_{un}register_driver()
* use device_register() instead of ata_attach(), remove ata_attach()
* add proc_print_driver() and ide_drivers_show(), remove ide_drivers_op
* fix ide_replace_subdriver() and move it to ide-proc.c
* remove ide_driver_t->drives, ide_drives and drives_lock
* remove ide_driver_t->drivers, drivers and drivers_lock
* remove ide_drive_t->driver and DRIVER() macro
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@elka.pw.edu.pl>
Use LIST_HEAD_INIT rather than doing it by hand in DEFINE_WAIT.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Everybody does
struct packet_type foo_packet_type = {
.type = __constant_htons(ETH_P_FOO);
};
5 introduced warnings will be properly fixed later.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add 0x1601 as 5752M, it's a 5752 but for mobile PCs.
Stolen from Broadcom bcm5700-8.1.55 driver.
Someone forgot to add it to tg3 ;-)
Signed-off-by: David S. Miller <davem@davemloft.net>
Move audit_serial() into audit.c and use it to generate serial numbers
on messages even when there is no audit context from syscall auditing.
This allows us to disambiguate audit records when more than one is
generated in the same millisecond.
Based on a patch by Steve Grubb after he observed the problem.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
In _spin_unlock_bh(lock):
do { \
_raw_spin_unlock(lock); \
preempt_enable(); \
local_bh_enable(); \
__release(lock); \
} while (0)
there is no reason for using preempt_enable() instead of a simple
preempt_enable_no_resched()
Since we know bottom halves are disabled, preempt_schedule() will always
return at once (preempt_count!=0), and hence preempt_check_resched() is
useless here...
This fixes it by using "preempt_enable_no_resched()" instead of the
"preempt_enable()", and thus avoids the useless preempt_check_resched()
just before re-enabling bottom halves.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Defines for the different command classes as defined in the MMC and SD
specifications.
Removes the check for high command classes and instead checks that the
command classes needed are present.
Previous solution killed forward compatibility at no apparent gain.
Signed-of-by: Pierre Ossman
This patch changes the SELinux AVC to defer logging of paths to the audit
framework upon syscall exit, by saving a reference to the (dentry,vfsmount)
pair in an auxiliary audit item on the current audit context for processing
by audit_log_exit.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Caused oopses again. Also fix potential mismatch in checking if
change_page_attr was needed.
To do it without races I needed to change mm/vmalloc.c to export a
__remove_vm_area that does not take vmlist lock.
Noticed by Terence Ripperda and based on a patch of his.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a device driver for scsi media changer devices.
Signed-off-by: Gerd Knorr <kraxel@bytesex.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
blk_insert_request() has a unobivous feature of requeuing a
request setting REQ_SPECIAL|REQ_SOFTBARRIER. SCSI midlayer
was the only user and as previous patches removed the usage,
remove the feature from blk_insert_request(). Only special
requests should be queued with blk_insert_request(). All
requeueing should go through blk_requeue_request().
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
As noted by Chris Wright, we need to do the full range of tests regardless
of whether MAP_FIXED is set or not, so re-organize get_unmapped_area()
slightly to do the sanity checks unconditionally.
It's silly to have to add explicit entries for new userspace messages
as we invent them. Just treat all messages in the user range the same.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
The driver model has a "detach_state" mechanism that:
- Has never been used by any in-kernel drive;
- Is superfluous, since driver remove() methods can do the same thing;
- Became buggy when the suspend() parameter changed semantics and type;
- Could self-deadlock when called from certain suspend contexts;
- Is effectively wasted documentation, object code, and headspace.
This removes that "detach_state" mechanism; net code shrink, as well
as a per-device saving in the driver model and sysfs.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The attached patch updates generic HDLC to version 1.18.
FR Cisco LMI production-tested. Please apply to Linux 2.6. Thanks.
Changes:
- doc updates
- added Cisco LMI support to Frame-Relay code
- cleaned hdlc_fr.c a bit, removed some orphaned #defines etc.
- fixed a problem with non-functional LMI in FR DCE mode.
- changed diagnostic messages to better conform to FR standards
- all protocols: information about carrier changes (DCD line) is now
printed to kernel logs.
Signed-Off-By: Krzysztof Halasa <khc@pm.waw.pl>
Updated patch to fix erroneous flush of COMRESET set and missing flush
of COMRESET clear. Created a new routine scr_write_flush() to try to
prevent this in the future. Also, this patch is based on libata-2.6
instead of the previous libata-dev-2.6 based patch.
Signed-off-by: Brett Russ <russb@emc.com>
Index: libata-2.6/drivers/scsi/libata-core.c
===================================================================
This patch adds support for the davicom dm9000 network driver. The dm9000
is found on some embedded arm boards such as the pimx1 or the scb9328.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
diff -puN /dev/null drivers/net/dm9000.c
I'm going through the kernel code and have a patch that corrects
several spelling errors in comments.
From: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This patch adds more messages types to the audit subsystem so that audit
analysis is quicker, intuitive, and more useful.
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
---
I forgot one type in the big patch. I need to add one for user space
originating SE Linux avc messages. This is used by dbus and nscd.
-Steve
---
Updated to 2.6.12-rc4-mm1.
-dwmw2
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This is version 18 of the Wireless Extensions. The main change
is that it adds all the necessary APIs for WPA and WPA2 support. This
work was entirely done by Jouni Malinen, so let's thank him for both
his hard work and deep expertise on the subject ;-)
This APIs obviously doesn't do much by itself and works in
concert with driver support (Jouni already sent you the HostAP
changes) and userspace (Jouni is updating wpa_supplicant). This is
also orthogonal with the ongoing work on in-kernel IEEE support (but
potentially useful).
The patch is attached, tested with 2.6.11. Normally, I would
ask you to push that directly in the kernel (99% of the patch has been
on my web page for ages and it does not affect non-WPA stuff), but
Jouni convinced me that it should bake a few weeks in wireless-2.6
first, so that other driver maintainers can get up to speed with it.
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This patch works around an issue with WD drives (and possibly others)
over SiL PATA->SATA Bridges on SATA controllers locking up with
transfers > 200 sectors.
Signed-off-by: Brad Campbell <brad@wasp.net.au>
Add audit_log_type to allow callers to specify type and pid when logging.
Convert audit_log to wrapper around audit_log_type. Could have
converted all audit_log callers directly, but common case is default
of type AUDIT_KERNEL and pid 0. Update audit_log_start to take type
and pid values when creating a new audit_buffer. Move sequences that
did audit_log_start, audit_log_format, audit_set_type, audit_log_end,
to simply call audit_log_type directly. This obsoletes audit_set_type
and audit_set_pid, so remove them.
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Remove code conditionally dependent on CONFIG_AUDITSYSCALL from audit.c.
Move these dependencies to audit.h with the rest.
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Add uart_insert_char(), which handles inserting characters into the
flip buffer. This helper function handles the correct semantics
for handling overrun in addition to inserting normal characters.
Signed-off-by: Russell King <rmk@arm.linux.org.uk>
shutdown credential information. It creates a new message type
AUDIT_TERM_INFO, which is used by the audit daemon to query who issued the
shutdown.
It requires the placement of a hook function that gathers the information. The
hook is after the DAC & MAC checks and before the function returns. Racing
threads could overwrite the uid & pid - but they would have to be root and
have policy that allows signalling the audit daemon. That should be a
manageable risk.
The userspace component will be released later in audit 0.7.2. When it
receives the TERM signal, it queries the kernel for shutdown information.
When it receives it, it writes the message and exits. The message looks
like this:
type=DAEMON msg=auditd(1114551182.000) auditd normal halt, sending pid=2650
uid=525, auditd pid=1685
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Ross moved. Remove the bad email address so people will find the correct
one in ./CREDITS.
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Just a few small cleanups to make this coherent english.
Signed-Off-By: Martin Hicks <mort@wildopensource.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
compile warning cleanup - suggested by Adrian Bunk; remove unmaintained rcs
char strings from source and handle the occurrences of their use, make sure
kernel-userspace issues taken care of; break out into separate patch
Signed-off-by: Stephen Biggs <yrgrknmxpzlk@gawab.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add some comments about task->comm, to explain what it is near its definition
and provide some important pointers to its uses.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes some needlessly global identifiers static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Arjan van de Ven <arjanv@infradead.org>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This had a fatal lock ranking bug: we do journal_start outside
mpage_writepages()'s lock_page().
Revert the whole thing, think again.
Credit-to: Jan Kara <jack@suse.cz>
For identifying the bug.
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The only caller that ever sets it can call fsync_bdev itself easily. Also
update some comments.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds support for a new class of DAC960 controllers. It's based
on the GPLed idac320 driver from IBM for Linux 2.4.18. That driver is a
fork of the 2.4.18 version of DAC960 that adds support for this new type of
controllers (internally called "GEM Series"), that differ from other DAC960
V2 firmware controllers only in the register offsets and removes support
for all others.
This patch instead integrates support for these controllers into the DAC960
driver.
Thanks to Anders Norrbring for pointing me to the idac320 driver and
testing this patch.
No Signed-Off: line because all code is either copy & pasted from IBM's
idac320 driver or support for other controllers in the 2.6 DAC960 driver.
Note: the really odd formating matches the rest of the DAC960 driver.
Cc: Dave Olien <dmo@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow registration of multiple kprobes at an address in an architecture
agnostic way. Corresponding handlers will be invoked in a sequence. But,
a kprobe and a jprobe can't (yet) co-exist at the same address.
Signed-off-by: Ananth N Mavinakayanahalli <amavin@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fixes for big-endian systems in soundcard.h and awe_voice.h
This patch fixes the AFMT_S16_NE (include/linux/soundcard.h) and AWE_PATCH
(awe_voice.h) macros on big-endian systems.
It also moves _PATCHKEY into a new file, patchkey.h, in order to remove a
duplicate definition of it from awe_voice.h.
Signed-off-by: Stuart Brady <sdbrady@ntlworld.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
this matches the API used by other link layer like ethernet or token
ring.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now pci drivers can know when the system is going down without having to
add a reboot notifier event.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Converts remaining rtnetlink_link tables to use c99 designated
initializers to make greping a little bit easier.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Converts rtm_min and rtm_max arrays to use c99 designated
initializers for easier insertion of new message families.
RTM_GETMULTICAST and RTM_GETANYCAST did not have the minimal
message size specified which means that the netlink message
was parsed for routing attributes starting from the header.
Adds the proper minimal message sizes for these messages
(netlink header + common rtnetlink header) to fix this issue.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
RTM_MAX is currently set to the maximum reserverd message type plus one
thus being the cause of two bugs for new types being assigned a) given the
new family registers only the NEW command in its reserved block the array
size for per family entries is calculated one entry short and b) given the
new family registers all commands RTM_MAX would point to the first entry
of the block following this one and the rtnetlink receive path would accept
a message type for a nonexisting family.
This patch changes RTM_MAX to point to the maximum valid message type
by aligning it to the start of the next block and subtracting one.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Converts xfrm_msg_min and xfrm_dispatch to use c99 designated
initializers to make greping a little bit easier. Also replaces
two hardcoded message type with meaningful names.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Makes the type > XFRM_MSG_MAX check behave correctly to
protect access to xfrm_dispatch.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Another large rollup of various patches from Adrian which make things static
where they were needlessly exported.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some KernelDoc descriptions are updated to match the current code.
No code changes.
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I have recompiled Linux kernel 2.6.11.5 documentation for me and our
university students again. The documentation could be extended for more
sources which are equipped by structured comments for recent 2.6 kernels. I
have tried to proceed with that task. I have done that more times from 2.6.0
time and it gets boring to do same changes again and again. Linux kernel
compiles after changes for i386 and ARM targets. I have added references to
some more files into kernel-api book, I have added some section names as well.
So please, check that changes do not break something and that categories are
not too much skewed.
I have changed kernel-doc to accept "fastcall" and "asmlinkage" words reserved
by kernel convention. Most of the other changes are modifications in the
comments to make kernel-doc happy, accept some parameters description and do
not bail out on errors. Changed <pid> to @pid in the description, moved some
#ifdef before comments to correct function to comments bindings, etc.
You can see result of the modified documentation build at
http://cmp.felk.cvut.cz/~pisa/linux/lkdb-2.6.11.tar.gz
Some more sources are ready to be included into kernel-doc generated
documentation. Sources has been added into kernel-api for now. Some more
section names added and probably some more chaos introduced as result of quick
cleanup work.
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds to the fbdev interface a set_cmap callback that allow the
driver to "batch" palette changes. This is useful for drivers like
radeonfb which might require lenghtly workarounds on palette accesses, thus
allowing to factor out those workarounds efficiently.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since we only access reiserfs_key ->u.k_offset_v2 guts in four helper
functions, we are free to sanitize those, as long as
- layout of the structure is unchanged (it's on-disk object)
- behaviour of these helpers is same as before.
Patch kills the mess with endianness-dependent bitfields and replaces them
with a single __le64. Helpers are switched to straightforward shift/and/or.
Benefits:
- exact same definitions for little- and big-endian architectures; no ifdefs
in sight.
- generate the same code on little-endian and improved on big-endian.
- doesn't rely on lousy bitfields handling in gcc codegenerator.
- happens to be standard C (unsigned long long is not a valid type for a
bitfield; it's a gccism and not well-implemented one, at that).
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
comp_short_keys() massaged into sane form, which kills the last place where
pointer to in_core_key (or any object containing such) would be cast to or
from something else. At that point we are free to change layout of
in_core_key - nothing depends on it anymore.
So we drop the mess with union in there and simply use (unconditional) __u64
k_offset and __u8 k_type instead; places using in_core_key switched to those.
That gives _far_ better code than current mess - on all platforms.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fixes for a couple of bugs exposed by the above: le32_to_cpu() used on 16bit
value and missing conversion in comparison of host- and little-endian values.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
little-endian objects annotated as such; again, obviously no changes of
resulting code, we only replace __u16 with __le16, etc. in relevant places.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
struct reiserfs_key cloned; (currently) identical struct in_core_key added.
Places that expect host-endian data in reiserfs_key switched to in_core_key.
Basically, we get annotation of reiserfs_key users and keep the resulting tree
obviously equivalent to original.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Bump autofs4 version so we know what's going on.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a new function valid_signal() that tests if its argument is
a valid signal number.
The reasons for adding this new function are:
- some code currently testing _NSIG directly has off-by-one errors.
Using this function instead avoids such errors.
- some code currently tests unsigned signal numbers for <0 which is
pointless and generates warnings when building with gcc -W. Using this
function instead avoids such warnings.
I considered various places to add this function but eventually settled on
include/linux/signal.h as the most logical place for it. If there's some
reason this is a bad choice then please let me know (hints as to a better
location are then welcome of course).
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The synchronize_kernel() primitive is used for quite a few different purposes:
waiting for RCU readers, waiting for NMIs, waiting for interrupts, and so on.
This makes RCU code harder to read, since synchronize_kernel() might or might
not have matching rcu_read_lock()s. This patch creates a new
synchronize_rcu() that is to be used for RCU readers and a new
synchronize_sched() that is used for the rest. These two new primitives
currently have the same implementation, but this is might well change with
additional real-time support. Both new primitives are GPL-only, the old
primitive is deprecated.
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a deprecated_for_modules macro that allows symbols to be deprecated only
when used by modules, as suggested by Andrew Morton some months back.
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch moves the IRQ-related SA_xxx flags (namely, SA_PROBE,
SA_SAMPLE_RANDOM and SA_SHIRQ) from all the arch-specific headers to
linux/signal.h. This looks like a left-over after the irq-handling code
was consolidated. The code was moved to kernel/irq/*, but the flags are
still left per-arch.
Right now, adding a new IRQ flag to the arch-specific header, like this
patch does:
http://cvs.sourceforge.net/viewcvs.py/*checkout*/alsa/alsa-driver/utils/patches/pcsp-kernel-2.6.10-03.diff?rev=1.1
no longer works, it breaks the compilation for all other arches, unless you
add that flag to all the other arch-specific headers too. So I think such
a clean-up makes sense.
Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Arrange for all kernel printks to be no-ops. Only available if
CONFIG_EMBEDDED.
This patch saves about 375k on my laptop config and nearly 100k on minimal
configs.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a pair of rlimits for allowing non-root tasks to raise nice and rt
priorities. Defaults to traditional behavior. Originally written by
Chris Wright.
The patch implements a simple rlimit ceiling for the RT (and nice) priorities
a task can set. The rlimit defaults to 0, meaning no change in behavior by
default. A value of 50 means RT priority levels 1-50 are allowed. A value of
100 means all 99 privilege levels from 1 to 99 are allowed. CAP_SYS_NICE is
blanket permission.
(akpm: see http://www.uwsg.iu.edu/hypermail/linux/kernel/0503.1/1921.html for
tips on integrating this with PAM).
Signed-off-by: Matt Mackall <mpm@selenic.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
GCC 2.95 uses __va_copy instead of va_copy. Handle it inside compiler.h
instead of in a casual file, and avoid the risk that this breaks with a newer
compiler (which it could do).
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The specifications that talk about E820 map doesn't have an upper limit on
the number of e820 entries. But, today's kernel has a hard limit of 32.
With increase in memory size, we are seeing the number of E820 entries
reaching close to 32. Patch below bumps the number upto 128.
The patch changes the location of EDDBUF in zero-page (as it comes after E820).
As, EDDBUF is not used by boot loaders, this patch should not have any effect
on bootloader-setup code interface.
Patch covers both i386 and x86-64.
Tested on:
* grub booting bzImage
* lilo booting bzImage with EDID info enabled
* pxeboot of bzImage
Side-effect:
bss increases by ~ 2K and init.data increases by ~7.5K
on all systems, due to increase in size of static arrays.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the Intel ICH7DH and ICH7-M DH DID's to the irq.c and
pci_ids.h files.
Signed-off-by: Jason Gaston <Jason.d.gaston@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch by Jaya Kumar introduces a generic infrastructure to deal with
x86 chipsets with nonstandard reset sequences, and adds support for the
Geode gx1/cs5530a chipset.
Signed-off-by: Jaya Kumar <jayalk@intworks.biz>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds support for the special adb buttons of the aluminium
PowerBook G4.
Signed-off-by: Andreas Jaggi <andreas.jaggi@waterwave.ch>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a patch for counting the number of pages for bounce buffers. It's
shown in /proc/vmstat.
Currently, the number of bounce pages are not counted anywhere. So, if
there are many bounce pages, it seems that there are leaked pages. And
it's difficult for a user to imagine the usage of bounce pages. So, it's
meaningful to show # of bouce pages.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mempools have 2 problems.
The first is that mempool_alloc can possibly get stuck in __alloc_pages
when they should opt to fail, and take an element from their reserved pool.
The second is that it will happily eat emergency PF_MEMALLOC reserves
instead of going to their reserved pools.
Fix the first by passing __GFP_NORETRY in the allocation calls in
mempool_alloc. Fix the second by introducing a __GFP_MEMPOOL flag which
directs the page allocator not to allocate from the reserve pool.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Address bug #4508: there's potential for wraparound in the various places
where we perform RLIMIT_AS checking.
(I'm a bit worried about acct_stack_growth(). Are we sure that vma->vm_mm is
always equal to current->mm? If not, then we're comparing some other
process's total_vm with the calling process's rlimits).
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Attached is a new patch that solves the issue of getting valid credentials
into the LOGIN message. The current code was assuming that the audit context
had already been copied. This is not always the case for LOGIN messages.
To solve the problem, the patch passes the task struct to the function that
emits the message where it can get valid credentials.
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Most audit control messages are sent over netlink.In order to properly
log the identity of the sender of audit control messages, we would like
to add the loginuid to the netlink_creds structure, as per the attached
patch.
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Attached is a patch that corrects a signed/unsigned warning. I also noticed
that we needlessly init serial to 0. That only needs to occur if the kernel
was compiled without the audit system.
-Steve Grubb
Signed-off-by: David Woodhouse <dwmw2@infradead.org>