This fixes a data corruption error for mail delivery applications that
expect to be able to do posix locking and then append writes on NFS.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
and add_to_page_cache fails.
Thanks to Shaggy for pointing out the fix.
Signed-off-by: Steve French (sfrench@us.ibm.com)
Signed-off-by: Shaggy (shaggy@us.ibm.com)
We should never apply a lookup intent to anything other than the last
path component in an open(), create() or access() call.
Introduce the helper nfs_lookup_check_intent() which always returns
zero if LOOKUP_CONTINUE or LOOKUP_PARENT are set, and returns the
intent flags if we're on the last component of the lookup.
By doing so, we fix a bug in open(O_EXCL), where we may end up
optimizing away a real lookup of the parent directory.
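A minimal userspace sketch of the check described above. The flag values,
struct toy_nameidata and the name toy_lookup_check_intent() are stand-ins
chosen for illustration; they are assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values; the kernel's real constants may differ. */
#define LOOKUP_CONTINUE 0x0004u
#define LOOKUP_PARENT   0x0010u
#define LOOKUP_OPEN     0x0100u
#define LOOKUP_CREATE   0x0200u
#define LOOKUP_ACCESS   0x0400u

/* Hypothetical stand-in for struct nameidata: only the flags matter here. */
struct toy_nameidata {
	unsigned int flags;
};

/*
 * Return the intent bits selected by mask, but only when we are on the
 * last component of the lookup. For intermediate components
 * (LOOKUP_CONTINUE) and parent lookups (LOOKUP_PARENT) report no intent.
 */
unsigned int toy_lookup_check_intent(const struct toy_nameidata *nd,
				     unsigned int mask)
{
	assert(nd != NULL);
	if (nd->flags & (LOOKUP_CONTINUE | LOOKUP_PARENT))
		return 0;
	return nd->flags & mask;
}
```

With this shape, a caller that tests for LOOKUP_CREATE on the parent
directory sees zero and falls back to a real lookup.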
Problem noticed by Linda Dunaphant <linda.dunaphant@ccur.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make sure that binfmt_flat passes the correct flags into do_mmap(). nommu's
validate_mmap_request() will simply return -EINVAL if we try and pass it a
flags value of zero.
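As a rough illustration of why a zero flags value is rejected, here is a toy
validator. It assumes the check requires exactly one mapping type to be
requested; the TOY_* constants and toy_validate_mmap_flags() are hypothetical
and are not taken from the nommu code.

```c
#include <assert.h>	/* for callers that want to assert on the result */
#include <errno.h>

/* Illustrative values, not taken from any kernel header. */
#define TOY_MAP_SHARED  0x01u
#define TOY_MAP_PRIVATE 0x02u
#define TOY_MAP_TYPE    0x0fu	/* mask covering the mapping-type bits */

/*
 * Assumed rule: the caller must ask for a shared or a private mapping.
 * A flags value of zero selects neither, so it is rejected.
 */
int toy_validate_mmap_flags(unsigned int flags)
{
	unsigned int type = flags & TOY_MAP_TYPE;

	if (type != TOY_MAP_SHARED && type != TOY_MAP_PRIVATE)
		return -EINVAL;
	return 0;
}
```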
Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
__do_follow_link() passes potentially wrong vfsmount to touch_atime(). It
matters only in (currently impossible) case of symlink mounted on something,
but it's trivial to fix and that actually makes more sense.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Conditional mntput() moved into __do_follow_link(). There it collapses with
unconditional mntget() on the same sucker, closing another too-early-mntput()
race.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Getting rid of sloppy logics:
a) in do_follow_link() we have the wrong vfsmount dropped if our symlink
had been mounted on something. Currently it works only because we never
get such situation (modulo filesystem playing dirty tricks on us). And
it obfuscates already convoluted logics...
b) same goes for open_namei().
c) in __link_path_walk() we have another "it should never happen" sloppiness -
out_dput: there does double-free on underlying vfsmount and leaks the covering
one if we hit it just after crossing a mountpoint. Again, wrong vfsmount
getting dropped.
d) another too-early-mntput() race - in do_follow_mount() we need to postpone
conditional mntput(path->mnt) until after dput(path->dentry). Again, this one
happens only in it-currently-never-happens-unless-some-fs-plays-dirty
scenario...
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
shifted conditional mntput() into do_follow_link() - all callers were doing
the same thing.
Obviously equivalent transformation.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In open_namei() exit_dput: we have mntput() done in the wrong order -
if nd->mnt != path.mnt we end up doing
	mntput(nd->mnt);
	nd->mnt = path.mnt;
	dput(nd->dentry);
	mntput(nd->mnt);
which drops nd->dentry too late. Fixed by having path.mnt go first.
That allows us to switch O_NOFOLLOW under if (__follow_mount(...)) back
to exit_dput, while we are at it.
Fix for early-mntput() race + equivalent transformation.
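The ordering problem can be modelled in userspace. This is only a toy: the
toy_* types and functions are hypothetical, and "dead" stands in for "the
last reference is gone, so an unmount could proceed". It contrasts the old
sequence quoted above with dropping path.mnt first.

```c
#include <assert.h>

struct toy_mnt {
	int refs;	/* references pinning the mount */
	int dead;	/* set once the last reference has been dropped */
};

struct toy_dentry {
	struct toy_mnt *fs;	/* mount the dentry lives on */
};

void toy_mntput(struct toy_mnt *m)
{
	assert(m->refs > 0);
	if (--m->refs == 0)
		m->dead = 1;	/* nothing prevents an unmount from here on */
}

/* Returns 0 if the dentry is dropped while its fs is still pinned, else -1. */
int toy_dput(const struct toy_dentry *d)
{
	return d->fs->dead ? -1 : 0;
}

/* Old order: the mount under nd->dentry is released before the dentry. */
int toy_exit_dput_old(struct toy_mnt *nd_mnt, struct toy_dentry *nd_dentry,
		      struct toy_mnt *path_mnt)
{
	int ret;

	toy_mntput(nd_mnt);
	nd_mnt = path_mnt;
	ret = toy_dput(nd_dentry);	/* too late: its mount is already gone */
	toy_mntput(nd_mnt);
	return ret;
}

/* New order: path.mnt goes first, then the dentry, then its mount. */
int toy_exit_dput_new(struct toy_mnt *nd_mnt, struct toy_dentry *nd_dentry,
		      struct toy_mnt *path_mnt)
{
	int ret;

	toy_mntput(path_mnt);
	ret = toy_dput(nd_dentry);
	toy_mntput(nd_mnt);
	return ret;
}
```

Both variants drop the same references; only the point at which the dentry
is released relative to its mount differs.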
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In open_namei() we take mntput(nd->mnt);nd->mnt=path.mnt; out of the if
(__follow_mount(...)), making it conditional on nd->mnt != path.mnt instead.
Then we shift the result downstream.
Equivalent transformations.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In open_namei(), __follow_down() loop turned into __follow_mount().
Instead of
	if we are on a mountpoint dentry
		if O_NOFOLLOW checks fail
			drop path.dentry
			drop nd
			return
		do equivalent of follow_mount(&path.mnt, &path.dentry)
		nd->mnt = path.mnt
we do
	if __follow_mount(path) had, indeed, traversed mountpoint
		/* now both nd->mnt and path.mnt are pinned down */
		if O_NOFOLLOW checks fail
			drop path.dentry
			drop path.mnt
			drop nd
			return
		mntput(nd->mnt)
		nd->mnt = path.mnt
Now __follow_down() can be folded into follow_down() - no other callers left.
We need to reorder dput()/mntput() there - same problem as in follow_mount().
Equivalent transformation + fix for a bug in O_NOFOLLOW handling - we used to
get -ELOOP if we had the same fs mounted on /foo and /bar, had something bound
on /bar/baz and tried to open /foo/baz with O_NOFOLLOW. And fix of
too-early-mntput() race in follow_down()
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
New helper: __follow_mount(struct path *path). Same as follow_mount(), except
that we do *not* do mntput() after the first lookup_mnt().
IOW, original path->mnt stays pinned down. We also take care to do dput()
before mntput() in the loop body (follow_mount() also needs that reordering,
but that will be done later in the series).
The following are equivalent, assuming that path.mnt == x:
(1)
	follow_mount(&path.mnt, &path.dentry)
(2)
	__follow_mount(&path);
	if (path.mnt != x)
		mntput(x);
(3)
	if (__follow_mount(&path))
		mntput(x);
Callers of follow_mount() in __link_path_walk() converted to (2).
Equivalent transformation + fix for too-late-mntput() race in __follow_mount()
loop.
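The equivalence of (1) and (3) can be checked on a toy reference-count
model. This sketch ignores dentries entirely and treats a mount stack as a
linked chain; struct toy_vfsmount and the toy_* functions are hypothetical
names, not the fs/namei.c code.

```c
#include <assert.h>
#include <stddef.h>

struct toy_vfsmount {
	int refs;
	struct toy_vfsmount *mounted;	/* mount stacked on top, or NULL */
};

void toy_mntget(struct toy_vfsmount *m)
{
	m->refs++;
}

void toy_mntput(struct toy_vfsmount *m)
{
	assert(m->refs > 0);
	m->refs--;
}

/* Variant (1): every mount we leave is dropped, including the first one. */
void toy_follow_mount(struct toy_vfsmount **mnt)
{
	while ((*mnt)->mounted != NULL) {
		struct toy_vfsmount *next = (*mnt)->mounted;

		toy_mntget(next);	/* the lookup hands us a reference */
		toy_mntput(*mnt);
		*mnt = next;
	}
}

/*
 * Variant used in (3): the original mount stays pinned. Returns nonzero
 * if at least one mountpoint was traversed, so the caller knows whether
 * it still has to drop the original.
 */
int toy_follow_mount_pinned(struct toy_vfsmount **mnt)
{
	int traversed = 0;

	while ((*mnt)->mounted != NULL) {
		struct toy_vfsmount *next = (*mnt)->mounted;

		toy_mntget(next);
		if (traversed)
			toy_mntput(*mnt);
		*mnt = next;
		traversed = 1;
	}
	return traversed;
}
```

Running both variants on identical three-level stacks leaves identical
reference counts once the caller of the pinned variant does mntput(x).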
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In open_namei() we never use path.mnt or path.dentry after exit: or ok:.
Assignment of path.dentry in case of LAST_BIND is dead code and only
obfuscates already convoluted function; assignment of path.mnt after
__do_follow_link() can be moved down to the place where we set path.dentry.
Obviously equivalent transformations, just to clean the air a bit in that
region.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The first argument of __do_follow_link() switched to struct path *
(__do_follow_link(path->dentry, ...) -> __do_follow_link(path, ...)).
All callers have the same calls of mntget() right before and dput()/mntput()
right after __do_follow_link(); these calls have been moved inside.
Obviously equivalent transformations.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
mntget(path->mnt) in do_follow_link() moved down to right before the
__do_follow_link() call and right after loop: resp.
dput()+mntput() on non-ELOOP branch moved up to right after __do_follow_link()
call.
resulting
loop:
	mntget(path->mnt);
	path_release(nd);
	dput(path->dentry);
	mntput(path->mnt);
replaced with equivalent
	dput(path->dentry);
	path_release(nd);
Equivalent transformations - the reason why we have that mntget() is that
__do_follow_link() can drop a reference to nd->mnt and that's what holds
path->mnt. So that call can happen at any point prior to __do_follow_link()
touching nd->mnt. The rest is obvious.
NOTE: current tree relies on symlinks *never* being mounted on anything. It's
not hard to get rid of that assumption (actually, that will come for free
later in the series). For now we are just not making the situation worse than
it is.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fix for too early mntput() in open_namei() - we pin path.mnt down for the
duration of __do_follow_link(). Otherwise we could get the fs where our
symlink lived unmounted while we were in __do_follow_link(). That would end
up with dentry of symlink staying pinned down through the fs shutdown.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
path.mnt in open_namei() set to mirror nd->mnt.
nd->mnt is set in 3 places in that function - path_lookup() in the beginning,
__follow_down() loop after do_last: and __do_follow_link() call after
do_link:.
We set path.mnt to nd->mnt after path_lookup() and __do_follow_link(). In
__follow_down() loop we use &path.mnt instead of &nd->mnt and set nd->mnt to
path.mnt immediately after that loop.
Obviously equivalent transformation.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replaced struct dentry *dentry in namei with struct path path. All uses of
dentry replaced with path.dentry there.
Obviously equivalent transformation.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
All callers of do_follow_link() do mntget() right before it and
dput()+mntput() right after. These calls are moved inside do_follow_link()
now.
Obviously equivalent transformation.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
OK, here comes a patch series that hopefully should close all
too-early-mntput() races in fs/namei.c. Entire area is convoluted as hell, so
I'm splitting that series into _very_ small chunks.
Patches already in the tree close only (very wide) races in following symlinks
(see "busy inodes after umount" thread some time ago). Unfortunately, quite a
few narrower races of the same nature were not closed. Hopefully this should
take care of all of them.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When fsync() runs wait_on_page_writeback_range() it only inspects pages which
are actually under I/O (PAGECACHE_TAG_WRITEBACK). If a page completed I/O
prior to wait_on_page_writeback_range() looking at it, it is supposed to have
recorded its I/O error state in the address_space.
But mpage_end_io_write() forgot to set the address_space error flag in
this case.
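A toy model of the intended behaviour, assuming a single error bit on the
mapping that the waiter tests and clears. TOY_AS_EIO, the toy_* structs and
functions are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_AS_EIO 0x1u	/* illustrative "I/O error seen" bit */

struct toy_mapping {
	unsigned int flags;
};

struct toy_page {
	int error;		/* per-page error marker */
	int writeback;		/* nonzero while I/O is in flight */
	struct toy_mapping *mapping;
};

/* Write completion: record a failure on the page AND on its mapping. */
void toy_end_io_write(struct toy_page *page, int uptodate)
{
	if (!uptodate) {
		page->error = 1;
		if (page->mapping != NULL)
			page->mapping->flags |= TOY_AS_EIO;
	}
	page->writeback = 0;
}

/*
 * fsync side: pages that already finished I/O are never inspected, so
 * their errors are visible only through the mapping's flag.
 */
int toy_wait_on_writeback(struct toy_mapping *mapping)
{
	int ret = 0;

	if (mapping->flags & TOY_AS_EIO) {
		mapping->flags &= ~TOY_AS_EIO;
		ret = -EIO;
	}
	return ret;
}
```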
Signed-off-by: Qu Fuping <fs@ercist.iscas.ac.cn>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a bug in list scanning that can cause us to skip the last buffer on the
checkpoint list (and hence fail to make any progress under some rather
unfavorable conditions).
The problem is we first do jh=next_jh and then test
} while (jh!=last_jh);
Hence we skip the last buffer on the list (if it was not the only buffer on
the list). As we already do jh=next_jh; in the beginning of the loop we
are safe to just remove the assignment in the end. It can happen that 'jh'
will be freed at the point we test jh != last_jh but that does not matter
as we never *dereference* the pointer.
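The loop shape can be reproduced on a plain circular list. This is a sketch
with hypothetical names (struct toy_jh, toy_make_ring(), toy_scan()); the
"buggy" switch re-adds the trailing assignment so the two behaviours can be
compared.

```c
#include <assert.h>
#include <stddef.h>

struct toy_jh {
	struct toy_jh *next, *prev;
	int seen;
};

/* Link v[0..n-1] into a circular doubly linked list and clear the marks. */
void toy_make_ring(struct toy_jh *v, int n)
{
	int i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		v[i].next = &v[(i + 1) % n];
		v[i].prev = &v[(i + n - 1) % n];
		v[i].seen = 0;
	}
}

/* Walk the ring starting at head; returns how many entries were visited. */
int toy_scan(struct toy_jh *head, int buggy)
{
	struct toy_jh *jh, *next_jh, *last_jh;
	int visited = 0;

	if (head == NULL)
		return 0;
	last_jh = head->prev;
	next_jh = head;
	do {
		jh = next_jh;
		next_jh = jh->next;
		jh->seen = 1;
		visited++;
		if (buggy)
			jh = next_jh;	/* the assignment the fix removes */
	} while (jh != last_jh);	/* pointer compared, never dereferenced */
	return visited;
}
```

On a three-entry ring the buggy variant stops after two entries and never
marks the last one; without the trailing assignment all three are visited.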
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix possible false assertion failure in log_do_checkpoint(). We might fail
to detect that we actually made progress when cleaning up the checkpoint
lists if we don't retry after writing something to disk. The patch was
confirmed to fix observed assertion failures for several users.
When we flushed some buffers we need to retry scanning the list.
Otherwise we can fail to detect our progress.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This cleans up the /proc/device-tree representation of the Open Firmware
device-tree on ppc and ppc64. It does the following things:
- Work around an issue in some Apple device-trees where a property may
exist with the same name as a child node of the parent. We now
simply "drop" the property instead of creating duplicate entries in
/proc with random result...
- Do not try to chop off the "@0" at the end of a node name whose unit
address is 0. This is not useful, inconsistent, and the code was
buggy and didn't always work anyway.
- Do not create symlinks for the short name and unit address parts of a
node. These were never really used, and they bloated the memory
footprint of the device-tree with useless struct proc_dir_entry
instances and their matching dentry and inode cache entries.
This results in smaller code, smaller memory footprint, and a more
accurate view of the tree presented to userland.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In fs/udf/udftime.c the global array '__mon_yday' is not static, and it
conflicts with the glibc one when the kernel is compiled as user mode.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove old useless header that was used in Ye Olde Times during 2.4->2.5
porting to abstract differences. Its definitions are no longer used anyway, so
let's finally kill it.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes a bug introduced by Al Viro's patch: [patch 136/174]
reiserfs endianness: clone struct reiserfs_key
The problem is MAX_KEY and MAX_IN_CORE_KEY defined in this patch do not
look equal from reiserfs comp_key's point of view. This caused reiserfs'
sanity check to complain.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
currently it opencodes it, but that's in the way of changing the
lookup_hash interface.
I'd prefer to disallow modular af_unix over exporting lookup_create,
but I'll leave that to you.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid console spam with ext3 aborted journal.
ext3 usually reports error conditions that it detects in its environment.
But when its journal gets aborted due to such errors, it can sometimes
continue to report that condition forever, spamming the console to such
an extent that the initial first cause of the journal abort can be lost.
When the journal aborts, we put the filesystem into readonly mode. Most
subsequent filesystem operations will get rejected immediately by checks
for MS_RDONLY either in the filesystem or in the VFS. But some paths do
not have such checks --- for example, if we continue to write to a file
handle that was opened before the fs went readonly. (We only check for
the ROFS condition when the file is first opened.) In these cases, we
can continue to generate log errors similar to
EXT3-fs error (device $DEV) in start_transaction: Journal has aborted
for each subsequent write.
There is really no point in generating these errors after the initial
error has been fully reported. Specifically, if we're starting a
completely new filesystem operation, and the filesystem is *already*
readonly (ie. the ext3 layer has already detected and handled the
underlying jbd abort), and we see an EROFS error, then there is simply
no point in reporting it again.
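The suppression rule in the last paragraph boils down to a three-way test.
A minimal sketch, assuming "starting a completely new filesystem operation"
can be expressed as "no transaction is currently running"; struct toy_sb and
toy_should_report() are hypothetical names.

```c
#include <assert.h>	/* for callers that want to assert on the result */
#include <errno.h>

/* Hypothetical stand-in for the superblock: only the read-only bit. */
struct toy_sb {
	int readonly;
};

/*
 * Returns 1 if the error should be logged, 0 if it would only repeat
 * what was already reported when the journal aborted.
 */
int toy_should_report(const struct toy_sb *sb, int err, int in_transaction)
{
	if (err == -EROFS && !in_transaction && sb->readonly)
		return 0;
	return 1;
}
```

Any other error, or EROFS seen in the middle of an operation, is still
reported under this rule.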
Signed-off-by: Stephen Tweedie <sct@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If block_read_full_page() detects an error when running get_block() it will
run SetPageError(), then it will zero out the block in pagecache and will mark
the buffer_head uptodate.
So at the end of readahead we end up with a non-uptodate pagecache page which
is marked PageError. But it has uptodate buffers.
The pagefault code will run ClearPageError, will launch readpage a second time
and block_read_full_page() will notice the uptodate buffers and will mark the
page uptodate as well. We end up with an uptodate, !PageError page full of
zeros and the error is lost.
(It seems a little odd that filemap_nopage() runs ClearPageError(). I guess
all of this adds up to meaning that for each attempted access to the page, the
pagefault handler will retry the I/O. Which is good and bad. If the app is
ignoring SIGBUS for some reason we could get a lot of back-to-back I/O
errors.)
Fix it by not marking the pagecache buffer_head as uptodate if the attempt to
map that buffer to a disk block failed.
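The sequence above can be replayed on a toy page with a handful of buffers.
This is a simplified model under stated assumptions: a page becomes uptodate
when all its buffers are uptodate and no error is flagged, and a fault clears
the error flag before retrying. All toy_* names are hypothetical.

```c
#include <assert.h>

#define TOY_NBUF 4

struct toy_page {
	int uptodate;
	int error;
	int buf_uptodate[TOY_NBUF];
};

/*
 * bad_block: index of the buffer whose block mapping fails, or -1.
 * mark_failed_uptodate: nonzero reproduces the old behaviour.
 */
void toy_read_full_page(struct toy_page *page, int bad_block,
			int mark_failed_uptodate)
{
	int i, all_uptodate = 1;

	for (i = 0; i < TOY_NBUF; i++) {
		if (page->buf_uptodate[i])
			continue;		/* nothing to read */
		if (i == bad_block) {
			page->error = 1;	/* block gets zeroed here */
			if (mark_failed_uptodate)
				page->buf_uptodate[i] = 1;
			continue;
		}
		page->buf_uptodate[i] = 1;	/* pretend the read succeeded */
	}
	for (i = 0; i < TOY_NBUF; i++)
		if (!page->buf_uptodate[i])
			all_uptodate = 0;
	if (all_uptodate && !page->error)
		page->uptodate = 1;
}

/* Fault path: clear the error, retry the read; 0 on success, -1 on failure. */
int toy_fault(struct toy_page *page, int bad_block, int mark_failed_uptodate)
{
	page->error = 0;
	toy_read_full_page(page, bad_block, mark_failed_uptodate);
	return page->uptodate ? 0 : -1;
}
```

With the old behaviour the retry finds every buffer uptodate and reports
success on a zero-filled page; with the fix the failed buffer is retried and
the fault fails again.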
Credit-to: Qu Fuping <fs@ercist.iscas.ac.cn>
For reporting the bug and identifying its source.
Signed-off-by: Qu Fuping <fs@ercist.iscas.ac.cn>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
VmallocTotal: 34359738367 kB
VmallocUsed: 266288 kB
VmallocChunk: 18014366299193295 kB
is unsettling - x86_64 and some other architectures keep a separate address
range for modules in vmalloc's vmlist, which /proc/meminfo should pass over.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This change from March 3rd causes the partition parsing code to ignore
partitions which have a signature byte of zero. Turns out that more people
have such partitions than we expected, and their device numbering is coming up
wrong in post-2.6.11 kernels.
So revert the change while we think about the problem a bit more.
Cc: Andries Brouwer <Andries.Brouwer@cwi.nl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This may be the cause of several open PV's of incorrect
delay flags being set and then tripping asserts.
Do not return a delay alloc extent when the caller is asking to do a write.
SGI Modid: xfs-linux:xfs-kern:189616a
Signed-off-by: Russell Cattelan <cattelan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@sgi.com>