Commit Graph

21482 Commits

Author SHA1 Message Date
J. Bruce Fields d891eedbc3 fs/dcache: allow d_obtain_alias() to return unhashed dentries
Without this patch, inodes are not promptly freed on last close of an
unlinked file by an nfs client:

	client$ mount -tnfs4 server:/export/ /mnt/
	client$ tail -f /mnt/FOO
	...
	server$ df -i /export
	server$ rm /export/FOO
	(^C the tail -f)
	server$ df -i /export
	server$ echo 2 >/proc/sys/vm/drop_caches
	server$ df -i /export

the df's will show that the inode is not freed on the filesystem until
the last step, when it could have been freed after killing the client's
tail -f. On-disk data won't be deallocated either, leading to possible
spurious ENOSPC.

This occurs because when the client does the close, it arrives in a
compound with a putfh and a close, processed like:

	- putfh: look up the filehandle.  The only alias found for the
	  inode will be DCACHE_UNHASHED alias referenced by the filp
	  this, so it creates a new DCACHE_DISCONECTED dentry and
	  returns that instead.
	- close: closes the existing filp, which is destroyed
	  immediately by dput() since it's DCACHE_UNHASHED.
	- end of the compound: release the reference
	  to the current filehandle, and dput() the new
	  DCACHE_DISCONECTED dentry, which gets put on the
	  unused list instead of being destroyed immediately.

Nick Piggin suggested fixing this by allowing d_obtain_alias to return
the unhashed dentry that is referenced by the filp, instead of making it
create a new dentry.

Leave __d_find_alias() alone to avoid changing behavior of other
callers.

Also nfsd doesn't need all the checks of __d_find_alias(); any dentry,
hashed or unhashed, disconnected or not, should work.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10 05:18:54 -05:00
Marco Stornelli 1ca551c6ca Check for immutable/append flag in fallocate path
In the fallocate path the kernel doesn't check for the immutable/append
flag. It's possible to have a race condition in this scenario: an
application open a file in read/write and it does something, meanwhile
root set the immutable flag on the file, the application at that point
can call fallocate with success. In addition, we don't allow to do any
unreserve operation on an append only file but only the reserve one.

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10 04:22:15 -05:00
Al Viro 9177ada99d fat: fix d_revalidate oopsen on NFS exports
can't blindly check nd->flags in ->d_revalidate()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10 03:45:49 -05:00
Al Viro 8ce84eeb5b jfs: fix d_revalidate oopsen on NFS exports
can't blindly check nd->flags in ->d_revalidate()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10 03:45:28 -05:00
Al Viro 4714e63731 ocfs2: fix d_revalidate oopsen on NFS exports
can't blindly check nd->flags in ->d_revalidate()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10 03:45:07 -05:00
Al Viro 53fe924161 gfs2: fix d_revalidate oopsen on NFS exports
can't blindly check nd->flags in ->d_revalidate()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10 03:44:48 -05:00
Al Viro 529c5f958f fuse: fix d_revalidate oopsen on NFS exports
can't blindly check nd->flags in ->d_revalidate()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10 03:44:31 -05:00
Al Viro 0eb980e317 ceph: fix d_revalidate oopsen on NFS exports
can't blindly check nd->flags in ->d_revalidate()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10 03:44:05 -05:00
Al Viro c78f4cc5e7 reiserfs xattr ->d_revalidate() shouldn't care about RCU
... it returns an error unconditionally

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10 03:42:01 -05:00
Al Viro ae50adcb0a /proc/self is never going to be invalidated...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10 03:41:53 -05:00
Linus Torvalds 3979491701 Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux:
  nfsd: wrong index used in inner loop
  nfsd4: fix bad pointer on failure to find delegation
  NFSD: fix decode_cb_sequence4resok
2011-03-09 14:52:09 -08:00
Linus Torvalds 78833dd706 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  nd->inode is not set on the second attempt in path_walk()
  unfuck proc_sysctl ->d_compare()
  minimal fix for do_filp_open() race
2011-03-09 13:55:51 -08:00
Al Viro b306419ae0 nd->inode is not set on the second attempt in path_walk()
We leave it at whatever it had been pointing to after the
first link_path_walk() had failed with -ESTALE.  Things
do not work well after that...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-08 21:16:28 -05:00
roel 3ec07aa952 nfsd: wrong index used in inner loop
Index i was already used in the outer loop

Cc: stable@kernel.org
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-08 19:46:10 -05:00
Al Viro dfef6dcd35 unfuck proc_sysctl ->d_compare()
a) struct inode is not going to be freed under ->d_compare();
however, the thing PROC_I(inode)->sysctl points to just might.
Fortunately, it's enough to make freeing that sucker delayed,
provided that we don't step on its ->unregistering, clear
the pointer to it in PROC_I(inode) before dropping the reference
and check if it's NULL in ->d_compare().

b) I'm not sure that we *can* walk into NULL inode here (we recheck
dentry->seq between verifying that it's still hashed / fetching
dentry->d_inode and passing it to ->d_compare() and there's no
negative hashed dentries in /proc/sys/*), but if we can walk into
that, we really should not have ->d_compare() return 0 on it!
Said that, I really suspect that this check can be simply killed.
Nick?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-08 02:22:27 -05:00
J. Bruce Fields 32b007b4e1 nfsd4: fix bad pointer on failure to find delegation
In case of a nonempty list, the return on error here is obviously bogus;
it ends up being a pointer to the list head instead of to any valid
delegation on the list.

In particular, if nfsd4_delegreturn() hits this case, and you're quite unlucky,
then renew_client may oops, and it may take an embarassingly long time to
figure out why.  Facepalm.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
IP: [<ffffffff81292965>] nfsd4_delegreturn+0x125/0x200
...

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-07 11:44:53 -05:00
Linus Torvalds fb62c00a6d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: no .snap inside of snapped namespace
  libceph: fix msgr standby handling
  libceph: fix msgr keepalive flag
  libceph: fix msgr backoff
  libceph: retry after authorization failure
  libceph: fix handling of short returns from get_user_pages
  ceph: do not clear I_COMPLETE from d_release
  ceph: do not set I_COMPLETE
  Revert "ceph: keep reference to parent inode on ceph_dentry"
2011-03-05 10:43:22 -08:00
Neil Horman e9e3d724e2 nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3)
The "bad_page()" page allocator sanity check was reported recently (call
chain as follows):

  bad_page+0x69/0x91
  free_hot_cold_page+0x81/0x144
  skb_release_data+0x5f/0x98
  __kfree_skb+0x11/0x1a
  tcp_ack+0x6a3/0x1868
  tcp_rcv_established+0x7a6/0x8b9
  tcp_v4_do_rcv+0x2a/0x2fa
  tcp_v4_rcv+0x9a2/0x9f6
  do_timer+0x2df/0x52c
  ip_local_deliver+0x19d/0x263
  ip_rcv+0x539/0x57c
  netif_receive_skb+0x470/0x49f
  :virtio_net:virtnet_poll+0x46b/0x5c5
  net_rx_action+0xac/0x1b3
  __do_softirq+0x89/0x133
  call_softirq+0x1c/0x28
  do_softirq+0x2c/0x7d
  do_IRQ+0xec/0xf5
  default_idle+0x0/0x50
  ret_from_intr+0x0/0xa
  default_idle+0x29/0x50
  cpu_idle+0x95/0xb8
  start_kernel+0x220/0x225
  _sinittext+0x22f/0x236

It occurs because an skb with a fraglist was freed from the tcp
retransmit queue when it was acked, but a page on that fraglist had
PG_Slab set (indicating it was allocated from the Slab allocator (which
means the free path above can't safely free it via put_page.

We tracked this back to an nfsv4 setacl operation, in which the nfs code
attempted to fill convert the passed in buffer to an array of pages in
__nfs4_proc_set_acl, which gets used by the skb->frags list in
xs_sendpages.  __nfs4_proc_set_acl just converts each page in the buffer
to a page struct via virt_to_page, but the vfs allocates the buffer via
kmalloc, meaning the PG_slab bit is set.  We can't create a buffer with
kmalloc and free it later in the tcp ack path with put_page, so we need
to either:

1) ensure that when we create the list of pages, no page struct has
   PG_Slab set

 or

2) not use a page list to send this data

Given that these buffers can be multiple pages and arbitrarily sized, I
think (1) is the right way to go.  I've written the below patch to
allocate a page from the buddy allocator directly and copy the data over
to it.  This ensures that we have a put_page free-able page for every
entry that winds up on an skb frag list, so it can be safely freed when
the frame is acked.  We do a put page on each entry after the
rpc_call_sync call so as to drop our own reference count to the page,
leaving only the ref count taken by tcp_sendpages.  This way the data
will be properly freed when the ack comes in

Successfully tested by myself to solve the above oops.

Note, as this is the result of a setacl operation that exceeded a page
of data, I think this amounts to a local DOS triggerable by an
uprivlidged user, so I'm CCing security on this as well.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
CC: security@kernel.org
CC: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-04 17:28:52 -08:00
Sage Weil 455cec0abf ceph: no .snap inside of snapped namespace
Otherwise you can do things like

# mkdir .snap/foo
# cd .snap/foo/.snap
# ls
<badness>

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-04 12:25:09 -08:00
Al Viro 1858efd471 minimal fix for do_filp_open() race
failure exits on the no-O_CREAT side of do_filp_open() merge with
those of O_CREAT one; unfortunately, if do_path_lookup() returns
-ESTALE, we'll get out_filp:, notice that we are about to return
-ESTALE without having trying to create the sucker with LOOKUP_REVAL
and jump right into the O_CREAT side of code.  And proceed to try
and create a file.  Usually that'll fail with -ESTALE again, but
we can race and get that attempt of pathname resolution to succeed.

open() without O_CREAT really shouldn't end up creating files, races
or not.  The real fix is to rearchitect the whole do_filp_open(),
but for now splitting the failure exits will do.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-04 13:14:21 -05:00
Linus Torvalds 8336026942 Merge branch 'i_nlink' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'i_nlink' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  hfs: fix rename() over non-empty directory
  udf: fix i_nlink limit
  fix reiserfs mkdir() breakage
  exofs: i_nlink races in rename()
  nilfs2: i_nlink races in rename()
  minix: i_nlink races in rename()
  ufs: i_nlink races in rename()
  sysv: i_nlink races in rename()
2011-03-03 15:37:59 -08:00
Linus Torvalds 4c7fd114c6 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: zero proper structure size for geometry calls
2011-03-03 12:44:22 -08:00
Linus Torvalds c640e13f8e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: fix regression that i-flag is not set on changeless checkpoints
2011-03-03 12:42:48 -08:00
Sage Weil 16a8b70a5a ceph: do not clear I_COMPLETE from d_release
First, this was racy anyway: d_release isn't called until well after the
dentry is unhashed.  Second, this runs afoul of the recent dcache change
that clears d_parent prior to calling d_release (949854d0), causing a NULL
pointer dereference.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-03 10:09:52 -08:00
Sage Weil b545cc1505 ceph: do not set I_COMPLETE
Do not set the I_COMPLETE flag on directories until we resolve races with
dcache pruning.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-03 10:09:51 -08:00
Sage Weil 9bde178d05 Revert "ceph: keep reference to parent inode on ceph_dentry"
This reverts commit 97d79b403e.

This fails to account for d_parent changes due to rename or disconnected
dentries due to submounts or NFS reexports.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-03 10:09:50 -08:00
Al Viro 69102e9b4b hfs: fix rename() over non-empty directory
merge hfs_unlink() and hfs_rmdir(), while we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-03 01:28:40 -05:00
Al Viro 810c1b2e48 udf: fix i_nlink limit
(256 << sizeof(x)) - 1 is not the maximal possible value of x...
In reality, the maximal allowed value for UDF FileLinkCount is
65535.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-03 01:28:40 -05:00
Al Viro 99890a3be1 fix reiserfs mkdir() breakage
if directory has so many subdirectories that its link count is set
to 1 (i.e. "can't tell accurately") and reiserfs_new_inode() fails,
we shouldn't decrement the parent's link count in cleanup path;
that's what DEC_DIR_INODE_NLINK() is for.  As it is, we end up
with parent suddenly getting zero i_nlink, with very unpleasant
effects.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-03 01:28:40 -05:00
Al Viro babfe56046 exofs: i_nlink races in rename()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-03 01:28:17 -05:00
Al Viro 30eb43d314 nilfs2: i_nlink races in rename()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-03 01:28:17 -05:00
Al Viro 6f88049caf minix: i_nlink races in rename()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-03 01:28:16 -05:00
Al Viro 37750cdda3 ufs: i_nlink races in rename()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-03 01:28:16 -05:00
Al Viro 4787d45fa7 sysv: i_nlink races in rename()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-03 01:28:16 -05:00
Linus Torvalds f7d222ea2a Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6
* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
  of/promtree: allow DT device matching by fixing 'name' brokenness (v5)
  x86: OLPC: have prom_early_alloc BUG rather than return NULL
  of/flattree: Drop an uninteresting message to pr_debug level
  of: Add missing of_address.h to xilinx ehci driver
2011-03-02 20:01:57 -08:00
Paul Bolle 8aaccf7fa2 of/flattree: Drop an uninteresting message to pr_debug level
This message looks like an error (which it isn't) when booting with a
flattened device tree.  Remove the message from normal kernel builds.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-03-02 13:45:18 -07:00
Josh Hunt e8a80c6f76 ext2: Fix link count corruption under heavy link+rename load
vfs_rename_other() does not lock renamed inode with i_mutex. Thus changing
i_nlink in a non-atomic manner (which happens in ext2_rename()) can corrupt
it as reported and analyzed by Josh.

In fact, there is no good reason to mess with i_nlink of the moved file.
We did it presumably to simulate linking into the new directory and unlinking
from an old one. But the practical effect of this is disputable because fsck
can possibly treat file as being properly linked into both directories without
writing any error which is confusing. So we just stop increment-decrement
games with i_nlink which also fixes the corruption.

CC: stable@kernel.org
CC: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-03-02 11:03:52 +01:00
Alex Elder af24ee9ea8 xfs: zero proper structure size for geometry calls
Commit 493f3358cb added this call to
xfs_fs_geometry() in order to avoid passing kernel stack data back
to user space:

+       memset(geo, 0, sizeof(*geo));

Unfortunately, one of the callers of that function passes the
address of a smaller data type, cast to fit the type that
xfs_fs_geometry() requires.  As a result, this can happen:

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted
in: f87aca93

Pid: 262, comm: xfs_fsr Not tainted 2.6.38-rc6-493f3358cb2+ #1
Call Trace:

[<c12991ac>] ? panic+0x50/0x150
[<c102ed71>] ? __stack_chk_fail+0x10/0x18
[<f87aca93>] ? xfs_ioc_fsgeometry_v1+0x56/0x5d [xfs]

Fix this by fixing that one caller to pass the right type and then
copy out the subset it is interested in.

Note: This patch is an alternative to one originally proposed by
Eric Sandeen.

Reported-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu>
Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu>
2011-03-01 21:21:13 -06:00
Ryusuke Konishi 72746ac643 nilfs2: fix regression that i-flag is not set on changeless checkpoints
According to the report from Jiro SEKIBA titled "regression in
2.6.37?"  (Message-Id: <8739n8vs1f.wl%jir@sekiba.com>), on 2.6.37 and
later kernels, lscp command no longer displays "i" flag on checkpoints
that snapshot operations or garbage collection created.

This is a regression of nilfs2 checkpointing function, and it's
critical since it broke behavior of a part of nilfs2 applications.
For instance, snapshot manager of TimeBrowse gets to create
meaningless snapshots continuously; snapshot creation triggers another
checkpoint, but applications cannot distinguish whether the new
checkpoint contains meaningful changes or not without the i-flag.

This patch fixes the regression and brings that application behavior
back to normal.

Reported-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Jiro SEKIBA <jir@unicus.jp>
Cc: stable <stable@kernel.org>  [2.6.37]
2011-03-02 09:55:18 +09:00
Randy Dunlap e6eb5ce1b2 fs/block_dev.c: fix new kernel-doc warning
Fix new kernel-doc warning in fs/block_dev.c:

Warning(fs/block_dev.c:937): No description found for parameter 'kill_dirty'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-28 18:08:31 -08:00
Linus Torvalds 58da94f013 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: fix truncate after open
  fuse: fix hang of single threaded fuseblk filesystem
2011-02-28 17:53:04 -08:00
Linus Torvalds 158a5d61f7 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2: Check heartbeat mode for kernel stacks only
  Ocfs2/refcounttree: Fix a bug for refcounttree to writeback clusters in a right number.
  ocfs2: Fix estimate of necessary credits for mkdir
2011-02-28 17:52:47 -08:00
Jan Kara 7137c6bd45 aio: fix race between io_destroy() and io_submit()
A race can occur when io_submit() races with io_destroy():

 CPU1						CPU2
io_submit()
  do_io_submit()
    ...
    ctx = lookup_ioctx(ctx_id);
						io_destroy()
    Now do_io_submit() holds the last reference to ctx.
    ...
    queue new AIO
    put_ioctx(ctx) - frees ctx with active AIOs

We solve this issue by checking whether ctx is being destroyed in AIO
submission path after adding new AIO to ctx.  Then we are guaranteed that
either io_destroy() waits for new AIO or we see that ctx is being
destroyed and bail out.

Cc: Nick Piggin <npiggin@kernel.dk>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:37 -08:00
Nick Piggin 3bd9a5d734 aio: fix rcu ioctx lookup
aio-dio-invalidate-failure GPFs in aio_put_req from io_submit.

lookup_ioctx doesn't implement the rcu lookup pattern properly.
rcu_read_lock does not prevent refcount going to zero, so we might take
a refcount on a zero count ioctx.

Fix the bug by atomically testing for zero refcount before incrementing.

[jack@suse.cz: added comment into the code]
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:37 -08:00
Timo Warns 294f6cf486 ldm: corrupted partition table can cause kernel oops
The kernel automatically evaluates partition tables of storage devices.
The code for evaluating LDM partitions (in fs/partitions/ldm.c) contains
a bug that causes a kernel oops on certain corrupted LDM partitions.  A
kernel subsystem seems to crash, because, after the oops, the kernel no
longer recognizes newly connected storage devices.

The patch changes ldm_parse_vmdb() to Validate the value of vblk_size.

Signed-off-by: Timo Warns <warns@pre-sense.de>
Cc: Eugene Teo <eugeneteo@kernel.sg>
Acked-by: Richard Russon <ldm@flatcap.org>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:36 -08:00
Davide Libenzi 22bacca48a epoll: prevent creating circular epoll structures
In several places, an epoll fd can call another file's ->f_op->poll()
method with ep->mtx held.  This is in general unsafe, because that other
file could itself be an epoll fd that contains the original epoll fd.

The code defends against this possibility in its own ->poll() method using
ep_call_nested, but there are several other unsafe calls to ->poll
elsewhere that can be made to deadlock.  For example, the following simple
program causes the call in ep_insert recursively call the original fd's
->poll, leading to deadlock:

 #include <unistd.h>
 #include <sys/epoll.h>

 int main(void) {
     int e1, e2, p[2];
     struct epoll_event evt = {
         .events = EPOLLIN
     };

     e1 = epoll_create(1);
     e2 = epoll_create(2);
     pipe(p);

     epoll_ctl(e2, EPOLL_CTL_ADD, e1, &evt);
     epoll_ctl(e1, EPOLL_CTL_ADD, p[0], &evt);
     write(p[1], p, sizeof p);
     epoll_ctl(e1, EPOLL_CTL_ADD, e2, &evt);

     return 0;
 }

On insertion, check whether the inserted file is itself a struct epoll,
and if so, do a recursive walk to detect whether inserting this file would
create a loop of epoll structures, which could lead to deadlock.

[nelhage@ksplice.com: Use epmutex to serialize concurrent inserts]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
Reported-by: Nelson Elhage <nelhage@ksplice.com>
Tested-by: Nelson Elhage <nelhage@ksplice.com>
Cc: <stable@kernel.org>		[2.6.34+, possibly earlier]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 15:07:36 -08:00
Linus Torvalds 4660ba63f1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: fix fiemap bugs with delalloc
  Btrfs: set FMODE_EXCL in btrfs_device->mode
  Btrfs: make btrfs_rm_device() fail gracefully
  Btrfs: Avoid accessing unmapped kernel address
  Btrfs: Fix BTRFS_IOC_SUBVOL_SETFLAGS ioctl
  Btrfs: allow balance to explicitly allocate chunks as it relocates
  Btrfs: put ENOSPC debugging under a mount option
2011-02-25 14:03:39 -08:00
Linus Torvalds 638691a7a4 Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
  md: Fix - again - partition detection when array becomes active
  Fix over-zealous flush_disk when changing device size.
  md: avoid spinlock problem in blk_throtl_exit
  md: correctly handle probe of an 'mdp' device.
  md: don't set_capacity before array is active.
  md: Fix raid1->raid0 takeover
2011-02-25 11:13:26 -08:00
Anton Blanchard f129ccc923 afs: Fix oops in afs_unlink_writeback
I'm seeing the following oops when testing afs:

  Unable to handle kernel paging request for data at address 0x00000008
  ...
  NIP [c0000000003393b0] .afs_unlink_writeback+0x38/0xc0
  LR [c00000000033987c] .afs_put_writeback+0x98/0xec
  Call Trace:
  [c00000000345f600] [c00000000033987c] .afs_put_writeback+0x98/0xec
  [c00000000345f690] [c00000000033ae80] .afs_write_begin+0x6a4/0x75c
  [c00000000345f790] [c00000000012b77c] .generic_file_buffered_write+0x148/0x320
  [c00000000345f8d0] [c00000000012e1b8] .__generic_file_aio_write+0x37c/0x3e4
  [c00000000345f9d0] [c00000000012e2a8] .generic_file_aio_write+0x88/0xfc
  [c00000000345fa90] [c0000000003390a8] .afs_file_write+0x10c/0x178
  [c00000000345fb40] [c000000000188788] .do_sync_write+0xc4/0x128
  [c00000000345fcc0] [c000000000189658] .vfs_write+0xe8/0x1d8
  [c00000000345fd70] [c000000000189884] .SyS_write+0x68/0xb0
  [c00000000345fe30] [c000000000008564] syscall_exit+0x0/0x40

afs_write_begin hits an error and calls afs_unlink_writeback. In there
we do list_del_init on an uninitialised list.

The patch below initialises ->link when creating the afs_writeback struct.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25 11:12:37 -08:00
Miklos Szeredi 8d56addd70 fuse: fix truncate after open
Commit e1181ee6 "vfs: pass struct file to do_truncate on O_TRUNC
opens" broke the behavior of open(O_TRUNC|O_RDONLY) in fuse.  Fuse
assumed that when called from open, a truncate() will be done, not an
ftruncate().

Fix by restoring the old behavior, based on the ATTR_OPEN flag.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-02-25 14:44:58 +01:00