linux_old1

Commit Graph

Author	SHA1	Message	Date
Chris Mason	ec29ed5b40	Btrfs: fix fiemap bugs with delalloc The Btrfs fiemap code wasn't properly returning delalloc extents, so applications that trust fiemap to decide if there are holes in the file see holes instead of delalloc. This reworks the btrfs fiemap code, adding a get_extent helper that searches for delalloc ranges and also adding a helper for extent_fiemap that skips past holes in the file. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-23 16:23:20 -05:00
Dirk Behme	6f644e5f97	UDF: Fix compiler warning Fix compiler warning fs/udf/balloc.c: In function 'udf_bitmap_new_block': fs/udf/balloc.c:273: warning: passing argument 1 of '_find_next_bit_le' from incompatible pointer type fs/udf/balloc.c:285: warning: passing argument 1 of '_find_next_bit_le' from incompatible pointer type fs/udf/balloc.c:311: warning: passing argument 1 of '_find_next_bit_le' from incompatible pointer type fs/udf/balloc.c:325: warning: passing argument 1 of '_find_next_bit_le' from incompatible pointer type The main fix is to add a cast in ext2_find_next_bit(). As all other usage locations of udf_find_next_one_bit() directly use bh->b_data (which is a char ), the useless (char ) cast in line 311 can be removed, too. Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com> Signed-off-by: George G. Davis <gdavis@mvista.com> Signed-off-by: Jan Kara <jack@suse.cz>	2011-02-23 11:00:44 +01:00
Jan Kara	7e49b6f248	udf: Convert UDF to new truncate calling sequence Use new truncation sequence in UDF and fix up error handling in the code. Signed-off-by: Jan Kara <jack@suse.cz>	2011-02-23 11:00:37 +01:00
Benny Halevy	2c9c8f36c3	NFSD: fix decode_cb_sequence4resok Fix bug introduced in patch `85a56480` NFSD: Update XDR decoders in NFSv4 callback client Although decode_cb_sequence4resok ignores highest slotid and target highest slotid it must account for their space in their xdr stream when calling xdr_inline_decode Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-02-22 15:55:09 -08:00
Lukas Czerner	be715140b5	xfs: check if device support discard in xfs_ioc_trim() Right now we, are relying on the fact that when we attempt to actually do the discard, blkdev_issue_discar() returns -EOPNOTSUPP and the user is informed that the device does not support discard. However, in the case where the we do not hit any suitable free extent to trim in FITRIM code, it will finish without any error. This is very confusing, because it seems that FITRIM was successful even though the device does not actually supports discard. Solution: Check for the discard support before attempt to search for free extents. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-02-22 15:08:44 -06:00
Dan Rosenberg	3a3675b7f2	xfs: prevent leaking uninitialized stack memory in FSGEOMETRY_V1 The FSGEOMETRY_V1 ioctl (and its compat equivalent) calls out to xfs_fs_geometry() with a version number of 3. This code path does not fill in the logsunit member of the passed xfs_fsop_geom_t, leading to the leaking of four bytes of uninitialized stack data to potentially unprivileged callers. v2 switches to memset() to avoid future issues if structure members change, on suggestion of Dave Chinner. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Reviewed-by: Eugene Teo <eugeneteo@kernel.org> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-02-22 15:06:47 -06:00
Linus Torvalds	3b71710f08	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6: eCryptfs: Copy up lower inode attrs in getattr ecryptfs: read on a directory should return EISDIR if not supported eCryptfs: Handle NULL nameidata pointers eCryptfs: Revert "dont call lookup_one_len to avoid NULL nameidata"	2011-02-21 17:25:00 -08:00
Randy Dunlap	361821854b	Docbook: add fs/eventfd.c and fix typos in it Add fs/eventfd.c to filesystems docbook. Make typo corrections in fs/eventfd.c. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-21 15:07:04 -08:00
Linus Torvalds	8bd89ca220	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: keep reference to parent inode on ceph_dentry ceph: queue cap_snaps once per realm libceph: fix socket write error handling libceph: fix socket read error handling	2011-02-21 15:01:38 -08:00
Linus Torvalds	b4f5c46245	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] update cifs version cifs: Fix regression in LANMAN (LM) auth code cifs: fix handling of scopeid in cifs_convert_address	2011-02-21 14:57:39 -08:00
Steve French	eed9e8307e	[CIFS] update cifs version Update version to 1.71 so we can more easily spot modules with the last two fixes Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-21 22:31:47 +00:00
Shirish Pargaonkar	5e640927a5	cifs: Fix regression in LANMAN (LM) auth code LANMAN response length was changed to 16 bytes instead of 24 bytes. Revert it back to 24 bytes. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> CC: stable@kernel.org Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-21 21:53:30 +00:00
Tyler Hicks	55f9cf6bba	eCryptfs: Copy up lower inode attrs in getattr The lower filesystem may do some type of inode revalidation during a getattr call. eCryptfs should take advantage of that by copying the lower inode attributes to the eCryptfs inode after a call to vfs_getattr() on the lower inode. I originally wrote this fix while working on eCryptfs on nfsv3 support, but discovered it also fixed an eCryptfs on ext4 nanosecond timestamp bug that was reported. https://bugs.launchpad.net/bugs/613873 Cc: <stable@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-02-21 14:46:36 -06:00
Andy Whitcroft	323ef68faf	ecryptfs: read on a directory should return EISDIR if not supported read() calls against a file descriptor connected to a directory are incorrectly returning EINVAL rather than EISDIR: [EISDIR] [XSI] [Option Start] The fildes argument refers to a directory and the implementation does not allow the directory to be read using read() or pread(). The readdir() function should be used instead. [Option End] This occurs because we do not have a .read operation defined for ecryptfs directories. Connect this up to generic_read_dir(). BugLink: http://bugs.launchpad.net/bugs/719691 Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-02-21 14:46:36 -06:00
Tyler Hicks	70b8902199	eCryptfs: Handle NULL nameidata pointers Allow for NULL nameidata pointers in eCryptfs create, lookup, and d_revalidate functions. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-02-21 14:45:57 -06:00
Tejun Heo	43d133c18b	Merge branch 'master' into for-2.6.39	2011-02-21 09:43:56 +01:00
Mark Fasheh	52c303c56c	ocfs2: Check heartbeat mode for kernel stacks only Commit `2c442719e9` added some checks for proper heartbeat mode when the o2cb stack is running. Unfortunately, it didn't take into account that a userpsace stack could be running. Fix this by only doing the check if o2cb is in use. This patch allows userspace stacks to mount the fs again. Cc: stable@kernel.org Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-02-20 02:36:28 -08:00
Tristan Ye	acf3bb007e	Ocfs2/refcounttree: Fix a bug for refcounttree to writeback clusters in a right number. Current refcounttree codes actually didn't writeback the new pages out in write-back mode, due to a bug of always passing a ZERO number of clusters to 'ocfs2_cow_sync_writeback', the patch tries to pass a proper one in. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Cc: stable@kernel.org Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-02-20 02:36:12 -08:00
Jan Kara	705773a665	ocfs2: Fix estimate of necessary credits for mkdir In the rare case that INLINE_DATA, INDEX_DIR, QUOTA, XATTR features are disabled and both the allocation of the directory inode and the allocation of the first directory block need to relink allocation group, there need not be enough credits reserved in a transaction. Fix the estimate. CC: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-02-20 02:33:32 -08:00
Yehuda Sadeh	97d79b403e	ceph: keep reference to parent inode on ceph_dentry When creating a new dentry we now hold a reference to the parent inode in the ceph_dentry. This is required due to the new RCU changes from `949854d0`, which set dentry->d_parent to NULL in d_kill before calling the ->release() callback. If/when that behavior is changed, we can revert this hack. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-02-19 19:59:14 -08:00
Linus Torvalds	bc3adfc670	Merge branch 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq * 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable' workqueue: wake up a worker when a rescuer is leaving a gcwq	2011-02-18 12:36:06 -08:00
Jan Kara	25d41d8455	debugfs: Fix filesystem reference counting on debugfs_remove() failure When __debugfs_remove() fails (because simple_rmdir() fails e.g. when a directory is not empty), we must not decrement use count of the filesystem as nothing was in fact deleted. This fixes use after free caused by debugfs in some cases. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-18 08:07:18 -08:00
Tyler Hicks	8787c7a3e0	eCryptfs: Revert "dont call lookup_one_len to avoid NULL nameidata" This reverts commit `21edad3220` and commit `93c3fe40c2`, which fixed a regression by the former. Al Viro pointed out bypassed dcache lookups in ecryptfs_new_lower_dentry(), misuse of vfs_path_lookup() in ecryptfs_lookup_one_lower() and a dislike of passing nameidata to the lower filesystem. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-02-17 20:30:29 -06:00
Timo Warns	fa7ea87a05	fs/partitions: Validate map_count in Mac partition tables Validate number of blocks in map and remove redundant variable. Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-17 17:50:51 -08:00
Daniel Baluta	bf6a41db77	fs/eventpoll.c: fix spelling eventpoll.c has wonderful comments but some annoying typos sneaked in: * toepoll_ctl -> to epoll_ctl * rapresent -> represents * sructure -> structure * machanism -> mechanism * trasfering -> transferring Signed-off-by: Daniel Baluta <daniel.baluta@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-02-17 17:38:09 +01:00
Namhyung Kim	b0a4bb830e	fs: update comments to point correct document dcache-locking.txt is not exist any more, and the path was not correct anyway. Fix it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-02-17 16:41:32 +01:00
Linus Torvalds	ee71508702	Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux * 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: nfsd: correctly handle return value from nfsd_map_name_to_*	2011-02-16 21:53:41 -08:00
Jeff Layton	9616125611	cifs: fix handling of scopeid in cifs_convert_address The code finds, the '%' sign in an ipv6 address and copies that to a buffer allocated on the stack. It then ignores that buffer, and passes 'pct' to simple_strtoul(), which doesn't work right because we're comparing 'endp' against a completely different string. Fix it by passing the correct pointer. While we're at it, this is a good candidate for conversion to strict_strtoul as well. Cc: stable@kernel.org Cc: David Howells <dhowells@redhat.com> Reported-by: BjÃ¶rn JACKE <bj@sernet.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-17 05:35:33 +00:00
Chuck Ebbert	e51900f7d3	block: revert block_dev read-only check This reverts commit `75f1dc0d07` ("block: check bdev_read_only() from blkdev_get()"). That commit added stricter checking to make sure devices that were being used read-only were actually opened in that mode. It turns out that the change breaks a bunch of kernel code that opens block devices. Affected systems include dm, md, and the loop device. Because strict checking for read-only opens of block devices was not done before this, the code that opens the devices was opening them read-write even if they were being used read-only. Auditing all that code will take time, and new userspace packages for dm, mdadm, etc. will also be required. Signed-off-by: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-16 16:48:13 -08:00
NeilBrown	47c85291d3	nfsd: correctly handle return value from nfsd_map_name_to_* These functions return an nfs status, not a host_err. So don't try to convert before returning. This is a regression introduced by 3c726023402a2f3b28f49b9d90ebf9e71151157d; I fixed up two of the callers, but missed these two. Cc: stable@kernel.org Reported-by: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-02-16 18:31:05 -05:00
Ilya Dryomov	fb01aa85b8	Btrfs: set FMODE_EXCL in btrfs_device->mode This fixes a bug introduced in `d4d77629`, where the device added online (and therefore initialized via btrfs_init_new_device()) would be left with the positive bdev->bd_holders after unmount. Since `d4d77629` we no longer OR FMODE_EXCL explicitly on blkdev_put(), set it in btrfs_device->mode. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-16 16:34:00 -05:00
Ilya Dryomov	9b3517e913	Btrfs: make btrfs_rm_device() fail gracefully If shrinking done as part of the online device removal fails add that device back to the allocation list and increment the rw_devices counter. This fixes two bugs: 1) we could have a perfectly good device out of alloc list for no good reason; 2) in the btrfs consisting of two devices, failure in btrfs_rm_device() could lead to a situation where it was impossible to remove any of the devices because of the "unable to remove the only writeable device" error. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-16 15:37:59 -05:00
Li Zefan	ca9b688c1c	Btrfs: Avoid accessing unmapped kernel address When decompressing a chunk of data, we'll copy the data out to a working buffer if the data is stored in more than one page, otherwise we'll use the mapped page directly to avoid memory copy. In the latter case, we'll end up accessing the kernel address after we've unmapped the page in a corner case. Reported-by: Juan Francisco Cantero Hurtado <iam@juanfra.info> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-16 15:37:58 -05:00
Li Zefan	b4dc2b8c69	Btrfs: Fix BTRFS_IOC_SUBVOL_SETFLAGS ioctl - Check user-specified flags correctly - Check the inode owership - Search root item in root tree but not fs tree Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-16 15:37:58 -05:00
Chris Mason	c87f08ca44	Btrfs: allow balance to explicitly allocate chunks as it relocates Btrfs device shrinking and balancing ends up reallocating all the blocks in order to allow COW to move them to new destinations. It is somewhat awkward in terms of ENOSPC because most of the enospc code is built around the idea that some operation on a reference counted tree triggers allocations in the non-reference counted trees. This commit changes the balancing code to deal with enospc by trying to allocate a new chunk. If that allocation succeeds, we go ahead and retry whatever failed due to enospc. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-16 15:28:47 -05:00
Chris Mason	91435650c2	Btrfs: put ENOSPC debugging under a mount option ENOSPC in btrfs is getting to the point where the extra debugging isn't required. I've put it under mount -o enospc_debug just in case someone is having difficult problems. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-16 15:28:36 -05:00
Linus Torvalds	3abb17e82f	vfs: fix BUG_ON() in fs/namei.c:1461 When Al moved the nameidata_dentry_drop_rcu_maybe() call into the do_follow_link function in commit `844a391799` ("nothing in do_follow_link() is going to see RCU"), he mistakenly left the BUG_ON(inode != path->dentry->d_inode); behind. Which would otherwise be ok, but that BUG_ON() really needs to be _after_ dropping RCU, since the dentry isn't necessarily stable otherwise. So complete the code movement in that commit, and move the BUG_ON() into do_follow_link() too. This means that we need to pass in 'inode' as an argument (just for this one use), but that's a small thing. And eventually we may be confident enough in our path lookup that we can just remove the BUG_ON() and the unnecessary inode argument. Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-16 08:56:55 -08:00
Tejun Heo	58a69cb47e	workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable' There are two spellings in use for 'freeze' + 'able' - 'freezable' and 'freezeable'. The former is the more prominent one. The latter is mostly used by workqueue and in a few other odd places. Unify the spelling to 'freezable'. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Dmitry Torokhov <dtor@mail.ru> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Alex Dubov <oakad@yahoo.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Steven Whitehouse <swhiteho@redhat.com>	2011-02-16 17:48:59 +01:00
Linus Torvalds	f60c153d50	Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux * 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: nfsd: break lease on unlink due to rename nfsd4: acquire only one lease per file nfsd4: modify fi_delegations under recall_lock nfsd4: remove unused deleg dprintk's. nfsd4: split lease setting into separate function nfsd4: fix leak on allocation error nfsd4: add helper function for lease setup nfsd4: split up nfsd_break_deleg_cb NFSD: memory corruption due to writing beyond the stat array NFSD: use nfserr for status after decode_cb_op_status nfsd: don't leak dentry count on mnt_want_write failure	2011-02-15 12:06:38 -08:00
Linus Torvalds	055d219441	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: get rid of nameidata_dentry_drop_rcu() calling nameidata_drop_rcu() drop out of RCU in return_reval split do_revalidate() into RCU and non-RCU cases in do_lookup() split RCU and non-RCU cases of need_revalidate nothing in do_follow_link() is going to see RCU	2011-02-15 08:06:36 -08:00
Linus Torvalds	007a14af26	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: check return value of alloc_extent_map() Btrfs - Fix memory leak in btrfs_init_new_device() btrfs: prevent heap corruption in btrfs_ioctl_space_info() Btrfs: Fix balance panic Btrfs: don't release pages when we can't clear the uptodate bits Btrfs: fix page->private races	2011-02-15 08:00:35 -08:00
Martin Schwidefsky	261cd298a8	s390: remove task_show_regs task_show_regs used to be a debugging aid in the early bringup days of Linux on s390. /proc/<pid>/status is a world readable file, it is not a good idea to show the registers of a process. The only correct fix is to remove task_show_regs. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-15 07:34:16 -08:00
Paul Bolle	fd018fe823	ext4: fix comment typo uninitized Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Reviewed-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-02-15 10:28:59 +01:00
Paul Bolle	8272f4c9c5	fuse/cuse: fix comment typo initilaization Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Reviewed-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-02-15 10:26:38 +01:00
Jiri Kosina	0a9d59a246	Merge branch 'master' into for-next	2011-02-15 10:24:31 +01:00
Al Viro	4e924a4f53	get rid of nameidata_dentry_drop_rcu() calling nameidata_drop_rcu() can't happen anymore and didn't work right anyway Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-02-15 02:26:54 -05:00
Al Viro	f60aef7ec6	drop out of RCU in return_reval ... thus killing the need to handle drop-from-RCU in d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-02-15 02:26:54 -05:00
Al Viro	f5e1c1c1af	split do_revalidate() into RCU and non-RCU cases fixing oopsen in lookup_one_len() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-02-15 02:26:54 -05:00
Al Viro	24643087e7	in do_lookup() split RCU and non-RCU cases of need_revalidate and use unlikely() instead of gotos, for fsck sake... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-02-15 02:26:54 -05:00
Al Viro	844a391799	nothing in do_follow_link() is going to see RCU Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-02-15 02:26:53 -05:00
Tsutomu Itoh	c26a920373	Btrfs: check return value of alloc_extent_map() I add the check on the return value of alloc_extent_map() to several places. In addition, alloc_extent_map() returns only the address or NULL. Therefore, check by IS_ERR() is unnecessary. So, I remove IS_ERR() checking. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-14 16:21:37 -05:00
Ilya Dryomov	67100f255d	Btrfs - Fix memory leak in btrfs_init_new_device() Memory allocated by calling kstrdup() should be freed. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-14 16:21:31 -05:00
Dan Rosenberg	51788b1bdd	btrfs: prevent heap corruption in btrfs_ioctl_space_info() Commit `bf5fc093c5` refactored btrfs_ioctl_space_info() and introduced several security issues. space_args.space_slots is an unsigned 64-bit type controlled by a possibly unprivileged caller. The comparison as a signed int type allows providing values that are treated as negative and cause the subsequent allocation size calculation to wrap, or be truncated to 0. By providing a size that's truncated to 0, kmalloc() will return ZERO_SIZE_PTR. It's also possible to provide a value smaller than the slot count. The subsequent loop ignores the allocation size when copying data in, resulting in a heap overflow or write to ZERO_SIZE_PTR. The fix changes the slot count type and comparison typecast to u64, which prevents truncation or signedness errors, and also ensures that we don't copy more data than we've allocated in the subsequent loop. Note that zero-size allocations are no longer possible since there is already an explicit check for space_args.space_slots being 0 and truncation of this value is no longer an issue. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Josef Bacik <josef@redhat.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-14 16:04:23 -05:00
Yan, Zheng	6848ad6461	Btrfs: Fix balance panic Mark the cloned backref_node as checked in clone_backref_node() Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-14 16:00:03 -05:00
Chris Mason	e3f24cc521	Btrfs: don't release pages when we can't clear the uptodate bits Btrfs tracks uptodate state in an rbtree as well as in the page bits. This is supposed to enable us to use block sizes other than the page size, but there are a few parts still missing before that completely works. But, our readpage routine trusts this additional range based tracking of uptodateness, much in the same way the buffer head up to date bits are trusted for the other filesystems. The problem is that sometimes we need to allocate memory in order to split records in the rbtree, even when we are just clearing bits. This can be difficult when our clearing function is called GFP_ATOMIC, which can happen in the releasepage path. So, what happens today looks like this: releasepage called with GFP_ATOMIC btrfs_releasepage calls clear_extent_bit clear_extent_bit fails to allocate ram, leaving the up to date bit set btrfs_releasepage returns success The end result is the page being gone, but btrfs thinking the range is up to date. Later on if someone tries to read that same page, the btrfs readpage code will return immediately thinking the page is already up to date. This commit fixes things to fail the releasepage when we can't clear the extent state bits. It covers both data pages and metadata tree blocks. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-14 13:04:01 -05:00
Chris Mason	eb14ab8ed2	Btrfs: fix page->private races There is a race where btrfs_releasepage can drop the page->private contents just as alloc_extent_buffer is setting up pages for metadata. Because of how the Btrfs page flags work, this results in us skipping the crc on the page during IO. This patch sovles the race by waiting until after the extent buffer is inserted into the radix tree before it sets page private. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-14 13:03:52 -05:00
J. Bruce Fields	83f6b0c182	nfsd: break lease on unlink due to rename `4795bb37ef` "nfsd: break lease on unlink, link, and rename", only broke the lease on the file that was being renamed, and didn't handle the case where the target path refers to an already-existing file that will be unlinked by a rename--in that case the target file should have any leases broken as well. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-02-14 10:35:19 -05:00
J. Bruce Fields	acfdf5c383	nfsd4: acquire only one lease per file Instead of acquiring one lease each time another client opens a file, nfsd can acquire just one lease to represent all of them, and reference count it to determine when to release it. This fixes a regression introduced by `c45821d263` "locks: eliminate fl_mylease callback": after that patch, only the struct file * is used to determine who owns a given lease. But since we recently converted the server to share a single struct file per open, if we acquire multiple leases on the same file from nfsd, it then becomes impossible on unlocking a lease to determine which of those leases (all of whom share the same struct file *) we meant to remove. Thanks to Takashi Iwai <tiwai@suse.de> for catching a bug in a previous version of this patch. Tested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-02-14 10:35:19 -05:00
J. Bruce Fields	5d926e8c2f	nfsd4: modify fi_delegations under recall_lock Modify fi_delegations only under the recall_lock, allowing us to use that list on lease breaks. Also some trivial cleanup to simplify later changes. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-02-14 10:35:19 -05:00
J. Bruce Fields	65bc58f518	nfsd4: remove unused deleg dprintk's. These aren't all that useful, and get in the way of the next steps. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-02-14 10:35:19 -05:00
J. Bruce Fields	edab9782b5	nfsd4: split lease setting into separate function Splitting some code into a separate function which we'll be adding some more to. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-02-14 10:35:18 -05:00
J. Bruce Fields	dd239cc05f	nfsd4: fix leak on allocation error Also share some common exit code. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-02-14 10:35:18 -05:00
J. Bruce Fields	22d38c4c10	nfsd4: add helper function for lease setup Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-02-14 10:35:18 -05:00
J. Bruce Fields	6b57d9c86d	nfsd4: split up nfsd_break_deleg_cb We'll be adding some more code here soon. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-02-14 10:35:18 -05:00
Konstantin Khorenko	3aa6e0aa8a	NFSD: memory corruption due to writing beyond the stat array If nfsd fails to find an exported via NFS file in the readahead cache, it should increment corresponding nfsdstats counter (ra_depth[10]), but due to a bug it may instead write to ra_depth[11], corrupting the following field. In a kernel with NFSDv4 compiled in the corruption takes the form of an increment of a counter of the number of NFSv4 operation 0's received; since there is no operation 0, this is harmless. In a kernel with NFSDv4 disabled it corrupts whatever happens to be in the memory beyond nfsdstats. Signed-off-by: Konstantin Khorenko <khorenko@openvz.org> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-02-14 10:35:18 -05:00
Benny Halevy	0af3f814cc	NFSD: use nfserr for status after decode_cb_op_status Bugs introduced in `85a5648019` "NFSD: Update XDR decoders in NFSv4 callback client" Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-02-14 10:35:18 -05:00
J. Bruce Fields	541ce98c10	nfsd: don't leak dentry count on mnt_want_write failure The exit cleanup isn't quite right here. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-02-14 10:31:08 -05:00
Linus Torvalds	c8e0b00ed1	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: call __jbd2_log_start_commit with j_state_lock write locked ext4: serialize unaligned asynchronous DIO ext4: make grpinfo slab cache names static ext4: Fix data corruption with multi-block writepages support ext4: fix up ext4 error handling ext4: unregister features interface on module unload ext4: fix panic on module unload when stopping lazyinit thread	2011-02-12 09:10:24 -08:00
Theodore Ts'o	e447183180	jbd2: call __jbd2_log_start_commit with j_state_lock write locked On an SMP ARM system running ext4, I've received a report that the first J_ASSERT in jbd2_journal_commit_transaction has been triggering: J_ASSERT(journal->j_running_transaction != NULL); While investigating possible causes for this problem, I noticed that __jbd2_log_start_commit() is getting called with j_state_lock only read-locked, in spite of the fact that it's possible for it might j_commit_request. Fix this by grabbing the necessary information so we can test to see if we need to start a new transaction before dropping the read lock, and then calling jbd2_log_start_commit() which will grab the write lock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-02-12 08:18:24 -05:00
Eric Sandeen	e9e3bcecf4	ext4: serialize unaligned asynchronous DIO ext4 has a data corruption case when doing non-block-aligned asynchronous direct IO into a sparse file, as demonstrated by xfstest 240. The root cause is that while ext4 preallocates space in the hole, mappings of that space still look "new" and dio_zero_block() will zero out the unwritten portions. When more than one AIO thread is going, they both find this "new" block and race to zero out their portion; this is uncoordinated and causes data corruption. Dave Chinner fixed this for xfs by simply serializing all unaligned asynchronous direct IO. I've done the same here. The difference is that we only wait on conversions, not all IO. This is a very big hammer, and I'm not very pleased with stuffing this into ext4_file_write(). But since ext4 is DIO_LOCKING, we need to serialize it at this high level. I tried to move this into ext4_ext_direct_IO, but by then we have the i_mutex already, and we will wait on the work queue to do conversions - which must also take the i_mutex. So that won't work. This was originally exposed by qemu-kvm installing to a raw disk image with a normal sector-63 alignment. I've tested a backport of this patch with qemu, and it does avoid the corruption. It is also quite a lot slower (14 min for package installs, vs. 8 min for well-aligned) but I'll take slow correctness over fast corruption any day. Mingming suggested that we can track outstanding conversions, and wait on those so that non-sparse files won't be affected, and I've implemented that here; unaligned AIO to nonsparse files won't take a perf hit. [tytso@mit.edu: Keep the mutex as a hashed array instead of bloating the ext4 inode] [tytso@mit.edu: Fix up namespace issues so that global variables are protected with an "ext4_" prefix.] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-02-12 08:17:34 -05:00
Eric Sandeen	2892c15ddd	ext4: make grpinfo slab cache names static In 2.6.37 I was running into oopses with repeated module loads & unloads. I tracked this down to: `fb1813f4` ext4: use dedicated slab caches for group_info structures (this was in addition to the features advert unload problem) The kstrdup & subsequent kfree of the cache name was causing a double free. In slub, at least, if I read it right it allocates & frees the name itself, slab seems to do something different... so in slub I think we were leaking -our- cachep->name, and double freeing the one allocated by slub. After getting lost in slab/slub/slob a bit, I just looked at other sized-caches that get allocated. jbd2, biovec, sgpool all do it more or less the way jbd2 does. Below patch follows the jbd2 method of dynamically allocating a cache at mount time from a list of static names. (This might also possibly fix a race creating the caches with parallel mounts running). [Folded in a fix from Dan Carpenter which fixed an off-by-one error in the original patch] Cc: stable@kernel.org Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-02-12 08:12:18 -05:00
Linus Torvalds	d40b0c3482	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: use single thread workqueues	2011-02-11 16:29:57 -08:00
Linus Torvalds	3aec46c1e0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: don't always drop malformed replies on the floor (try #3) cifs: clean up checks in cifs_echo_request [CIFS] Do not send SMBEcho requests on new sockets until SMBNegotiate	2011-02-11 16:29:50 -08:00
Boaz Harrosh	d863b50ab0	vfs: call rcu_barrier after ->kill_sb() In commit `fa0d7e3de6` ("fs: icache RCU free inodes"), we use rcu free inode instead of freeing the inode directly. It causes a crash when we rmmod immediately after we umount the volume[1]. So we need to call rcu_barrier after we kill_sb so that the inode is freed before we do rmmod. The idea is inspired by Aneesh Kumar. rcu_barrier will wait for all callbacks to end before preceding. The original patch was done by Tao Ma, but synchronize_rcu() is not enough here. 1. http://marc.info/?l=linux-fsdevel&m=129680863330185&w=2 Tested-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-11 16:12:19 -08:00
Linus Torvalds	2dab597441	Fix possible filp_cachep memory corruption In commit `31e6b01f41` ("fs: rcu-walk for path lookup") we started doing path lookup using RCU, which then falls back to a careful non-RCU lookup in case of problems (LOOKUP_REVAL). So do_filp_open() has this "re-do the lookup carefully" looping case. However, that means that we must not release the open-intent file data if we are going to loop around and use it once more! Fix this by moving the release of the open-intent data to the function that allocates it (do_filp_open() itself) rather than the helper functions that can get called multiple times (finish_open() and do_last()). This makes the logic for the lifetime of that field much more obvious, and avoids the possible double free. Reported-by: J. R. Okajima <hooanon05@yahoo.co.jp> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-11 15:53:38 -08:00
David Teigland	6b155c8fd4	dlm: use single thread workqueues The recent commit to use cmwq for send and recv threads `dcce240ead` introduced problems, apparently due to multiple workqueue threads. Single threads make the problems go away, so return to that until we fully understand the concurrency issues with multiple threads. Signed-off-by: David Teigland <teigland@redhat.com>	2011-02-11 16:50:47 -06:00
Jeff Layton	71823baff1	cifs: don't always drop malformed replies on the floor (try #3 ) Slight revision to this patch...use min_t() instead of conditional assignment. Also, remove the FIXME comment and replace it with the explanation that Steve gave earlier. After receiving a packet, we currently check the header. If it's no good, then we toss it out and continue the loop, leaving the caller waiting on that response. In cases where the packet has length inconsistencies, but the MID is valid, this leads to unneeded delays. That's especially problematic now that the client waits indefinitely for responses. Instead, don't immediately discard the packet if checkSMB fails. Try to find a matching mid_q_entry, mark it as having a malformed response and issue the callback. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-11 03:59:12 +00:00
Mimi Zohar	890275b5eb	IMA: maintain i_readcount in the VFS layer ima_counts_get() updated the readcount and invalidated the PCR, as necessary. Only update the i_readcount in the VFS layer. Move the PCR invalidation checks to ima_file_check(), where it belongs. Maintaining the i_readcount in the VFS layer, will allow other subsystems to use i_readcount. Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com>	2011-02-10 07:51:44 -05:00
Jeff Layton	195291e68c	cifs: clean up checks in cifs_echo_request Follow-on patch to `7e90d705` which is already in Steve's tree... The check for tcpStatus == CifsGood is not meaningful since it doesn't indicate whether the NEGOTIATE request has been done. Also, clarify why we're checking for maxBuf == 0. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-10 03:44:20 +00:00
Steve French	7e90d705fc	[CIFS] Do not send SMBEcho requests on new sockets until SMBNegotiate In order to determine whether an SMBEcho request can be sent we need to know that the socket is established (server tcpStatus == CifsGood) AND that an SMB NegotiateProtocol has been sent (server maxBuf != 0). Without the second check we can send an Echo request during reconnection before the server can accept it. CC: JG <jg@cms.ac> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-08 23:52:32 +00:00
Artem Bityutskiy	10ac279702	UBIFS: fix LEB number in printk This is a minor patch which fixes the LEB number we print when corrupted empty space is found. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-02-08 17:26:32 +02:00
Linus Torvalds	cb5520f02c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (33 commits) Btrfs: Fix page count calculation btrfs: Drop __exit attribute on btrfs_exit_compress btrfs: cleanup error handling in btrfs_unlink_inode() Btrfs: exclude super blocks when we read in block groups Btrfs: make sure search_bitmap finds something in remove_from_bitmap btrfs: fix return value check of btrfs_start_transaction() btrfs: checking NULL or not in some functions Btrfs: avoid uninit variable warnings in ordered-data.c Btrfs: catch errors from btrfs_sync_log Btrfs: make shrink_delalloc a little friendlier Btrfs: handle no memory properly in prepare_pages Btrfs: do error checking in btrfs_del_csums Btrfs: use the global block reserve if we cannot reserve space Btrfs: do not release more reserved bytes to the global_block_rsv than we need Btrfs: fix check_path_shared so it returns the right value btrfs: check return value of btrfs_start_ioctl_transaction() properly btrfs: fix return value check of btrfs_join_transaction() fs/btrfs/inode.c: Add missing IS_ERR test btrfs: fix missing break in switch phrase btrfs: fix several uncheck memory allocations ...	2011-02-07 14:06:18 -08:00
Linus Torvalds	257a65d795	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: remove checks for ses->status == CifsExiting cifs: add check for kmalloc in parse_dacl cifs: don't send an echo request unless NegProt has been done cifs: enable signing flag in SMB header when server has it on cifs: Possible slab memory corruption while updating extended stats (repost) CIFS: Fix variable types in cifs_iovec_read/write (try #2) cifs: fix length vs. total_read confusion in cifs_demultiplex_thread	2011-02-07 14:02:06 -08:00
Abhijith Das	e79a46a030	GFS2: panics on quotacheck update Handle block allocation for forceful unstuffing of quota dinode during quota update using quotactl(). Also fix block reservation for special cases when quotas cross over block boundaries and update 2 blocks instead of 1. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-02-07 20:00:44 +00:00
Yan, Zheng	3a90983dbd	Btrfs: Fix page count calculation take offset of start position into account when calculating page count. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-07 14:13:51 -05:00
Curt Wohlgemuth	d50bdd5aa5	ext4: Fix data corruption with multi-block writepages support This fixes a corruption problem with the multi-block writepages submittal change for ext4, from commit `bd2d0210cf` ("ext4: use bio layer instead of buffer layer in mpage_da_submit_io"). (Note that this corruption is not present in 2.6.37 on ext4, because the corruption was detected after the feature was merged in 2.6.37-rc1, and so it was turned off by adding a non-default mount option, mblk_io_submit. With this commit, which hopefully fixes the last of the bugs with this feature, we'll be able to turn on this performance feature by default in 2.6.38, and remove the mblk_io_submit option.) The ext4 code path to bundle multiple pages for writeback in ext4_bio_write_page() had a bug: we should be clearing buffer head dirty flags before we submit the bio, not in the completion routine. The patch below was tested on 2.6.37 under KVM with the postgresql script which was submitted by Jon Nelson as documented in commit `1449032be1`. Without the patch, I'd hit the corruption problem about 50-70% of the time. With the patch, I executed the script > 100 times with no corruption seen. I also fixed a bug to make sure ext4_end_bio() doesn't dereference the bio after the bio_put() call. Reported-by: Jon Nelson <jnelson@jamponi.net> Reported-by: Matthias Bayer <jackdachef@gmail.com> Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2011-02-07 12:46:14 -05:00
Jeff Layton	d402539b8f	cifs: remove checks for ses->status == CifsExiting ses->status is never set to CifsExiting, so these checks are always false. Tested-by: JG <jg@cms.ac> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-07 17:25:55 +00:00
Vasiliy Kulikov	8c559d30b4	UBIFS: restrict world-writable debugfs files Don't allow everybody to dump sensitive information about filesystems. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-02-06 18:59:31 +02:00
Artem Bityutskiy	be7b42a5cb	UBIFS: describe UBIFS recovery logic some more This patch adds more commentaries about UBIFS recovery logic which should explain the famous UBIFS "corrupt empty space" errors. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-02-06 18:59:30 +02:00
Artem Bityutskiy	822ed64c5b	UBIFS: remove double semicolon Just a tiny clean-up - remove ;; Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-02-06 15:08:02 +02:00
Alexey Charkov	8e4eef7a60	btrfs: Drop __exit attribute on btrfs_exit_compress As this function is called in some error paths while not removing the module, the __exit attribute prevents the kernel image from linking when btrfs is compiled in statically. Signed-off-by: Alexey Charkov <alchark@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-06 07:19:19 -05:00
Tsutomu Itoh	554233a6e0	btrfs: cleanup error handling in btrfs_unlink_inode() When btrfs_alloc_path() fails, btrfs_free_path() need not be called. Therefore, it changes the branch ahead. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-06 07:17:45 -05:00
Josef Bacik	3c14874acc	Btrfs: exclude super blocks when we read in block groups This has been resulting in a BUT_ON(ret) after btrfs_reserve_extent in btrfs_cow_file_range. The reason is we don't actually calculate the bytes_super for a block group until we go to cache it, which means that the space_info can hand out reservations for space that it doesn't actually have, and we can run out of data space. This is also a problem if you are using space caching since we don't ever calculate bytes_super for the block groups. So instead everytime we read a block group call exclude_super_stripes, which calculates the bytes_super for the block group so it can be left out of the space_info. Then whenever caching completes we just call free_excluded_extents so that the super excluded extents are freed up. Also if we are unmounting and we hit any block groups that haven't been cached we still need to call free_excluded_extents to make sure things are cleaned up properly. Thanks, Reported-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-06 07:17:44 -05:00
Josef Bacik	13dbc08987	Btrfs: make sure search_bitmap finds something in remove_from_bitmap When we're cleaning up the tree log we need to be able to remove free space from the block group. The problem is if that free space spans bitmaps we would not find the space since we're looking for too many bytes. So make sure the amount of bytes we search for is limited to either the number of bytes we want, or the number of bytes left in the bitmap. This was tested by a user who was hitting the BUG() after search_bitmap. With this patch he can now mount his fs. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-06 07:13:12 -05:00
Stanislav Fomichev	8132b65bc6	cifs: add check for kmalloc in parse_dacl Exit from parse_dacl if no memory returned from the call to kmalloc. Signed-off-by: Stanislav Fomichev <kernel@fomichev.me> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-06 00:36:23 +00:00
Sage Weil	e8e1ba96b2	ceph: queue cap_snaps once per realm We were forming a dirty list, and then queueing cap_snaps for each realm _and_ its children, regardless of whether the children were already in the dirty list. This meant we did it twice for some realms. Which in turn meant we corrupted mdsc->snap_flush_list when the cap_snap was re-added to the list it was already on, and could trigger an infinite loop. We were also using recursion to do reach all the children, a no-no when stack is limited. Instead, (re)queue any children on the dirty list, avoiding processing anything twice and avoiding any recursion. Signed-off-by: Sage Weil <sage@newdream.net>	2011-02-04 20:45:58 -08:00
Jeff Layton	247ec9b418	cifs: don't send an echo request unless NegProt has been done When the socket to the server is disconnected, the client more or less immediately calls cifs_reconnect to reconnect the socket. The NegProt and SessSetup however are not done until an actual call needs to be made. With the addition of the SMB echo code, it's possible that the server will initiate a disconnect on an idle socket. The client will then reconnect the socket but no NegotiateProtocol request is done. The SMBEcho workqueue job will then eventually pop, and an SMBEcho will be sent on the socket. The server will then reject it since no NegProt was done. The ideal fix would be to either have the socket not be reconnected until we plan to use it, or to immediately do a NegProt when the reconnect occurs. The code is not structured for this however. For now we must just settle for not sending any echoes until the NegProt is done. Reported-by: JG <jg@cms.ac> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-05 03:02:14 +00:00
Jeff Layton	e3f0dadb2b	cifs: enable signing flag in SMB header when server has it on cifs_sign_smb only generates a signature if the correct Flags2 bit is set. Make sure that it gets set correctly if we're sending an async call. This patch fixes: https://bugzilla.kernel.org/show_bug.cgi?id=28142 Reported-and-Tested-by: JG <jg@cms.ac> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-04 20:19:57 +00:00
Shirish Pargaonkar	64474bdd07	cifs: Possible slab memory corruption while updating extended stats (repost) Updating extended statistics here can cause slab memory corruption if a callback function frees slab memory (mid_entry). Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-04 20:18:06 +00:00
Tetsuo Handa	78d2978874	CRED: Fix kernel panic upon security_file_alloc() failure. In get_empty_filp() since 2.6.29, file_free(f) is called with f->f_cred == NULL when security_file_alloc() returned an error. As a result, kernel will panic() due to put_cred(NULL) call within RCU callback. Fix this bug by assigning f->f_cred before calling security_file_alloc(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-04 10:40:29 -08:00
Pavel Shilovsky	76429c148b	CIFS: Fix variable types in cifs_iovec_read/write (try #2 ) Variable 'i' should be unsigned long as it's used in circle with num_pages, and bytes_read/total_written should be ssize_t according to return value. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-04 04:41:06 +00:00
Linus Torvalds	89840966c5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus: hfsplus: fix up a comparism in hfsplus_file_extend hfsplus: fix two memory leaks in wrapper.c hfsplus: do not leak buffer on error hfsplus: fix failed mount handling	2011-02-03 16:31:43 -08:00
Amerigo Wang	1f7da214e2	debugfs: remove module_exit() debugfs can't be a module, so module_exit() is meaningless for it. Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-03 15:39:17 -08:00
Christoph Hellwig	1065348d47	hfsplus: fix up a comparism in hfsplus_file_extend Revert an incorrect hunk from commit `b2837fcf49`, "hfsplus: %L-to-%ll, macro correction, and remove unneeded braces" revert a pointless change of comparism operation argument order, which turned out to not even be equivalent. Reported-by: Joe Perches <joe@perches.com> Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2011-02-03 16:34:18 -07:00
Chuck Ebbert	a1dbcef017	hfsplus: fix two memory leaks in wrapper.c Signed-Off-By: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2011-02-03 16:34:11 -07:00
Chuck Ebbert	14dd01f883	hfsplus: do not leak buffer on error Signed-Off-By: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2011-02-03 16:34:05 -07:00
Christoph Hellwig	c5b8d0bce0	hfsplus: fix failed mount handling Currently the error handling in hfsplus_fill_super is a mess, and can lead to accessing fields in the superblock that haven't been even set up yet. Fix this by making sure we do not set up sb->s_root until we have the mount fully set up, and before that do proper step by step unwinding instead of using hfsplus_put_super as a big hammer. Reported-by: Dan Williams <dcbw@redhat.com> Signed-off-by: Christoph Hellwig <hch@tuxera.com>	2011-02-03 16:33:51 -07:00
Theodore Ts'o	dd68314ccf	ext4: fix up ext4 error handling Make sure we the correct cleanup happens if we die while trying to load the ext4 file system. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-02-03 14:33:49 -05:00
Lukas Czerner	8f021222c1	ext4: unregister features interface on module unload Ext4 features interface was not properly unregistered which led to problems while unloading/reloading ext4 module. This commit fixes that by adding proper kobject unregistration code into ext4_exit_fs() as well as fail-path of ext4_init_fs() Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2011-02-03 14:33:33 -05:00
Eric Sandeen	8f1f745331	ext4: fix panic on module unload when stopping lazyinit thread https://bugzilla.kernel.org/show_bug.cgi?id=27652 If the lazyinit thread is running, the teardown function ext4_destroy_lazyinit_thread() has problems: ext4_clear_request_list(); while (ext4_li_info->li_task) { wake_up(&ext4_li_info->li_wait_daemon); wait_event(ext4_li_info->li_wait_task, ext4_li_info->li_task == NULL); } Clearing the request list will cause the thread to exit and free ext4_li_info, so then we're waiting on something which is getting freed. Fix this up by making the thread respond to kthread_stop, and exit, without the need to wait for that exit in some other homegrown way. Cc: stable@kernel.org Reported-and-Tested-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-02-03 14:33:15 -05:00
Boaz Harrosh	0b0abeaf3d	Revert "exofs: Set i_mapping->backing_dev_info anyway" This reverts commit `115e19c535`. Apparently setting inode->bdi to one's own sb->s_bdi stops VFS from sending read-aheads. This problem was bisected to this commit. A revert fixes it. I'll investigate farther why is this happening for the next Kernel, but for now a revert. I'm sending to stable@kernel.org as well, since it exists also in 2.6.37. 2.6.36 is good and does not have this patch. CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-02 17:53:27 -08:00
Josef Bacik	d54cdc8ca7	fs: make block fiemap mapping length at least blocksize long Some filesystems don't deal well with being asked to map less than blocksize blocks (GFS2 for example). Since we are always mapping at least blocksize sections anyway, just make sure len is at least as big as a blocksize so we don't trip up any filesystems. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-02 16:03:20 -08:00
Namhyung Kim	3cd90ea42f	vfs: sparse: add __FMODE_EXEC FMODE_EXEC is a constant type of fmode_t but was used with normal integer constants. This results in following warnings from sparse. Fix it using new macro __FMODE_EXEC. fs/exec.c:116:58: warning: restricted fmode_t degrades to integer fs/exec.c:689:58: warning: restricted fmode_t degrades to integer fs/fcntl.c:777:9: warning: restricted fmode_t degrades to integer Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-02 16:03:19 -08:00
Eric Dumazet	0781b909b5	epoll: epoll_wait() should not use timespec_add_ns() commit `95aac7b1cd` ("epoll: make epoll_wait() use the hrtimer range feature") added a performance regression because it uses timespec_add_ns() with potential very large 'ns' values. [akpm@linux-foundation.org: s/epoll_set_mstimeout/ep_set_mstimeout/, per Davide] Reported-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Shawn Bohrer <shawn.bohrer@gmail.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: <stable@kernel.org> [2.6.37.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-02 16:03:18 -08:00
Steven Whitehouse	b9c93bb7de	GFS2: Improve cluster mmap scalability The mmap system call grabs a glock when an update to atime maybe required. It does this in order to ensure that the flags on the inode are uptodate, but since it will only mark atime for a future update, an exclusive lock is not required here (one will be taken later when the actual update is performed). Also, the lock can be skipped when the mount is marked noatime in addition to the original check which only looked at the noatime flag for the inode itself. This should increase the scalability of the mmap call when multiple nodes are all mmaping the same file. Reported-by: Scooter Morris <scooter@cgl.ucsf.edu> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-02-02 14:48:10 +00:00
Jeff Layton	9587fcff42	cifs: fix length vs. total_read confusion in cifs_demultiplex_thread length at this point is the length returned by the last kernel_recvmsg call. total_read is the length of all of the data read so far. length is more or less meaningless at this point, so use total_read for everything. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-02 00:17:04 +00:00
Linus Torvalds	405b864d3f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: fix length checks in checkSMB [CIFS] Update cifs minor version cifs: No need to check crypto blockcipher allocation cifs: clean up some compiler warnings cifs: make CIFS depend on CRYPTO_MD4 cifs: force a reconnect if there are too many MIDs in flight cifs: don't pop a printk when sending on a socket is interrupted cifs: simplify SMB header check routine cifs: send an NT_CANCEL request when a process is signalled cifs: handle cancelled requests better cifs: fix two compiler warning about uninitialized vars	2011-02-02 10:22:40 +11:00
Lucian Adrian Grijincu	8e6c96935f	security/selinux: fix /proc/sys/ labeling This fixes an old (2007) selinux regression: filesystem labeling for /proc/sys returned -r--r--r-- unknown /proc/sys/fs/file-nr instead of -r--r--r-- system_u:object_r:sysctl_fs_t:s0 /proc/sys/fs/file-nr Events that lead to breaking of /proc/sys/ selinux labeling: 1) sysctl was reimplemented to route all calls through /proc/sys/ commit `77b14db502` [PATCH] sysctl: reimplement the sysctl proc support 2) proc_dir_entry was removed from ctl_table: commit `3fbfa98112` [PATCH] sysctl: remove the proc_dir_entry member for the sysctl tables 3) selinux still walked the proc_dir_entry tree to apply labeling. Because ctl_tables don't have a proc_dir_entry, we did not label /proc/sys/ inodes any more. To achieve this the /proc/sys/ inodes were marked private and private inodes were ignored by selinux. commit `bbaca6c2e7` [PATCH] selinux: enhance selinux to always ignore private inodes commit `86a71dbd3e` [PATCH] sysctl: hide the sysctl proc inodes from selinux Access control checks have been done by means of a special sysctl hook that was called for read/write accesses to any /proc/sys/ entry. We don't have to do this because, instead of walking the proc_dir_entry tree we can walk the dentry tree (as done in this patch). With this patch: * we don't mark /proc/sys/ inodes as private * we don't need the sysclt security hook * we walk the dentry tree to find the path to the inode. We have to strip the PID in /proc/PID/ entries that have a proc_dir_entry because selinux does not know how to label paths like '/1/net/rpc/nfsd.fh' (and defaults to 'proc_t' labeling). Selinux does know of '/net/rpc/nfsd.fh' (and applies the 'sysctl_rpc_t' label). PID stripping from the path was done implicitly in the previous code because the proc_dir_entry tree had the root in '/net' in the example from above. The dentry tree has the root in '/1'. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2011-02-01 11:53:54 -05:00
Eric Paris	2a7dba391e	fs/vfs/security: pass last path component to LSM on inode creation SELinux would like to implement a new labeling behavior of newly created inodes. We currently label new inodes based on the parent and the creating process. This new behavior would also take into account the name of the new object when deciding the new label. This is not the (supposed) full path, just the last component of the path. This is very useful because creating /etc/shadow is different than creating /etc/passwd but the kernel hooks are unable to differentiate these operations. We currently require that userspace realize it is doing some difficult operation like that and than userspace jumps through SELinux hoops to get things set up correctly. This patch does not implement new behavior, that is obviously contained in a seperate SELinux patch, but it does pass the needed name down to the correct LSM hook. If no such name exists it is fine to pass NULL. Signed-off-by: Eric Paris <eparis@redhat.com>	2011-02-01 11:12:29 -05:00
Tsutomu Itoh	98d5dc13e7	btrfs: fix return value check of btrfs_start_transaction() The error check of btrfs_start_transaction() is added, and the mistake of the error check on several places is corrected. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-01 07:17:27 -05:00
Tsutomu Itoh	5df6708348	btrfs: checking NULL or not in some functions Because NULL is returned when the memory allocation fails, it is checked whether it is NULL. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-01 07:16:37 -05:00
Tejun Heo	83e759043a	xfs: convert to alloc_workqueue() Convert from create[_singlethread]_workqueue() to alloc_workqueue(). * xfsdatad_workqueue and xfsconvertd_workqueue are identity converted. Using higher concurrency limit might be useful but given the complexity of workqueue usage in xfs, proceeding cautiously seems better. * xfs_mru_reap_wq is converted to non-ordered workqueue with max concurrency of 1 as the work items don't require any specific ordering and already have proper synchronization. It seems it was singlethreaded to save worker threads, which is no longer a concern. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Alex Elder <aelder@sgi.com> Cc: xfs-masters@oss.sgi.com Cc: Christoph Hellwig <hch@infradead.org>	2011-02-01 11:42:43 +01:00
Tejun Heo	28aadf5169	reiserfs: make commit_wq use the default concurrency level The maximum number of concurrent work items queued on commit_wq is bound by the number of active journals. Convert to alloc_workqueue() and use the default concurrency level so that they can be processed in parallel. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: reiserfs-devel@vger.kernel.org	2011-02-01 11:42:42 +01:00
Tejun Heo	316873c958	ocfs2: use system_wq instead of ocfs2_quota_wq ocfs2_quota_wq is not depended upon during memory reclaim and, with cmwq, there's no reason to use a dedicated workqueue. Drop ocfs2_quota_wq and use system_wq instead. dqi_sync_work is already sync canceled on quota disable and no further synchronization is necessary. This change makes ocfs2_quota_setup/shutdown() noops. Both functions removed. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com>	2011-02-01 11:42:42 +01:00
Tejun Heo	fd89d5f203	ext4: convert to alloc_workqueue() Convert create_workqueue() to alloc_workqueue(). This is an identity conversion. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: linux-ext4@vger.kernel.org	2011-02-01 11:42:42 +01:00
Chris Mason	c87fb6fdca	Btrfs: avoid uninit variable warnings in ordered-data.c This one isn't really an uninit variable, but for pretty obscure reasons. Let's make it clearly correct. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-31 20:33:37 -05:00
Linus Torvalds	0fd08c5545	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: NFSv4 readdir loses entries NFS: Micro-optimize nfs4_decode_dirent() NFS: Fix an NFS client lockdep issue NFS construct consistent co_ownerid for v4.1 NFS: nfs_wcc_update_inode() should set nfsi->attr_gencount NFS improve pnfs_put_deviceid_cache debug print NFS fix cb_sequence error processing NFS do not find client in NFSv4 pg_authenticate NLM: Fix "kernel BUG at fs/lockd/host.c:417!" or ".../host.c:283!" NFS: Prevent memory allocation failure in nfsacl_encode() NFS: nfsacl_{encode,decode} should return signed integer NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!" NFS: Fix "kernel BUG at fs/aio.c:554!" NFS4: Avoid potential NULL pointer dereference in decode_and_add_ds(). NFS: fix handling of malloc failure during nfs_flush_multi()	2011-02-01 09:41:02 +10:00
Jeff Layton	6284644e8d	cifs: fix length checks in checkSMB The cERROR message in checkSMB when the calculated length doesn't match the RFC1001 length is incorrect in many cases. It always says that the RFC1001 length is bigger than the SMB, even when it's actually the reverse. Fix the error message to say the reverse of what it does now when the SMB length goes beyond the end of the received data. Also, clarify the error message when the RFC length is too big. Finally, clarify the comments to show that the 512 byte limit on extra data at the end of the packet is arbitrary. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 22:35:37 +00:00
Linus Torvalds	fb9f1f17e9	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: xfs_bmap_add_extent_delay_real should init br_startblock xfs: fix dquot shaker deadlock xfs: handle CIl transaction commit failures correctly xfs: limit extsize to size of AGs and/or MAXEXTLEN xfs: prevent extsize alignment from exceeding maximum extent size xfs: limit extent length for allocation to AG size xfs: speculative delayed allocation uses rounddown_power_of_2 badly xfs: fix efi item leak on forced shutdown xfs: fix log ticket leak on forced shutdown.	2011-02-01 08:15:40 +10:00
Steve French	cab6958da0	[CIFS] Update cifs minor version Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 21:56:35 +00:00
Chris Mason	b31eabd86e	Btrfs: catch errors from btrfs_sync_log btrfs_sync_log returns -EAGAIN when we need full transaction commits instead of small log commits, but sometimes we were dropping the return value. In practice, we check for this a few different ways, but this is still a bug that can leave off full log commits when we really need them. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-31 16:48:24 -05:00
Josef Bacik	b1953bcec9	Btrfs: make shrink_delalloc a little friendlier Xfstests 224 will just sit there and spin for ever until eventually we give up flushing delalloc and exit. On my box this took several hours. I could not interrupt this process either, even though we use INTERRUPTIBLE. So do 2 things 1) Keep us from looping over and over again without reclaiming anything 2) If we get interrupted exit the loop I tested this and the test now exits in a reasonable amount of time, and can be interrupted with ctrl+c. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-31 16:27:28 -05:00
Shirish Pargaonkar	7a8587e7c8	cifs: No need to check crypto blockcipher allocation Missed one change as per earlier suggestion. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 17:29:18 +00:00
Jeff Layton	31c2659d78	cifs: clean up some compiler warnings New compiler warnings that I noticed when building a patchset based on recent Fedora kernel: fs/cifs/cifssmb.c: In function 'CIFSSMBSetFileSize': fs/cifs/cifssmb.c:4813:8: warning: variable 'data_offset' set but not used [-Wunused-but-set-variable] fs/cifs/file.c: In function 'cifs_open': fs/cifs/file.c:349:24: warning: variable 'pCifsInode' set but not used [-Wunused-but-set-variable] fs/cifs/file.c: In function 'cifs_partialpagewrite': fs/cifs/file.c:1149:23: warning: variable 'cifs_sb' set but not used [-Wunused-but-set-variable] fs/cifs/file.c: In function 'cifs_iovec_write': fs/cifs/file.c:1740:9: warning: passing argument 6 of 'CIFSSMBWrite2' from incompatible pointer type [enabled by default] fs/cifs/cifsproto.h:337:12: note: expected 'unsigned int ' but argument is of type 'size_t ' fs/cifs/readdir.c: In function 'cifs_readdir': fs/cifs/readdir.c:767:23: warning: variable 'cifs_sb' set but not used [-Wunused-but-set-variable] fs/cifs/cifs_dfs_ref.c: In function 'cifs_dfs_d_automount': fs/cifs/cifs_dfs_ref.c:342:2: warning: 'rc' may be used uninitialized in this function [-Wuninitialized] fs/cifs/cifs_dfs_ref.c:278:6: note: 'rc' was declared here Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 15:39:10 +00:00
Jeff Layton	f855f6cbeb	cifs: make CIFS depend on CRYPTO_MD4 Recently CIFS was changed to use the kernel crypto API for MD4 hashes, but the Kconfig dependencies were not changed to reflect this. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reported-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 15:26:07 +00:00
Steven Whitehouse	edae38a643	GFS2: Fix glock queue trace point Somehow this tracepoint landed up in the wrong place. This moves it to where it should be. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-01-31 09:38:12 +00:00
Jeff Layton	92a4e0f016	cifs: force a reconnect if there are too many MIDs in flight Currently, we allow the pending_mid_q to grow without bound with SIGKILL'ed processes. This could eventually be a DoS'able problem. An unprivileged user could a process that does a long-running call and then SIGKILL it. If he can also intercept the NT_CANCEL calls or the replies from the server, then the pending_mid_q could grow very large, possibly even to 2^16 entries which might leave GetNextMid in an infinite loop. Fix this by imposing a hard limit of 32k calls per server. If we cross that limit, set the tcpStatus to CifsNeedReconnect to force cifsd to eventually reconnect the socket and clean out the pending_mid_q. While we're at it, clean up the function a bit and eliminate an unnecessary NULL pointer check. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 04:38:15 +00:00
Jeff Layton	d804d41d16	cifs: don't pop a printk when sending on a socket is interrupted If we kill the process while it's sending on a socket then the kernel_sendmsg will return -EINTR. This is normal. No need to spam the ring buffer with this info. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 04:32:21 +00:00
Jeff Layton	68abaffa6b	cifs: simplify SMB header check routine ...just cleanup. There should be no behavior change. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 04:30:37 +00:00
Jeff Layton	2db7c58155	cifs: send an NT_CANCEL request when a process is signalled Use the new send_nt_cancel function to send an NT_CANCEL when the process is delivered a fatal signal. This is a "best effort" enterprise however, so don't bother to check the return code. There's nothing we can reasonably do if it fails anyway. Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 04:24:38 +00:00
Jeff Layton	1be912dde7	cifs: handle cancelled requests better Currently, when a request is cancelled via signal, we delete the mid immediately. If the request was already transmitted however, the client is still likely to receive a response. When it does, it won't recognize it however and will pop a printk. It's also a little dangerous to just delete the mid entry like this. We may end up reusing that mid. If we do then we could potentially get the response from the first request confused with the later one. Prevent the reuse of mids by marking them as cancelled and keeping them on the pending_mid_q list. If the reply comes in, we'll delete it from the list then. If it never comes, then we'll delete it at reconnect or when cifsd comes down. Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 04:23:31 +00:00
Steve French	58b8a5b45a	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6	2011-01-31 04:17:03 +00:00
Jeff Layton	ffeb414a59	cifs: fix two compiler warning about uninitialized vars fs/cifs/link.c: In function ‘symlink_hash’: fs/cifs/link.c:58:3: warning: ‘rc’ may be used uninitialized in this function [-Wuninitialized] fs/cifs/smbencrypt.c: In function ‘mdfour’: fs/cifs/smbencrypt.c:61:3: warning: ‘rc’ may be used uninitialized in this function [-Wuninitialized] Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-31 03:15:57 +00:00
Anton Altaparmakov	af5eb745ef	NTFS: Fix invalid pointer dereference in ntfs_mft_record_alloc(). In ntfs_mft_record_alloc() when mapping the new extent mft record with map_extent_mft_record() we overwrite @m with the return value and on error, we then try to use the old @m but that is no longer there as @m now contains an error code instead so we crash when dereferencing the error code as if it were a pointer. The simple fix is to use a temporary variable to store the return value thus preserving the original @m for later use. This is a backport from the commercial Tuxera-NTFS driver and is well tested... Thanks go to Julia Lawall for pointing this out (whilst I had fixed it in the commercial driver I had failed to fix it in the Linux kernel). Signed-off-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-31 12:58:11 +10:00
Linus Torvalds	9fbf0c08d4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: More crypto cleanup (try #2) CIFS: Add strictcache mount option CIFS: Implement cifs_strict_writev (try #4) [CIFS] Replace cifs md5 hashing functions with kernel crypto APIs	2011-01-31 12:56:27 +10:00
Josef Bacik	7adf5dfbb3	Btrfs: handle no memory properly in prepare_pages Instead of doing a BUG_ON(1) in prepare_pages if grab_cache_page() fails, just loop through the pages we've already grabbed and unlock and release them, then return -ENOMEM like we should. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:42:34 -05:00
Josef Bacik	ad0397a7a9	Btrfs: do error checking in btrfs_del_csums Got a report of a box panicing because we got a NULL eb in read_extent_buffer. His fs was borked and btrfs_search_path returned EIO, but we don't check for errors so the box paniced. Yes I know this will just make something higher up the stack panic, but that's a problem for future Josef. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:42:34 -05:00
Josef Bacik	68a82277b8	Btrfs: use the global block reserve if we cannot reserve space We call use_block_rsv right before we make an allocation in order to make sure we have enough space. Now normally people have called btrfs_start_transaction() with the appropriate amount of space that we need, so we just use some of that pre-reserved space and move along happily. The problem is where people use btrfs_join_transaction(), which doesn't actually reserve any space. So we try and reserve space here, but we cannot flush delalloc, so this forces us to return -ENOSPC when in reality we have plenty of space. The most common symptom is seeing a bunch of "couldn't dirty inode" messages in syslog. With xfstests 224 we end up falling back to start_transaction and then doing all the flush delalloc stuff which causes to hang for a very long time. So instead steal from the global reserve, which is what this is meant for anyway. With this patch and the other 2 I have sent xfstests 224 now passes successfully. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:37 -05:00
Josef Bacik	e9e22899de	Btrfs: do not release more reserved bytes to the global_block_rsv than we need When we do btrfs_block_rsv_release, if global_block_rsv is not full we will release all the extra bytes to global_block_rsv, even if it's only a little short of the amount of space that we need to reserve. This causes us to starve ourselves of reservable space during the transaction which will force us to shrink delalloc bytes and commit the transaction more often than we should. So instead just add the amount of bytes we need to add to the global reserve so reserved == size, and then add the rest back into the space_info for general use. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:37 -05:00
Josef Bacik	dedefd7215	Btrfs: fix check_path_shared so it returns the right value When running xfstests 224 I kept getting ENOSPC when trying to remove the files, and this is because we were returning ret from check_path_shared while it was uninitalized, which isn't right. Fix this to return 0 properly, and now xfstests 224 doesn't freak out when it tries to clean itself up. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:37 -05:00
Tsutomu Itoh	abd30bb0af	btrfs: check return value of btrfs_start_ioctl_transaction() properly btrfs_start_ioctl_transaction() returns ERR_PTR(), not NULL. So, it is necessary to use IS_ERR() to check the return value. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:37 -05:00
Tsutomu Itoh	3612b49598	btrfs: fix return value check of btrfs_join_transaction() The error check of btrfs_join_transaction()/btrfs_join_transaction_nolock() is added, and the mistake of the error check in several places is corrected. For more stable Btrfs, I think that we should reduce BUG_ON(). But, I think that long time is necessary for this. So, I propose this patch as a short-term solution. With this patch: - To more stable Btrfs, the part that should be corrected is clarified. - The panic isn't done by the NULL pointer reference etc. (even if BUG_ON() is increased temporarily) - The error code is returned in the place where the error can be easily returned. As a long-term plan: - BUG_ON() is reduced by using the forced-readonly framework, etc. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:37 -05:00
Julia Lawall	34d19bada0	fs/btrfs/inode.c: Add missing IS_ERR test After the conditional that precedes the following code, inode may be an ERR_PTR value. This can eg result from a memory allocation failure via the call to btrfs_iget, and thus does not imply that root is different than sub_root. Thus, an IS_ERR check is added to ensure that there is no dereference of inode in this case. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r@ identifier f; @@ f(...) { ... return ERR_PTR(...); } @@ identifier r.f, fld; expression x; statement S1,S2; @@ x = f(...) ... when != IS_ERR(x) ( if (IS_ERR(x) \|\|...) S1 else S2 \| *x->fld ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:37 -05:00
liubo	333e810544	btrfs: fix missing break in switch phrase There is a missing break in switch, fix it. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:37 -05:00
liubo	2a29edc6b6	btrfs: fix several uncheck memory allocations To make btrfs more stable, add several missing necessary memory allocation checks, and when no memory, return proper errno. We've checked that some of those -ENOMEM errors will be returned to userspace, and some will be catched by BUG_ON() in the upper callers, and none will be ignored silently. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:36 -05:00
liubo	6b82ce8d82	btrfs: fix uncheck memory allocation in btrfs_submit_compressed_read btrfs_submit_compressed_read() is lack of memory allocation checks and corresponding error route. After this fix, if it comes to "no memory" case, errno will be returned to userland step by step, and tell users this operation cannot go on. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-28 16:40:36 -05:00
Chris Mason	eab49bec41	Merge branch 'bug-fixes' of git://repo.or.cz/linux-btrfs-devel into btrfs-38	2011-01-28 16:24:59 -05:00
Chuck Lever	d1205f87bb	NFS: NFSv4 readdir loses entries On recent 2.6.38-rc kernels, connectathon basic test 6 fails on NFSv4 mounts of OpenSolaris with something like: > ./test6: readdir > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.12' dir entry, pass 0 > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.82' dir entry, pass 0 > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.164' dir entry, pass 0 > ./test6: (/mnt/klimt/matisse.test) Test failed with 3 errors > basic tests failed > Tests failed, leaving /mnt/klimt mounted > [cel@matisse cthon04]$ I narrowed the problem down to nfs4_decode_dirent() reporting that the decode buffer had overflowed while decoding the entries for those missing files. verify_attr_len() assumes both it's pointer arguments reside on the same page. When these arguments point to locations on two different pages, verify_attr_len() can report false errors. This can happen now that a large NFSv4 readdir result can span pages. We have reasonably good checking in nfs4_decode_dirent() anyway, so it should be safe to simply remove the extra checking. At a guess, this was introduced by commit `6650239a`, "NFS: Don't use vm_map_ram() in readdir". Cc: stable@kernel.org [2.6.37] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-28 13:41:35 -05:00
Chuck Lever	c08e76d0cd	NFS: Micro-optimize nfs4_decode_dirent() Make the decoding of NFSv4 directory entries slightly more efficient by: 1. Avoiding unnecessary byte swapping when checking XDR booleans, and 2. Not bumping "p" when its value will be immediately replaced by xdr_inline_decode() This commit makes nfs4_decode_dirent() consistent with similar logic in the other two decode_dirent() functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-28 13:37:35 -05:00
Trond Myklebust	e00b8a2404	NFS: Fix an NFS client lockdep issue There is no reason to be freeing the delegation cred in the rcu callback, and doing so is resulting in a lockdep complaint that rpc_credcache_lock is being called from both softirq and non-softirq contexts. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2011-01-28 13:37:09 -05:00
bpm@sgi.com	24446fc66f	xfs: xfs_bmap_add_extent_delay_real should init br_startblock When filling in the middle of a previous delayed allocation in xfs_bmap_add_extent_delay_real, set br_startblock of the new delay extent to the right to nullstartblock instead of 0 before inserting the extent into the ifork (xfs_iext_insert), rather than setting br_startblock afterward. Adding the extent into the ifork with br_startblock=0 can lead to the extent being copied into the btree by xfs_bmap_extent_to_btree if we happen to convert from extents format to btree format before updating br_startblock with the correct value. The unexpected addition of this delay extent to the btree can cause subsequent XFS_WANT_CORRUPTED_GOTO filesystem shutdown in several xfs_bmap_add_extent_delay_real cases where we are converting a delay extent to real and unexpectedly find an extent already inserted. For example: 911 case BMAP_LEFT_FILLING: 912 /* 913 * Filling in the first part of a previous delayed allocation. 914 * The left neighbor is not contiguous. 915 */ 916 trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_); 917 xfs_bmbt_set_startoff(ep, new_endoff); 918 temp = PREV.br_blockcount - new->br_blockcount; 919 xfs_bmbt_set_blockcount(ep, temp); 920 xfs_iext_insert(ip, idx, 1, new, state); 921 ip->i_df.if_lastex = idx; 922 ip->i_d.di_nextents++; 923 if (cur == NULL) 924 rval = XFS_ILOG_CORE \| XFS_ILOG_DEXT; 925 else { 926 rval = XFS_ILOG_CORE; 927 if ((error = xfs_bmbt_lookup_eq(cur, new->br_startoff, 928 new->br_startblock, new->br_blockcount, 929 &i))) 930 goto done; 931 XFS_WANT_CORRUPTED_GOTO(i == 0, done); With the bogus extent in the btree we shutdown the filesystem at 931. The conversion from extents to btree format happens when the number of extents in the inode increases above ip->i_df.if_ext_max. xfs_bmap_extent_to_btree copies extents from the ifork into the btree, ignoring all delalloc extents which are denoted by br_startblock having some value of nullstartblock. SGI-PV: 1013221 Signed-off-by: Ben Myers <bpm@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-01-28 09:13:29 -06:00
Dave Chinner	0fbca4d1c3	xfs: fix dquot shaker deadlock Commit `368e136` ("xfs: remove duplicate code from dquot reclaim") fails to unlock the dquot freelist when the number of loop restarts is exceeded in xfs_qm_dqreclaim_one(). This causes hangs in memory reclaim. Rework the loop control logic into an unwind stack that all the different cases jump into. This means there is only one set of code that processes the loop exit criteria, and simplifies the unlocking of all the items from different points in the loop. It also fixes a double increment of the restart counter from the qi_dqlist_lock case. Reported-by: Malcolm Scott <lkml@malc.org.uk> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-01-28 09:05:36 -06:00
Dave Chinner	c6f990d1ff	xfs: handle CIl transaction commit failures correctly Failure to commit a transaction into the CIL is not handled correctly. This currently can only happen when racing with a shutdown and requires an explicit shutdown check, so it rare and can be avoided. Remove the shutdown check and make the CIL commit a void function to indicate it will always succeed, thereby removing the incorrectly handled failure case. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-01-28 09:05:36 -06:00
Dave Chinner	5315837dae	xfs: limit extsize to size of AGs and/or MAXEXTLEN The extent size hint can be set to larger than an AG. This means that the alignment process can push the range to be allocated outside the bounds of the AG, resulting in assert failures or corrupted bmbt records. Similarly, if the extsize is larger than the maximum extent size supported, the alignment process will produce extents that are too large to fit into the bmbt records, resulting in a different type of assert/corruption failure. Fix this by limiting extsize at the time іt is set firstly to be less than MAXEXTLEN, then to be a maximum of half the size of the AGs in the filesystem for non-realtime inodes. Realtime inodes do not allocate out of AGs, so don't have to be restricted by the size of AGs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-01-28 09:05:36 -06:00
Dave Chinner	4ce159890c	xfs: prevent extsize alignment from exceeding maximum extent size When doing delayed allocation, if the allocation size is for a maximally sized extent, extent size alignment can push it over this limit. This results in an assert failure in xfs_bmbt_set_allf() as the extent length is too large to find in the extent record. Fix this by ensuring that we allow for space that extent size alignment requires (up to 2 * (extsize -1) blocks as we have to handle both head and tail alignment) when limiting the maximum size of the extent. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-01-28 09:05:36 -06:00
Dave Chinner	14b064ceaa	xfs: limit extent length for allocation to AG size Delayed allocation extents can be larger than AGs, so when trying to convert a large range we may scan every AG inside xfs_bmap_alloc_nullfb() trying to find an AG with a size larger than an AG. We should stop when we find the first AG with a maximum possible allocation size. This causes excessive CPU usage when there are lots of AGs. The same problem occurs when doing preallocation of a range larger than an AG. Fix the problem by limiting real allocation lengths to the maximum that an AG can support. This means if we have empty AGs, we'll stop the search at the first of them. If there are no empty AGs, we'll still scan them all, but that is a different problem.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-01-28 09:05:35 -06:00
Dave Chinner	b8fc82630a	xfs: speculative delayed allocation uses rounddown_power_of_2 badly rounddown_power_of_2() returns an undefined result when passed a value of zero. The specualtive delayed allocation code is doing this when the inode is zero length. Hence occasionally the preallocation is much, much larger than is necessary (e.g. 8GB for a 270 _byte_ file). Ensure we don't even pass a zero value to this function so the result of preallocation is always the desired size. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-01-28 09:05:35 -06:00
Dave Chinner	e34a314c5e	xfs: fix efi item leak on forced shutdown After test 139, kmemleak shows: unreferenced object 0xffff880078b405d8 (size 400): comm "xfs_io", pid 4904, jiffies 4294909383 (age 1186.728s) hex dump (first 32 bytes): 60 c1 17 79 00 88 ff ff 60 c1 17 79 00 88 ff ff `..y....`..y.... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60 [<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0 [<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0 [<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50 [<ffffffff8147cd6b>] xfs_efi_init+0x4b/0xb0 [<ffffffff814a4ee8>] xfs_trans_get_efi+0x58/0x90 [<ffffffff81455fab>] xfs_bmap_finish+0x8b/0x1d0 [<ffffffff814851b4>] xfs_itruncate_finish+0x2c4/0x5d0 [<ffffffff814a970f>] xfs_setattr+0x8df/0xa70 [<ffffffff814b5c7b>] xfs_vn_setattr+0x1b/0x20 [<ffffffff8117dc00>] notify_change+0x170/0x2e0 [<ffffffff81163bf6>] do_truncate+0x66/0xa0 [<ffffffff81163d0b>] sys_ftruncate+0xdb/0xe0 [<ffffffff8103a002>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff The cause of the leak is that the "remove" parameter of IOP_UNPIN() is never set when a CIL push is aborted. This means that the EFI item is never freed if it was in the push being cancelled. The problem is specific to delayed logging, but has uncovered a couple of problems with the handling of IOP_UNPIN(remove). Firstly, we cannot safely call xfs_trans_del_item() from IOP_UNPIN() in the CIL commit failure path or the iclog write failure path because for delayed loging we have no transaction context. Hence we must only call xfs_trans_del_item() if the log item being unpinned has an active log item descriptor. Secondly, xfs_trans_uncommit() does not handle log item descriptor freeing during the traversal of log items on a transaction. It can reference a freed log item descriptor when unpinning an EFI item. Hence it needs to use a safe list traversal method to allow items to be removed from the transaction during IOP_UNPIN(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-01-28 09:01:33 -06:00
Linus Torvalds	b12ece7d85	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: avoid picking MDS that is not active ceph: avoid immediate cap check after import ceph: fix flushing of caps vs cap import ceph: fix erroneous cap flush to non-auth mds ceph: fix cap_wanted_delay_{min,max} mount option initialization ceph: fix xattr rbtree search ceph: fix getattr on directory when using norbytes	2011-01-28 12:12:58 +10:00
Shirish Pargaonkar	ee2c925850	cifs: More crypto cleanup (try #2 ) Replaced md4 hashing function local to cifs module with kernel crypto APIs. As a result, md4 hashing function and its supporting functions in file md4.c are not needed anymore. Cleaned up function declarations, removed forward function declarations, and removed a header file that is being deleted from being included. Verified that sec=ntlm/i, sec=ntlmv2/i, and sec=ntlmssp/i work correctly. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-27 19:58:13 +00:00
Dave Chinner	7db37c5e65	xfs: fix log ticket leak on forced shutdown. The kmemleak detector shows this after test 139: unreferenced object 0xffff880079b88bb0 (size 264): comm "xfs_io", pid 4904, jiffies 4294909382 (age 276.824s) hex dump (first 32 bytes): 00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N.......... ff ff ff ff ff ff ff ff 48 7b c9 82 ff ff ff ff ........H{...... backtrace: [<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60 [<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0 [<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0 [<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50 [<ffffffff8148f394>] xlog_ticket_alloc+0x34/0x170 [<ffffffff81494444>] xlog_cil_push+0xa4/0x3f0 [<ffffffff81494eca>] xlog_cil_force_lsn+0x15a/0x160 [<ffffffff814933a5>] _xfs_log_force_lsn+0x75/0x2d0 [<ffffffff814a264d>] _xfs_trans_commit+0x2bd/0x2f0 [<ffffffff8148bfdd>] xfs_iomap_write_allocate+0x1ad/0x350 [<ffffffff814ac17f>] xfs_map_blocks+0x21f/0x370 [<ffffffff814ad1b7>] xfs_vm_writepage+0x1c7/0x550 [<ffffffff8112200a>] __writepage+0x1a/0x50 [<ffffffff81122df2>] write_cache_pages+0x1c2/0x4c0 [<ffffffff81123117>] generic_writepages+0x27/0x30 [<ffffffff814aba5d>] xfs_vm_writepages+0x5d/0x80 By inspection, the leak occurs when xlog_write() returns and error and we jump to the abort path without dropping the reference on the active ticket. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>	2011-01-27 12:02:00 +11:00
Li Zefan	4d728ec7ae	Btrfs: Fix file clone when source offset is not 0 Suppose: - the source extent is: [0, 100] - the src offset is 10 - the clone length is 90 - the dest offset is 0 This statement: new_key.offset = key.offset + destoff - off will produce such an extent for the dest file: [ino, BTRFS_EXTENT_DATA_KEY, -10] , which is obviously wrong. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:11:18 +08:00
Miao Xie	b897abec03	Btrfs: Fix memory leak in writepage fixup work fixup, which is allocated when starting page write to fix up the extent without ORDERED bit set, should be freed after this work is done. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:10:30 +08:00
Miao Xie	d0f69686c2	Btrfs: Don't return acl info when mounting with noacl option Steps to reproduce: # mkfs.btrfs /dev/sda2 # mount /dev/sda2 /mnt # touch /mnt/file0 # setfacl -m 'u:root:x,g::x,o::x' /mnt/file0 # umount /mnt # mount /dev/sda2 -o noacl /mnt # getfacl /mnt/file0 ... user::rw- user:root:--x group::--x mask::--x other::--x The output should be: user::rw- group::--x other::--x Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:05:16 +08:00
Tero Roponen	3f3d0bc0df	Btrfs: Free correct pointer after using strsep We must save and free the original kstrdup()'ed pointer because strsep() modifies its first argument. Signed-off-by: Tero Roponen <tero.roponen@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:05:11 +08:00
Ian Kent	bdc924bb4c	Btrfs: Fix memory leak on finding existing super We missed a memory deallocation in commit `450ba0ea`. If an existing super block is found at mount and there is no error condition then the pre-allocated tree_root and fs_info are no not used and are not freeded. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:05:07 +08:00
Li Zefan	83a4d54840	Btrfs: Fix memory leak at umount fs_info, which is allocated in open_ctree(), should be freed in close_ctree(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:05:02 +08:00
Li Zefan	f333adb5d6	btrfs: Check mergeable free space when removing a cluster After returing extents from a cluster to the block group, some extents in the block group may be mergeable. Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:04:57 +08:00
Li Zefan	120d66eec0	btrfs: Add a helper try_merge_free_space() When adding a new extent, we'll firstly see if we can merge this extent to the left or/and right extent. Extract this as a helper try_merge_free_space(). As a side effect, we fix a small bug that if the new extent has non-bitmap left entry but is unmergeble, we'll directly link the extent without trying to drop it into bitmap. This also prepares for the next patch. Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:04:50 +08:00
Li Zefan	5e71b5d5ec	btrfs: Update stats when allocating from a cluster When allocating extent entry from a cluster, we should update the free_space and free_extents fields of the block group. Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:04:46 +08:00
Li Zefan	70b7da304f	btrfs: Free fully occupied bitmap in cluster If there's no more free space in a bitmap, we should free it. Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:04:41 +08:00
Li Zefan	edf6e2d1dd	btrfs: Add helper function free_bitmap() Remove some duplicated code. This prepares for the next patch. Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:04:37 +08:00
Li Zefan	8eb2d829ff	btrfs: Fix threshold calculation for block groups smaller than 1GB If a block group is smaller than 1GB, the extent entry threadhold calculation will always set the threshold to 0. So as free space gets fragmented, btrfs will switch to use bitmap to manage free space, but then will never switch back to extents due to this bug. Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-27 01:04:31 +08:00
Tejun Heo	d37adaa159	fs/aio: aio_wq isn't used in memory reclaim path aio_wq isn't used during memory reclaim. Convert to alloc_workqueue() without WQ_MEM_RECLAIM. It's possible to use system_wq but given that the number of work items is determined from userland and the work item may block, enforcing strict concurrency limit would be a good idea. Also, move fput_work to system_wq so that aio_wq is used soley to throttle the max concurrency of aio work items and fput_work doesn't interact with other work items. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: linux-aio@kvack.org	2011-01-26 17:42:27 +01:00
Andy Adamson	c7a360b05b	NFS construct consistent co_ownerid for v4.1 As stated in section 2.4 of RFC 5661, subsequent instances of the client need to present the same co_ownerid. Concatinate the client's IP dot address, host name, and the rpc_auth pseudoflavor to form the co_ownerid. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 22:49:14 -05:00
Torben Hohn	ac751efa6a	console: rename acquire/release_console_sem() to console_lock/unlock() The -rt patches change the console_semaphore to console_mutex. As a result, a quite large chunk of the patches changes all acquire/release_console_sem() to acquire/release_console_mutex() This commit makes things use more neutral function names which dont make implications about the underlying lock. The only real change is the return value of console_trylock which is inverted from try_acquire_console_sem() This patch also paves the way to switching console_sem from a semaphore to a mutex. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: make console_trylock return 1 on success, per Geert] Signed-off-by: Torben Hohn <torbenh@gmx.de> Cc: Thomas Gleixner <tglx@tglx.de> Cc: Greg KH <gregkh@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-26 10:50:06 +10:00
Phillip Lougher	3689456b4b	squashfs: fix use of uninitialised variable in zlib & xz decompressors Fix potential use of uninitialised variable caused by recent decompressor code optimisations. In zlib_uncompress (zlib_wrapper.c) we have int zlib_err, zlib_init = 0; ... do { ... if (avail == 0) { offset = 0; put_bh(bh[k++]); continue; } ... zlib_err = zlib_inflate(stream, Z_SYNC_FLUSH); ... } while (zlib_err == Z_OK); If continue is executed (avail == 0) then the while condition will be evaluated testing zlib_err, which is uninitialised first time around the loop. Fix this by getting rid of the 'if (avail == 0)' condition test, this edge condition should not be being handled in the decompressor code, and instead handle it generically in the caller code. Similarly for xz_wrapper.c. Incidentally, on most architectures (bar Mips and Parisc), no uninitialised variable warning is generated by gcc, this is because the while condition test on continue is optimised out and not performed (when executing continue zlib_err has not been changed since entering the loop, and logically if the while condition was true previously, then it's still true). Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk> Reported-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-26 10:50:05 +10:00
Linus Torvalds	3af03655e8	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix crash after one superblock became unavailable	2011-01-26 09:03:36 +10:00
Trond Myklebust	27dc1cd3ad	NFS: nfs_wcc_update_inode() should set nfsi->attr_gencount If the call to nfs_wcc_update_inode() results in an attribute update, we need to ensure that the inode's attr_gencount gets bumped too, otherwise we are not protected against races with other GETATTR calls. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:28:21 -05:00
Andy Adamson	b2a2897dc4	NFS improve pnfs_put_deviceid_cache debug print What we really want to know is the ref count. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:26:51 -05:00
Andy Adamson	2c4cdf8f6d	NFS fix cb_sequence error processing Always assign the cb_process_state nfs_client pointer so a processing error in cb_sequence after the nfs_client is found and referenced returns a non-NULL cb_process_state nfs_client and the matching nfs_put_client in nfs4_callback_compound dereferences the client. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:26:51 -05:00
Andy Adamson	778be232a2	NFS do not find client in NFSv4 pg_authenticate The information required to find the nfs_client cooresponding to the incoming back channel request is contained in the NFS layer. Perform minimal checking in the RPC layer pg_authenticate method, and push more detailed checking into the NFS layer where the nfs_client can be found. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:26:51 -05:00
Chuck Lever	80c30e8de4	NLM: Fix "kernel BUG at fs/lockd/host.c:417!" or ".../host.c:283!" Nick Bowler <nbowler@elliptictech.com> reports: > We were just having some NFS server troubles, and my client machine > running 2.6.38-rc1+ (specifically, commit `2b1caf6ed7`) crashed > hard (syslog output appended to this mail). > > I'm not sure what the exact timeline was or how to reproduce this, > but the server was rebooted during all this. Since I've never seen > this happen before, it is possibly a regression from previous kernel > releases. However, I recently updated my nfs-utils (on the client) to > version 1.2.3, so that might be related as well. [ BUG output redacted ] When done searching, the for_each_host loop in next_host_state() falls through and returns the final host on the host chain without bumping it's reference count. Since the host's ref count is only one at that point, releasing the host in nlm_host_rebooted() attempts to destroy the host prematurely, and therefore hits a BUG(). Likely, the original intent of the for_each_host behavior in next_host_state() was to handle the case when the host chain is empty. Searching the chain and finding no suitable host to return needs to be handled as well. Defensively restructure next_host_state() always to return NULL when the loop falls through. Introduced by commit `b10e30f6` "lockd: reorganize nlm_host_rebooted". Cc: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:24:47 -05:00
Chuck Lever	f61f6da0d5	NFS: Prevent memory allocation failure in nfsacl_encode() nfsacl_encode() allocates memory in certain cases. This of course is not guaranteed to work. Since commit `9f06c719` "SUNRPC: New xdr_streams XDR encoder API", the kernel's XDR encoders can't return a result indicating possibly a failure, so a memory allocation failure in nfsacl_encode() has become fatal (ie, the XDR code Oopses) in some cases. However, the allocated memory is a tiny fixed amount, on the order of 40-50 bytes. We can easily use a stack-allocated buffer for this, with only a wee bit of nose-holding. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:24:47 -05:00
Chuck Lever	731f3f482a	NFS: nfsacl_{encode,decode} should return signed integer Clean up. The nfsacl_encode() and nfsacl_decode() functions return negative errno values, and each call site verifies that the returned value is not negative. Change the synopsis of both of these functions to reflect this usage. Document the synopsis and return values. Reported-by: Trond Myklebust <trond.myklebust@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:24:47 -05:00
Chuck Lever	ee5dc7732b	NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!" Milan Broz <mbroz@redhat.com> reports: > on today Linus' tree I get OOps if using nfs. > > server (2.6.36) exports dir: > /dir 172.16.1.0/24(rw,async,all_squash,no_subtree_check,anonuid=500,anongid=500) > > on client it is mounted in fstab > server:/dir /mnt/tst nfs rw,soft 0 0 > > and these commands OOpses it (simplified from a configure script): > > cd /dir > touch x > install x y > > [ 105.327701] ------------[ cut here ]------------ > [ 105.327979] kernel BUG at fs/nfs/nfs3xdr.c:1338! > [ 105.328075] invalid opcode: 0000 [#1] PREEMPT SMP > [ 105.328223] last sysfs file: /sys/devices/virtual/bdi/0:16/uevent > [ 105.328349] Modules linked in: usbcore dm_mod > [ 105.328553] > [ 105.328678] Pid: 3710, comm: install Not tainted 2.6.37+ #423 440BX Desktop Reference Platform/VMware Virtual Platform > [ 105.328853] EIP: 0060:[<c116c06c>] EFLAGS: 00010282 CPU: 0 > [ 105.329152] EIP is at nfs3_xdr_enc_setacl3args+0x61/0x98 > [ 105.329249] EAX: ffffffea EBX: ce941d98 ECX: 00000000 EDX: 00000004 > [ 105.329340] ESI: ce941cd0 EDI: 000000a4 EBP: ce941cc0 ESP: ce941cb4 > [ 105.329431] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 > [ 105.329525] Process install (pid: 3710, ti=ce940000 task=ced36f20 task.ti=ce940000) > [ 105.336600] Stack: > [ 105.336693] ce941cd0 ce9dc000 00000000 ce941cf8 c12ecd02 c12f43e0 c116c00b cf754158 > [ 105.336982] ce9dc004 cf754284 ce9dc004 cf7ffee8 ceff9978 ce9dc000 cf7ffee8 ce9dc000 > [ 105.337182] ce9dc000 ce941d14 c12e698d cf75412c ce941d98 cf7ffee8 cf7fff20 00000000 > [ 105.337405] Call Trace: > [ 105.337695] [<c12ecd02>] rpcauth_wrap_req+0x75/0x7f > [ 105.337806] [<c12f43e0>] ? xdr_encode_opaque+0x12/0x15 > [ 105.337898] [<c116c00b>] ? nfs3_xdr_enc_setacl3args+0x0/0x98 > [ 105.337988] [<c12e698d>] call_transmit+0x17e/0x1e8 > [ 105.338072] [<c12ec307>] __rpc_execute+0x6d/0x1a6 > [ 105.338155] [<c12ec474>] rpc_execute+0x34/0x37 > [ 105.338235] [<c12e738d>] rpc_run_task+0xb5/0xbd > [ 105.338316] [<c12e7474>] rpc_call_sync+0x3d/0x58 > [ 105.338402] [<c116d0c6>] nfs3_proc_setacls+0x18e/0x24f > [ 105.338493] [<c10b3f76>] ? __kmalloc+0x148/0x1c4 > [ 105.338579] [<c10ecd01>] ? posix_acl_alloc+0x12/0x22 > [ 105.338665] [<c116d5c8>] nfs3_proc_setacl+0xa0/0xca > [ 105.338748] [<c116d69c>] nfs3_setxattr+0x62/0x88 > [ 105.338834] [<c1317042>] ? sub_preempt_count+0x7c/0x89 > [ 105.338926] [<c116d63a>] ? nfs3_setxattr+0x0/0x88 > [ 105.339026] [<c10cfa79>] __vfs_setxattr_noperm+0x26/0x95 > [ 105.339114] [<c10cfb43>] vfs_setxattr+0x5b/0x76 > [ 105.339211] [<c10cfbfb>] setxattr+0x9d/0xc3 > [ 105.339298] [<c10a2ea8>] ? handle_pte_fault+0x258/0x5cb > [ 105.339428] [<c1091ff6>] ? __free_pages+0x1a/0x23 > [ 105.339517] [<c10498ea>] ? up_read+0x16/0x2c > [ 105.339599] [<c10b8365>] ? fget+0x0/0xa3 > [ 105.339677] [<c10b8365>] ? fget+0x0/0xa3 > [ 105.339760] [<c1025d23>] ? get_parent_ip+0xb/0x31 > [ 105.339843] [<c1317042>] ? sub_preempt_count+0x7c/0x89 > [ 105.339931] [<c10cfc72>] sys_fsetxattr+0x51/0x79 > [ 105.340014] [<c1002853>] sysenter_do_call+0x12/0x32 > [ 105.340133] Code: 2e 76 18 00 58 31 d2 8b 7f 28 f6 43 04 01 74 03 8b 53 08 6a 00 8b 46 04 6a 01 8b 0b 52 89 fa e8 85 10 f8 ff 83 c4 0c 85 c0 79 04 <0f> 0b eb fe 31 c9 f6 43 04 04 74 03 8b 4b 0c 68 00 10 00 00 8d > [ 105.350321] EIP: [<c116c06c>] nfs3_xdr_enc_setacl3args+0x61/0x98 SS:ESP 0068:ce941cb4 > [ 105.364385] ---[ end trace 01fcfe7f0f7f6e4a ]--- nfs3_xdr_enc_setacl3args() is not properly setting up the target buffer before nfsacl_encode() attempts to encode the ACL. Introduced by commit `d9c407b1` "NFS: Introduce new-style XDR encoding functions for NFSv3." Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:24:47 -05:00
Chuck Lever	839f7ad693	NFS: Fix "kernel BUG at fs/aio.c:554!" Nick Piggin reports: > I'm getting use after frees in aio code in NFS > > [ 2703.396766] Call Trace: > [ 2703.396858] [<ffffffff8100b057>] ? native_sched_clock+0x27/0x80 > [ 2703.396959] [<ffffffff8108509e>] ? put_lock_stats+0xe/0x40 > [ 2703.397058] [<ffffffff81088348>] ? lock_release_holdtime+0xa8/0x140 > [ 2703.397159] [<ffffffff8108a2a5>] lock_acquire+0x95/0x1b0 > [ 2703.397260] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60 > [ 2703.397361] [<ffffffff81039701>] ? get_parent_ip+0x11/0x50 > [ 2703.397464] [<ffffffff81612a31>] _raw_spin_lock_irq+0x41/0x80 > [ 2703.397564] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60 > [ 2703.397662] [<ffffffff811627db>] aio_put_req+0x2b/0x60 > [ 2703.397761] [<ffffffff811647fe>] do_io_submit+0x2be/0x7c0 > [ 2703.397895] [<ffffffff81164d0b>] sys_io_submit+0xb/0x10 > [ 2703.397995] [<ffffffff8100307b>] system_call_fastpath+0x16/0x1b > > Adding some tracing, it is due to nfs completing the request then > returning something other than -EIOCBQUEUED, so aio.c > also completes the request. To address this, prevent the NFS direct I/O engine from completing async iocbs when the forward path returns an error without starting any I/O. This fix appears to survive ^C during both "xfstest no. 208" and "fsx -Z." It's likely this bug has existed for a very long while, as we are seeing very similar symptoms in OEL 5. Copying stable. Cc: Stable <stable@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:24:47 -05:00
Jesper Juhl	ad3d2eedf0	NFS4: Avoid potential NULL pointer dereference in decode_and_add_ds(). On Mon, 17 Jan 2011, Mi Jinlong wrote: > > > Jesper Juhl: > > strrchr() can return NULL if nothing is found. If this happens we'll > > dereference a NULL pointer in > > fs/nfs/nfs4filelayoutdev.c::decode_and_add_ds(). > > > > I tried to find some other code that guarantees that this can never > > happen but I was unsuccessful. So, unless someone else can point to some > > code that ensures this can never be a problem, I believe this patch is > > needed. > > > > While I was changing this code I also noticed that all the dprintk() > > statements, except one, start with "%s:". The one missing the ":" I added > > it to. > > Maybe another one also should be changed at decode_and_add_ds() at line 243: > > 243 printk("%s Decoded address and port %s\n", __func__, buf); > Missed that one. Thanks. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:24:46 -05:00
Pavel Shilovsky	d39454ffe4	CIFS: Add strictcache mount option Use for switching on strict cache mode. In this mode the client reads from the cache all the time it has Oplock Level II, otherwise - read from the server. As for write - the client stores a data in the cache in Exclusive Oplock case, otherwise - write directly to the server. Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-25 19:31:38 +00:00
Pavel Shilovsky	72432ffcf5	CIFS: Implement cifs_strict_writev (try #4 ) If we don't have Exclusive oplock we write a data to the server. Also set invalidate_mapping flag on the inode if we wrote something to the server. Add cifs_iovec_write to let the client write iovec buffers through CIFSSMBWrite2. Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-25 19:30:13 +00:00
Steve French	93c100c0b4	[CIFS] Replace cifs md5 hashing functions with kernel crypto APIs Replace remaining use of md5 hash functions local to cifs module with kernel crypto APIs. Remove header and source file containing those local functions. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-25 19:28:43 +00:00
Sage Weil	d66bbd441c	ceph: avoid picking MDS that is not active Ignore replication or auth frag data if it indicates an MDS that is not active. This can happen if the MDS shuts down and the client has stale data about the namespace distribution across the MDS cluster. If that's the case, fall back to directing the request based on the auth cap (which should always be accurate). Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-25 08:16:37 -08:00
Tejun Heo	ada609ee2a	workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER WQ_RESCUER is now an internal flag and should only be used in the workqueue implementation proper. Use WQ_MEM_RECLAIM instead. This doesn't introduce any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de>	2011-01-25 14:35:54 +01:00
Artem Bityutskiy	944fdef52c	UBIFS: do not start the commit if there is nothing to commit This patch fixes suboptimal UBIFS 'sync_fs()' implementation which causes flash I/O even if the file-system is synchronized. E.g., a 'printk()' in the MTD erasure function (e.g., 'nand_erase_nand()') can show that for every 'sync' shell command UBIFS erases at least one eraseblock. So '$ while true; do sync; done' will cause huge amount of flash I/O. The reason for this is that UBIFS commits in 'sync_fs()', and starts the commit even if there is nothing to commit, e.g., it anyway changes the log. This patch adds a check in the 'do_commit()' UBIFS functions which prevents the commit if there is nothing to commit. Reported-by: Hans J. Koch <hjk@linutronix.de> Tested-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-01-25 10:21:13 +02:00
Linus Torvalds	c723fdab8a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: Make CIFS mount work in a container. CIFS: Remove pointless variable assignment in cifs_dfs_do_automount()	2011-01-25 14:23:54 +10:00
Rob Landley	f1d0c99865	Make CIFS mount work in a container. Teach cifs about network namespaces, so mounting uses adresses/routing visible from the container rather than from init context. A container is a chroot on steroids that changes more than just the root filesystem the new processes see. One thing containers can isolate is "network namespaces", meaning each container can have its own set of ethernet interfaces, each with its own own IP address and routing to the outside world. And if you open a socket in _userspace_ from processes within such a container, this works fine. But sockets opened from within the kernel still use a single global networking context in a lot of places, meaning the new socket's address and routing are correct for PID 1 on the host, but are _not_ what userspace processes in the container get to use. So when you mount a network filesystem from within in a container, the mount code in the CIFS driver uses the host's networking context and not the container's networking context, so it gets the wrong address, uses the wrong routing, and may even try to go out an interface that the container can't even access... Bad stuff. This patch copies the mount process's network context into the CIFS structure that stores the rest of the server information for that mount point, and changes the socket open code to use the saved network context instead of the global network context. I.E. "when you attempt to use these addresses, do so relative to THIS set of network interfaces and routing rules, not the old global context from back before we supported containers". The big long HOWTO sets up a test environment on the assumption you've never used ocntainers before. It basically says: 1) configure and build a new kernel that has container support 2) build a new root filesystem that includes the userspace container control package (LXC) 3) package/run them under KVM (so you don't have to mess up your host system in order to play with containers). 4) set up some containers under the KVM system 5) set up contradictory routing in the KVM system and the container so that the host and the container see different things for the same address 6) try to mount a CIFS share from both contexts so you can both force it to work and force it to fail. For a long drawn out test reproduction sequence, see: http://landley.livejournal.com/47024.html http://landley.livejournal.com/47205.html http://landley.livejournal.com/47476.html Signed-off-by: Rob Landley <rlandley@parallels.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-24 04:28:51 +00:00
Jesper Juhl	3f391c79b0	CIFS: Remove pointless variable assignment in cifs_dfs_do_automount() In fs/cifs/cifs_dfs_ref.c::cifs_dfs_do_automount() we have this code: ... mnt = ERR_PTR(-EINVAL); if (IS_ERR(tlink)) { mnt = ERR_CAST(tlink); goto free_full_path; } ses = tlink_tcon(tlink)->ses; rc = get_dfs_path(xid, ses, full_path + 1, cifs_sb->local_nls, &num_referrals, &referrals, cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MAP_SPECIAL_CHR); cifs_put_tlink(tlink); mnt = ERR_PTR(-ENOENT); ... The assignment of 'mnt = ERR_PTR(-EINVAL);' is completely pointless. If we take the 'if (IS_ERR(tlink))' branch we'll set 'mnt' again and we'll also do so if we do not take the branch. There is no way we'll ever use 'mnt' with the assigned 'ERR_PTR(-EINVAL)' value, so we may as well just remove the pointless assignment. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-24 03:32:01 +00:00
David Howells	821404434f	CacheFiles: Add calls to path-based security hooks Add calls to path-based security hooks into CacheFiles as, unlike inode-based security, these aren't implicit in the vfs_mkdir() and similar calls. Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2011-01-24 10:49:45 +11:00
Randy Dunlap	ff5fdb6149	fs: fix new dcache.c kernel-doc warnings Fix new fs/dcache.c kernel-doc warnings: Warning(fs/dcache.c:184): No description found for parameter 'dentry' Warning(fs/dcache.c:296): No description found for parameter 'parent' Warning(fs/dcache.c:1985): No description found for parameter 'dparent' Warning(fs/dcache.c:1985): Excess function parameter 'parent' description in 'd_validate' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-22 20:32:38 -08:00
Ryusuke Konishi	0ca7a5b9ac	nilfs2: fix crash after one superblock became unavailable Fixes the following kernel oops in nilfs_setup_super() which could arise if one of two super-blocks is unavailable. > BUG: unable to handle kernel NULL pointer dereference at (null) > Pid: 3529, comm: mount.nilfs2 Not tainted 2.6.37 #1 / > EIP: 0060:[<c03196bc>] EFLAGS: 00010202 CPU: 3 > EIP is at memcpy+0xc/0x1b > Call Trace: > [<f953720e>] ? nilfs_setup_super+0x6c/0xa5 [nilfs2] > [<f95369e9>] ? nilfs_get_root_dentry+0x81/0xcb [nilfs2] > [<f9537a08>] ? nilfs_mount+0x4f9/0x62c [nilfs2] > [<c02745cf>] ? kstrdup+0x36/0x3f > [<f953750f>] ? nilfs_mount+0x0/0x62c [nilfs2] > [<c0293940>] ? vfs_kern_mount+0x4d/0x12c > [<c02a5100>] ? get_fs_type+0x76/0x8f > [<c0293a68>] ? do_kern_mount+0x33/0xbf > [<c02a784a>] ? do_mount+0x2ed/0x714 > [<c02a6171>] ? copy_mount_options+0x28/0xfc > [<c02a7ce3>] ? sys_mount+0x72/0xaf > [<c0473085>] ? syscall_call+0x7/0xb Reported-by: Wakko Warner <wakko@animx.eu.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Wakko Warner <wakko@animx.eu.org> Cc: stable <stable@kernel.org> [2.6.37, 2.6.36] LKML-Reference: <20110121024918.GA29598@animx.eu.org>	2011-01-22 15:22:36 +09:00
Linus Torvalds	9093ba53b7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: fix up CIFSSMBEcho for unaligned access cifs: fix unaligned accesses in cifsConvertToUCS cifs: clean up unaligned accesses in cifs_unicode.c cifs: fix unaligned access in check2ndT2 and coalesce_t2 cifs: clean up unaligned accesses in validate_t2 cifs: use get/put_unaligned functions to access ByteCount cifs: move time field in cifsInodeInfo cifs: TCP_Server_Info diet CIFS: Implement cifs_strict_readv (try #4) CIFS: Implement cifs_file_strict_mmap (try #2) CIFS: Implement cifs_strict_fsync CIFS: Make cifsFileInfo_put work with strict cache mode	2011-01-21 13:44:07 -08:00
Linus Torvalds	4843456c5c	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: quota: Fix deadlock during path resolution	2011-01-21 07:33:37 -08:00
Tao Ma	b8d6568a12	ext4: Fix comment typo "especiially". Change "especiially" to "especially". Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-01-21 16:27:01 +01:00
Steven Whitehouse	75d5cfbe4b	GFS2: Post-VFS scale update for RCU path walk We can allow a few more cases to use RCU path walking than originally allowed. It should be possible to also enable RCU path walking when the glock is already cached. Thats a bit more complicated though, so left for a future patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Nick Piggin <npiggin@gmail.com>	2011-01-21 09:39:24 +00:00
Steven Whitehouse	bc015cb841	GFS2: Use RCU for glock hash table This has a number of advantages: - Reduces contention on the hash table lock - Makes the code smaller and simpler - Should speed up glock dumps when under load - Removes ref count changing in examine_bucket - No longer need hash chain lock in glock_put() in common case There are some further changes which this enables and which we may do in the future. One is to look at using SLAB_RCU, and another is to look at using a per-cpu counter for the per-sb glock counter, since that is touched twice in the lifetime of each glock (but only used at umount time). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-01-21 09:39:08 +00:00
Jeff Layton	99d86c8f1b	cifs: fix up CIFSSMBEcho for unaligned access Make sure that CIFSSMBEcho can handle unaligned fields. Also fix a minor bug that causes this warning: fs/cifs/cifssmb.c: In function 'CIFSSMBEcho': fs/cifs/cifssmb.c:740: warning: large integer implicitly truncated to unsigned type ...WordCount is u8, not __le16, so no need to convert it. This patch should apply cleanly on top of the rest of the patchset to clean up unaligned access. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-21 02:23:27 +00:00
Steve French	bf67b9be97	Merge branch 'for-next'	2011-01-21 02:19:30 +00:00
Linus Torvalds	8d99641f6c	Merge branch 'akpm' * akpm: kernel/smp.c: consolidate writes in smp_call_function_interrupt() kernel/smp.c: fix smp_call_function_many() SMP race memcg: correctly order reading PCG_USED and pc->mem_cgroup backlight: fix 88pm860x_bl macro collision drivers/leds/ledtrig-gpio.c: make output match input, tighten input checking MAINTAINERS: update Atmel AT91 entry mm: fix truncate_setsize() comment memcg: fix rmdir, force_empty with THP memcg: fix LRU accounting with THP memcg: fix USED bit handling at uncharge in THP memcg: modify accounting function for supporting THP better fs/direct-io.c: don't try to allocate more than BIO_MAX_PAGES in a bio mm: compaction: prevent division-by-zero during user-requested compaction mm/vmscan.c: remove duplicate include of compaction.h memblock: fix memblock_is_region_memory() thp: keep highpte mapped until it is no longer needed kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT	2011-01-20 17:02:14 -08:00
David Dillow	20d9600cb4	fs/direct-io.c: don't try to allocate more than BIO_MAX_PAGES in a bio When using devices that support max_segments > BIO_MAX_PAGES (256), direct IO tries to allocate a bio with more pages than allowed, which leads to an oops in dio_bio_alloc(). Clamp the request to the supported maximum, and change dio_bio_alloc() to reflect that bio_alloc() will always return a bio when called with __GFP_WAIT and a valid number of vectors. [akpm@linux-foundation.org: remove redundant BUG_ON()] Signed-off-by: David Dillow <dillowda@ornl.gov> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-20 17:02:05 -08:00
David Rientjes	6a108a14fa	kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option is used to configure any non-standard kernel with a much larger scope than only small devices. This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes references to the option throughout the kernel. A new CONFIG_EMBEDDED option is added that automatically selects CONFIG_EXPERT when enabled and can be used in the future to isolate options that should only be considered for embedded systems (RISC architectures, SLOB, etc). Calling the option "EXPERT" more accurately represents its intention: only expert users who understand the impact of the configuration changes they are making should enable it. Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: David Woodhouse <david.woodhouse@intel.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Greg KH <gregkh@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jens Axboe <axboe@kernel.dk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Robin Holt <holt@sgi.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-20 17:02:05 -08:00
Linus Torvalds	5cdec1fca2	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: mangle existing header for SMB_COM_NT_CANCEL cifs: remove code for setting timeouts on requests [CIFS] cifs: reconnect unresponsive servers cifs: set up recurring workqueue job to do SMB echo requests cifs: add ability to send an echo request cifs: add cifs_call_async cifs: allow for different handling of received response cifs: clean up sync_mid_result cifs: don't reconnect server when we don't get a response cifs: wait indefinitely for responses cifs: Use mask of ACEs for SID Everyone to calculate all three permissions user, group, and other cifs: Fix regression during share-level security mounts (Repost) [CIFS] Update cifs version number cifs: move mid result processing into common function cifs: move locked sections out of DeleteMidQEntry and AllocMidQEntry cifs: clean up accesses to midCount cifs: make wait_for_free_request take a TCP_Server_Info pointer cifs: no need to mark smb_ses_list as cifs_demultiplex_thread is exiting cifs: don't fail writepages on -EAGAIN errors CIFS: Fix oplock break handling (try #2)	2011-01-20 16:36:38 -08:00
Linus Torvalds	b06ae1ead2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Fix error path in gfs2_lookup_by_inum() GFS2: remove iopen glocks from cache on failed deletes	2011-01-20 16:29:04 -08:00
Linus Torvalds	28e58ee8ce	Fix broken "pipe: use event aware wakeups" optimization Commit `e462c448fd` ("pipe: use event aware wakeups") optimized the pipe event wakeup calls to avoid wakeups if the events do not match the requested set. However, the optimization was buggy, in that it didn't actually use the correct sets for the events: when we make room for more data to be written, the pipe poll() routine will return both the POLLOUT _and_ POLLWRNORM bits. Similarly for read. And most critically, when a pipe is released, that will potentially result in POLLHUP\|POLLERR (depending on whether it was the last reader or writer), not just the regular POLLIN\|POLLOUT. This bug showed itself as a hung gnome-screensaver-dialog process, stuck forever (or at least until it was poked by a signal or by being traced) in a poll() system call. Cc: Davide Libenzi <davidel@xmailserver.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-20 16:21:59 -08:00
Jeff Layton	84cdf74e80	cifs: fix unaligned accesses in cifsConvertToUCS Move cifsConvertToUCS to cifs_unicode.c where all of the other unicode related functions live. Have it store mapped characters in 'temp' and then use put_unaligned_le16 to copy it to the target buffer. Also fix the comments to match kernel coding style. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 21:48:00 +00:00
Jeff Layton	ba2dbf30df	cifs: clean up unaligned accesses in cifs_unicode.c Make sure we use get/put_unaligned routines when accessing wide character strings. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 21:47:57 +00:00
Jeff Layton	26ec254869	cifs: fix unaligned access in check2ndT2 and coalesce_t2 Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 21:47:54 +00:00
Jeff Layton	12df83c9b9	cifs: clean up unaligned accesses in validate_t2 ...and clean up function to reduce indentation. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 21:46:33 +00:00
Jeff Layton	690c522fa5	cifs: use get/put_unaligned functions to access ByteCount It's possible that when we access the ByteCount that the alignment will be off. Most CPUs deal with that transparently, but there's usually some performance impact. Some CPUs raise an exception on unaligned accesses. Fix this by accessing the byte count using the get_unaligned and put_unaligned inlined functions. While we're at it, fix the types of some of the variables that end up getting returns from these functions. Acked-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 21:46:29 +00:00
Jeff Layton	aae62fdb6b	cifs: move time field in cifsInodeInfo ...and remove length qualifiers from bools. Before: /* size: 1176, cachelines: 19, members: 13 / / sum members: 1165, holes: 2, sum holes: 11 / / bit holes: 1, sum bit holes: 4 bits / / last cacheline: 24 bytes / After: / size: 1168, cachelines: 19, members: 13 / / last cacheline: 16 bytes */ ...savings of 8 bytes per inode. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 21:46:26 +00:00
Jeff Layton	c3dccf4817	cifs: TCP_Server_Info diet Remove fields that are completely unused, and rearrange struct according to recommendations by "pahole". Before: /* size: 1112, cachelines: 18, members: 49 / / sum members: 1086, holes: 8, sum holes: 26 / / bit holes: 1, sum bit holes: 7 bits / / last cacheline: 24 bytes / After: / size: 1072, cachelines: 17, members: 42 / / sum members: 1065, holes: 3, sum holes: 7 / / last cacheline: 48 bytes */ ...savings of 40 bytes per struct on x86_64. 21 bytes by field removal, and 19 by reorganizing to eliminate holes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 21:46:22 +00:00
Pavel Shilovsky	a70307eeeb	CIFS: Implement cifs_strict_readv (try #4 ) Read from the cache if we have at least Level II oplock - otherwise read from the server. Add cifs_user_readv to let the client read into iovec buffers. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 21:42:29 +00:00
Pavel Shilovsky	7a6a19b17a	CIFS: Implement cifs_file_strict_mmap (try #2 ) Invalidate inode mapping if we don't have at least Level II oplock. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 21:42:25 +00:00
Pavel Shilovsky	8be7e6ba14	CIFS: Implement cifs_strict_fsync Invalidate inode mapping if we don't have at least Level II oplock in cifs_strict_fsync. Also remove filemap_write_and_wait call from cifs_fsync because it is previously called from vfs_fsync_range. Add file operations' structures for strict cache mode. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 21:42:21 +00:00
Pavel Shilovsky	4f8ba8a0c0	CIFS: Make cifsFileInfo_put work with strict cache mode On strict cache mode when we close the last file handle of the inode we should set invalid_mapping flag on this inode to prevent data coherency problem when we open it again but it has been modified on the server. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 21:42:17 +00:00
Jeff Layton	76dcc26f1d	cifs: mangle existing header for SMB_COM_NT_CANCEL The NT_CANCEL command looks just like the original command, except for a few small differences. The send_nt_cancel function however currently takes a tcon, which we don't have in SendReceive and SendReceive2. Instead of "respinning" the entire header for an NT_CANCEL, just mangle the existing header by replacing just the fields we need. This means we don't need a tcon and allows us to call it from other places. Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 18:08:36 +00:00
Jeff Layton	7749981ec3	cifs: remove code for setting timeouts on requests Since we don't time out individual requests anymore, remove the code that we used to use for setting timeouts on different requests. Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 18:07:55 +00:00
Steve French	fda3594362	[CIFS] cifs: reconnect unresponsive servers If the server isn't responding to echoes, we don't want to leave tasks hung waiting for it to reply. At that point, we'll want to reconnect so that soft mounts can return an error to userspace quickly. If the client hasn't received a reply after a specified number of echo intervals, assume that the transport is down and attempt to reconnect the socket. The number of echo_intervals to wait before attempting to reconnect is tunable via a module parameter. Setting it to 0, means that the client will never attempt to reconnect. The default is 5. Signed-off-by: Jeff Layton <jlayton@redhat.com>	2011-01-20 18:06:34 +00:00
Jeff Layton	c74093b694	cifs: set up recurring workqueue job to do SMB echo requests Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 17:48:10 +00:00
Jeff Layton	766fdbb57f	cifs: add ability to send an echo request Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 17:46:44 +00:00
Jeff Layton	a6827c184e	cifs: add cifs_call_async Add a function that will send a request, and set up the mid for an async reply. Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 17:46:04 +00:00
Jeff Layton	2b84a36c55	cifs: allow for different handling of received response In order to incorporate async requests, we need to allow for a more general way to do things on receive, rather than just waking up a process. Turn the task pointer in the mid_q_entry into a callback function and a generic data pointer. When a response comes in, or the socket is reconnected, cifsd can call the callback function in order to wake up the process. The default is to just wake up the current process which should mean no change in behavior for existing code. Also, clean up the locking in cifs_reconnect. There doesn't seem to be any need to hold both the srv_mutex and GlobalMid_Lock when walking the list of mids. Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 17:43:59 +00:00
Jeff Layton	74dd92a881	cifs: clean up sync_mid_result Make it use a switch statement based on the value of the midStatus. If the resp_buf is set, then MID_RESPONSE_RECEIVED is too. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 17:41:02 +00:00
Jeff Layton	dad255b182	cifs: don't reconnect server when we don't get a response We only want to force a reconnect to the server under very limited and specific circumstances. Now that we have processes waiting indefinitely for responses, we shouldn't reach this point unless a reconnect is already in process. Thus, there's no reason to re-mark the server for reconnect here. Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 17:08:50 +00:00
Jeff Layton	0ade640e9c	cifs: wait indefinitely for responses The client should not be timing out on individual SMB requests. Too much of the state between client and server is tied to the state of the socket. If we time out requests and issue spurious disconnects then that comprimises data integrity. Instead of doing this complicated dance where we try to decide how long to wait for a response for particular requests, have the client instead wait indefinitely for a response. Also, use a TASK_KILLABLE sleep here so that fatal signals will break out of this waiting. Later patches will add support for detecting dead peers and forcing reconnects based on that. Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-20 17:07:49 +00:00
Shirish Pargaonkar	2fbc2f1729	cifs: Use mask of ACEs for SID Everyone to calculate all three permissions user, group, and other If a DACL has entries for ACEs for SID Everyone and Authenticated Users, factor in mask in respective entries during calculation of permissions for all three, user, group, and other. http://technet.microsoft.com/en-us/library/bb463216.aspx Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-19 21:25:58 +00:00
Fred Isaman	0da2a4ac33	NFS: fix handling of malloc failure during nfs_flush_multi() Cleanup of the allocated list entries should not call put_nfs_open_context() on each entry, as the context will always be NULL, causing an oops. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-19 15:37:49 -05:00
Shirish Pargaonkar	540b2e3777	cifs: Fix regression during share-level security mounts (Repost) NTLM response length was changed to 16 bytes instead of 24 bytes that are sent in Tree Connection Request during share-level security share mounts. Revert it back to 24 bytes. Reported-and-Tested-by: Grzegorz Ozanski <grzegorz.ozanski@intel.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Acked-by: Suresh Jayaraman <sjayaraman@suse.de> Cc: stable@kernel.org Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-19 18:11:18 +00:00
Steve French	1cd3508d5e	[CIFS] Update cifs version number Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-19 17:53:44 +00:00
Jeff Layton	053d503445	cifs: move mid result processing into common function Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-19 17:52:45 +00:00
Jeff Layton	ddc8cf8fc7	cifs: move locked sections out of DeleteMidQEntry and AllocMidQEntry In later patches, we're going to need to have finer-grained control over the addition and removal of these structs from the pending_mid_q and we'll need to be able to call the destructor while holding the spinlock. Move the locked sections out of both routines and into the callers. Fix up current callers of DeleteMidQEntry to call a new routine that dequeues the entry and then destroys it. Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-19 17:52:42 +00:00
Jeff Layton	8097531a5c	cifs: clean up accesses to midCount It's an atomic_t and the code accesses the "counter" field in it directly instead of using atomic_read(). It also is sometimes accessed under a spinlock and sometimes not. Move it out of the spinlock since we don't need belt-and-suspenders for something that's just informational. Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-19 17:52:38 +00:00
Jeff Layton	c5797a945c	cifs: make wait_for_free_request take a TCP_Server_Info pointer The cifsSesInfo pointer is only used to get at the server. Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-19 17:52:30 +00:00
Jeff Layton	9d78315b03	cifs: no need to mark smb_ses_list as cifs_demultiplex_thread is exiting The TCP_Server_Info is refcounted and every SMB session holds a reference to it. Thus, smb_ses_list is always going to be empty when cifsd is coming down. This is dead code. Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-19 17:52:30 +00:00
Jeff Layton	941b853d77	cifs: don't fail writepages on -EAGAIN errors If CIFSSMBWrite2 returns -EAGAIN, then the error should be considered temporary. CIFS should retry the write instead of setting an error on the mapping and returning. For WB_SYNC_ALL, just retry the write immediately. In the WB_SYNC_NONE case, call redirty_page_for_writeback on all of the pages that didn't get written out and then move on. Also, fix up the handling of a short write with a successful return code. MS-CIFS says that 0 bytes_written means ENOSPC or EFBIG. It doesn't mention what a short, but non-zero write means, so for now treat it as we would an -EAGAIN return. Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de> Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-19 17:52:29 +00:00
Pavel Shilovsky	12fed00de9	CIFS: Fix oplock break handling (try #2 ) When we get oplock break notification we should set the appropriate value of OplockLevel field in oplock break acknowledge according to the oplock level held by the client in this time. As we only can have level II oplock or no oplock in the case of oplock break, we should be aware only about clientCanCacheRead field in cifsInodeInfo structure. Also fix bug connected with wrong interpretation of OplockLevel field during oplock break notification processing. Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-19 17:52:29 +00:00
Sage Weil	7e57b81c76	ceph: avoid immediate cap check after import The NODELAY flag avoids the heuristics that delay cap (issued/wanted) release. There's no reason for that after we import a cap, and it kills whatever benefit we get from those delays. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-19 09:23:26 -08:00
Sage Weil	088b3f5e9e	ceph: fix flushing of caps vs cap import If we are mid-flush and a cap is migrated to another node, we need to resend the cap flush message to the new MDS, and do so with the original flush_seq to avoid leaking across a sync boundary. Previously we didn't redo the flush (we only flushed newly dirty data), which would cause a later sync to hang forever. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-19 09:23:25 -08:00
Sage Weil	24be0c4810	ceph: fix erroneous cap flush to non-auth mds The int flushing is global and not clear on each iteration of the loop, which can cause a second flush of caps to any MDSs with ids greater than the auth. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-19 09:23:24 -08:00
Sage Weil	50aac4fec5	ceph: fix cap_wanted_delay_{min,max} mount option initialization These were initialized to 0 instead of the default, fallout from the RBD refactor in `3d14c5d2b6`. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-19 09:23:22 -08:00
Jesper Juhl	42b16b3fbb	Kill off warning: ‘inline’ is not at beginning of declaration Fix a bunch of warning: ‘inline’ is not at beginning of declaration messages when building a 'make allyesconfig' kernel with -Wextra. These warnings are trivial to kill, yet rather annoying when building with -Wextra. The more we can cut down on pointless crap like this the better (IMHO). A previous patch to do this for a 'allnoconfig' build has already been merged. This just takes the cleanup a little further. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-01-19 15:43:08 +01:00
Namhyung Kim	f0940cee22	dio: fix typos in comments Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Jiri Kosina <trivial@kernel.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-01-19 15:40:13 +01:00
Steven Whitehouse	24d9765fc1	GFS2: Fix error path in gfs2_lookup_by_inum() In the (impossible, except if there is fs corruption) error path in gfs2_lookup_by_inum() if the call to gfs2_inode_refresh() fails, it was leaving the function by calling iput() rather than iget_failed(). This would cause future lookups of the same inode to block forever. This patch fixes the problem by moving the call to gfs2_inode_refresh() into gfs2_inode_lookup() where iget_failed() is part of the error path already. Also this cleans up some unreachable code and makes gfs2_set_iop() static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-01-18 14:49:08 +00:00
Benjamin Marzinski	23c3010808	GFS2: remove iopen glocks from cache on failed deletes When a file gets deleted on GFS2, if a node can't get an exclusive lock on the file's iopen glock, it punts on actually freeing up the space, because another node is using the file. When it does this, it needs to drop the iopen glock from its cache so that the other node can get an exclusive lock on it. Now, gfs2_delete_inode() sets GL_NOCACHE before dropping the shared lock on the iopen glock in preparation for grabbing it in the exclusive state. Since the node needs the glock in the exclusive state, dropping the shared lock from the cache doesn't slow down the case where no other nodes are using the file. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-01-18 14:28:29 +00:00
Al Viro	b89b12b462	autofs4: clean ->d_release() and autofs4_free_ino() up The latter is called only when both ino and dentry are about to be freed, so cleaning ->d_fsdata and ->dentry is pointless. Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:29 -05:00
Al Viro	26e6c91067	autofs4: split autofs4_init_ino() split init_ino into new_ino and clean_ino; the former is what used to be init_ino(NULL, sbi), the latter is for cases where we passed non-NULL ino. Lose unused arguments. Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:28 -05:00
Al Viro	5a37db302e	autofs4: mkdir and symlink always get a dentry that had passed lookup ... so ->d_fsdata will have been set up before we get there Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:28 -05:00
Al Viro	726a5e0688	autofs4: autofs4_get_inode() doesn't need autofs_info * argument anymore Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:28 -05:00
Al Viro	0bf71d4d00	autofs4: kill ->size in autofs_info It's used only to pass the length of symlink body to autofs4_get_inode() in autofs4_dir_symlink(). We can bloody well set inode->i_size in autofs4_dir_symlink() directly and be done with that. Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:28 -05:00
Al Viro	09f12c03fa	autofs4: pass mode to autofs4_get_inode() explicitly In all cases we'd set inf->mode to know value just before passing it to autofs4_get_inode(). That kills the need to store it in autofs_info and pass it to autofs_init_ino() Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:27 -05:00
Al Viro	14a2f00bde	autofs4: autofs4_mkroot() is not different from autofs4_init_ino() Kill it. Mind you, it's been an obfuscated call of autofs4_init_ino() ever since 2.3.99pre6-4... Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:27 -05:00
Al Viro	292c5ee802	autofs4: keep symlink body in inode->i_private gets rid of all ->free()/->u.symlink machinery in autofs; we simply keep symlink bodies in inode->i_private and free them in ->evict_inode(). Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:27 -05:00
Ian Kent	c0bcc9d552	autofs4 - fix debug print in autofs4_lookup() oz_mode isn't defined any more, use autofs4_oz_mode(sbi) instead. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:27 -05:00
Ian Kent	8931221411	vfs - fix dentry ref count in do_lookup() There is a ref count problem in fs/namei.c:do_lookup(). When walking in ref-walk mode, if follow_managed() returns a fail we need to drop dentry and possibly vfsmount. Clean up properly, as we do in the other caller of follow_managed(). Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:26 -05:00
Ian Kent	c14cc63a63	autofs4 - fix get_next_positive_dentry() The initialization condition in fs/autofs4/expire.c:get_next_positive_dentry() appears to be incorrect. If prev == NULL I believe that root should be returned. Further down, at the current dentry check for it being simple_positive() it looks like the d_lock for dentry p should be dropped instead of dentry ret, otherwise when p is assinged to ret we end up with no lock on p and a lost lock on ret, which leads to a deadlock. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:26 -05:00
Linus Torvalds	eee2a817df	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits) Btrfs: forced readonly mounts on errors btrfs: Require CAP_SYS_ADMIN for filesystem rebalance Btrfs: don't warn if we get ENOSPC in btrfs_block_rsv_check btrfs: Fix memory leak in btrfs_read_fs_root_no_radix() btrfs: check NULL or not btrfs: Don't pass NULL ptr to func that may deref it. btrfs: mount failure return value fix btrfs: Mem leak in btrfs_get_acl() btrfs: fix wrong free space information of btrfs btrfs: make the chunk allocator utilize the devices better btrfs: restructure find_free_dev_extent() btrfs: fix wrong calculation of stripe size btrfs: try to reclaim some space when chunk allocation fails btrfs: fix wrong data space statistics fs/btrfs: Fix build of ctree Btrfs: fix off by one while setting block groups readonly Btrfs: Add BTRFS_IOC_SUBVOL_GETFLAGS/SETFLAGS ioctls Btrfs: Add readonly snapshots support Btrfs: Refactor btrfs_ioctl_snap_create() btrfs: Extract duplicate decompress code ...	2011-01-17 14:43:43 -08:00
Artem Bityutskiy	18d1d7fbcc	UBIFS: introduce mounting flag This is a preparational patch which removes the 'c->always_chk_crc' which was set during mounting and remounting to R/W mode and introduces 'c->mounting' flag which is set when mounting. Now the 'c->always_chk_crc' flag is the same as 'c->remounting_rw && c->mounting'. This patch is a preparation for the next one which will need to know when we are mounting and remounting to R/W mode, which is exactly what 'c->always_chk_crc' effectively is, but its name does not suite the next patch. The other possibility would be to just re-name it, but then we'd end up with less logical flags coverage. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-01-17 23:24:30 +02:00
Artem Bityutskiy	d8cdda3efb	UBIFS: re-arrange variables in ubifs_info This is a cosmetic patch which re-arranges variables in 'struct ubifs_info' so that all boolean-like variables which are only changed during mounting or re-mounting to R/W mode are places together. Then they are turned into bit-fields, which makes the structure a little bit smaller. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2011-01-17 23:24:30 +02:00
Linus Torvalds	9e8a462a01	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6: ecryptfs: remove unnecessary decrypt when extending a file ecryptfs: Fix ecryptfs_printk() size_t warnings fs/ecryptfs: Add printf format/argument verification and fix fallout ecryptfs: fixed testing of file descriptor flags ecryptfs: test lower_file pointer when lower_file_mutex is locked ecryptfs: missing initialization of the superblock 'magic' field ecryptfs: moved ECRYPTFS_SUPER_MAGIC definition to linux/magic.h ecryptfs: fix truncation error in ecryptfs_read_update_atime	2011-01-17 12:39:57 -08:00
Geert Uytterhoeven	cf78859f52	xfs: Do not name variables "panic" On platforms that call panic() inside their BUG() macro (m68k/sun3, and all platforms that don't set HAVE_ARCH_BUG), compilation fails with: \| fs/xfs/support/debug.c: In function ‘xfs_cmn_err’: \| fs/xfs/support/debug.c:92: error: called object ‘panic’ is not a function as the local variable "panic" conflicts with the "panic()" function. Rename the local variable to resolve this. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-17 12:39:07 -08:00
liubo	acce952b02	Btrfs: forced readonly mounts on errors This patch comes from "Forced readonly mounts on errors" ideas. As we know, this is the first step in being more fault tolerant of disk corruptions instead of just using BUG() statements. The major content: - add a framework for generating errors that should result in filesystems going readonly. - keep FS state in disk super block. - make sure that all of resource will be freed and released at umount time. - make sure that fter FS is forced readonly on error, there will be no more disk change before FS is corrected. For this, we should stop write operation. After this patch is applied, the conversion from BUG() to such a framework can happen incrementally. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-17 15:13:08 -05:00
Linus Torvalds	1a47f7a84e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: add cruid= mount option cifs: cFYI the entire error code in map_smb_to_linux_error	2011-01-17 11:17:51 -08:00
Linus Torvalds	ab2020f2f1	Merge git://git.infradead.org/mtd-2.6 * git://git.infradead.org/mtd-2.6: (59 commits) mtd: mtdpart: disallow reading OOB past the end of the partition mtd: pxa3xx_nand: NULL dereference in pxa3xx_nand_probe UBI: use mtd->writebufsize to set minimal I/O unit size mtd: initialize writebufsize in the MTD object of a partition mtd: onenand: add mtd->writebufsize initialization mtd: nand: add mtd->writebufsize initialization mtd: cfi: add writebufsize initialization mtd: add writebufsize field to mtd_info struct mtd: OneNAND: OMAP2/3: prevent regulator sleeping while OneNAND is in use mtd: OneNAND: add enable / disable methods to onenand_chip mtd: m25p80: Fix JEDEC ID for AT26DF321 mtd: txx9ndfmc: limit transfer bytes to 512 (ECC provides 6 bytes max) mtd: cfi_cmdset_0002: add support for Samsung K8D3x16UxC NOR chips mtd: cfi_cmdset_0002: add support for Samsung K8D6x16UxM NOR chips mtd: nand: ams-delta: drop omap_read/write, use ioremap mtd: m25p80: add debugging trace in sst_write mtd: nand: ams-delta: select for built-in by default mtd: OneNAND: lighten scary initial bad block messages mtd: OneNAND: OMAP2/3: add support for command line partitioning mtd: nand: rearrange ONFI revision checking, add ONFI 2.3 ... Fix up trivial conflict in drivers/mtd/Kconfig as per DavidW.	2011-01-17 11:15:30 -08:00
Frank Swiderski	24562486be	ecryptfs: remove unnecessary decrypt when extending a file Removes an unecessary page decrypt from ecryptfs_begin_write when the page is beyond the current file size. Previously, the call to ecryptfs_decrypt_page would result in a read of 0 bytes, but still attempt to decrypt an entire page. This patch detects that case and merely zeros the page before marking it up-to-date. Signed-off-by: Frank Swiderski <fes@chromium.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-01-17 13:01:25 -06:00
Tyler Hicks	f24b38874e	ecryptfs: Fix ecryptfs_printk() size_t warnings Commit cb55d21f6fa19d8c6c2680d90317ce88c1f57269 revealed a number of missing 'z' length modifiers in calls to ecryptfs_printk() when printing variables of type size_t. This patch fixes those compiler warnings. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-01-17 13:01:24 -06:00
Joe Perches	888d57bbc9	fs/ecryptfs: Add printf format/argument verification and fix fallout Add __attribute__((format... to __ecryptfs_printk Make formats and arguments match. Add casts to (unsigned long long) for %llu. Signed-off-by: Joe Perches <joe@perches.com> [tyhicks: 80 columns cleanup and fixed typo] Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-01-17 13:01:23 -06:00
Roberto Sassu	0abe116947	ecryptfs: fixed testing of file descriptor flags This patch replaces the check (lower_file->f_flags & O_RDONLY) with ((lower_file & O_ACCMODE) == O_RDONLY). Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-01-17 11:24:43 -06:00
Roberto Sassu	27992890b0	ecryptfs: test lower_file pointer when lower_file_mutex is locked This patch prevents the lower_file pointer in the 'ecryptfs_inode_info' structure to be checked when the mutex 'lower_file_mutex' is not locked. Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-01-17 11:24:42 -06:00
Roberto Sassu	070baa5128	ecryptfs: missing initialization of the superblock 'magic' field This patch initializes the 'magic' field of ecryptfs filesystems to ECRYPTFS_SUPER_MAGIC. Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> [tyhicks: merge with `66cb76666d`] Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-01-17 11:23:01 -06:00
Roberto Sassu	2a8652f4e0	ecryptfs: moved ECRYPTFS_SUPER_MAGIC definition to linux/magic.h The definition of ECRYPTFS_SUPER_MAGIC has been moved to the include file 'linux/magic.h' to become available to other kernel subsystems. Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-01-17 10:44:31 -06:00
Edward Shishkin	38a708d775	ecryptfs: fix truncation error in ecryptfs_read_update_atime This is similar to the bug found in direct-io not so long ago. Fix up truncation (ssize_t->int). This only matters with >2G reads/writes, which the kernel doesn't permit. Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Eric Sandeen <esandeen@redhat.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-01-17 10:44:30 -06:00
Namhyung Kim	ecf5632dd1	fs: fix address space warnings in ioctl_fiemap() The fi_extents_start field of struct fiemap_extent_info is a user pointer but was not marked as __user. This makes sparse emit following warnings: CHECK fs/ioctl.c fs/ioctl.c:114:26: warning: incorrect type in argument 1 (different address spaces) fs/ioctl.c:114:26: expected void [noderef] <asn:1>dst fs/ioctl.c:114:26: got struct fiemap_extent [assigned] dest fs/ioctl.c:202:14: warning: incorrect type in argument 1 (different address spaces) fs/ioctl.c:202:14: expected void const volatile [noderef] <asn:1><noident> fs/ioctl.c:202:14: got struct fiemap_extent [assigned] fi_extents_start fs/ioctl.c:212:27: warning: incorrect type in argument 1 (different address spaces) fs/ioctl.c:212:27: expected void [noderef] <asn:1>dst fs/ioctl.c:212:27: got char <noident> Also add 'ufiemap' variable to eliminate unnecessary casts. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 08:21:42 -05:00
Namhyung Kim	27eaa1c90c	aio: check return value of create_workqueue() Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 05:12:44 -05:00
Dr. David Alan Gilbert	274052ef0b	hpfs_setattr error case avoids unlock_kernel This fixed a case that 'sparse' spotted where hpfs_setattr has an error return that didn't go through it's path that unlocks. This is against git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git version `6313e3c217`. Build tested only, I don't have an hpfs file system to test. Dave Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 05:11:37 -05:00
Namhyung Kim	e0bb6bda43	compat: copy missing fields in compat_statfs64 to user f_flags and f_spare fields were not copied to userspace when compat_sys_[f]statfs64 called. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 04:54:38 -05:00
Namhyung Kim	974d879e80	compat: update comment of compat statfs syscalls The commit `7ed1ee6118` ("Take statfs variants to fs/statfs.c") separates out statfs syscalls from fs/open.c. Thus the comment should be changed also. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Jiri Kosina <trivial@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 04:54:38 -05:00
Namhyung Kim	6a5640f102	compat: remove unnecessary assignment in compat_rw_copy_check_uvector() *@ret_pointer is initialized to @fast_pointer thus the assignment is redundant. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 04:54:38 -05:00
Randy Dunlap	16ebe911eb	fs: FS_POSIX_ACL does not depend on BLOCK - Fix a kconfig unmet dependency warning. - Remove the comment that identifies which filesystems use POSIX ACL utility routines. - Move the FS_POSIX_ACL symbol outside of the BLOCK symbol if/endif block because its functions do not depend on BLOCK and some of the filesystems that use it do not depend on BLOCK. warning: (GENERIC_ACL && JFFS2_FS_POSIX_ACL && NFSD_V4 && NFS_ACL_SUPPORT && 9P_FS_POSIX_ACL) selects FS_POSIX_ACL which has unmet direct dependencies (BLOCK) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 03:30:37 -05:00
Steven Rostedt	3bc0ba4305	fs: Remove unlikely() from fget_light() There's an unlikely() in fget_light() that assumes the file ref count will be 1. Running the annotate branch profiler on a desktop that is performing daily tasks (running firefox, evolution, xchat and is also part of a distcc farm), it shows that the ref count is not 1 that often. correct incorrect % Function File Line ------- --------- - -------- ---- ---- 1035099358 6209599193 85 fget_light file_table.c 315 Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 03:26:27 -05:00
Christoph Hellwig	2fe17c1075	fallocate should be a file operation Currently all filesystems except XFS implement fallocate asynchronously, while XFS forced a commit. Both of these are suboptimal - in case of O_SYNC I/O we really want our allocation on disk, especially for the !KEEP_SIZE case where we actually grow the file with user-visible zeroes. On the other hand always commiting the transaction is a bad idea for fast-path uses of fallocate like for example in recent Samba versions. Given that block allocation is a data plane operation anyway change it from an inode operation to a file operation so that we have the file structure available that lets us check for O_SYNC. This also includes moving the code around for a few of the filesystems, and remove the already unnedded S_ISDIR checks given that we only wire up fallocate for regular files. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 02:25:31 -05:00
Christoph Hellwig	64c23e8687	make the feature checks in ->fallocate future proof Instead of various home grown checks that might need updates for new flags just check for any bit outside the mask of the features supported by the filesystem. This makes the check future proof for any newly added flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 02:25:30 -05:00
Al Viro	b1e75df45a	tidy up around finish_automount() do_add_mount() and mnt_clear_expiry() are not needed outside of namespace.c anymore, now that namei has finish_automount() to use. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 01:47:59 -05:00
Al Viro	15f9a3f3e1	don't drop newmnt on error in do_add_mount() That gets rid of the kludge in finish_automount() - we need to keep refcount on the vfsmount as-is until we evict it from expiry list. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 01:41:58 -05:00
Al Viro	19a167af7c	Take the completion of automount into new helper ... and shift it from namei.c to namespace.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 01:35:23 -05:00
Linus Torvalds	8a335bc631	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/scsi-post-merge-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/scsi-post-merge-2.6: ocfs2: Make OCFS2_FS depend on CONFIGFS_FS dlm: Make DLM depend on CONFIGFS_FS net: Make NETCONSOLE_DYNAMIC depend on CONFIGFS_FS configfs: change depends -> select SYSFS [SCSI] sd,sr: kill compat SDEV_MEDIA_CHANGE event [SCSI] sd: implement sd_check_events()	2011-01-16 15:06:43 -08:00
Al Viro	7e3d0eb0b0	VFS: Fix UP compile error in fs/namespace.c mnt_longterm is there only on SMP Reported-and-tested-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-16 14:59:45 -08:00
Nicholas Bellinger	7b1fff7e4f	ocfs2: Make OCFS2_FS depend on CONFIGFS_FS This patch fixes the following kconfig error after changing CONFIGFS_FS -> select SYSFS: fs/sysfs/Kconfig:1:error: recursive dependency detected! fs/sysfs/Kconfig:1: symbol SYSFS is selected by CONFIGFS_FS fs/configfs/Kconfig:1: symbol CONFIGFS_FS is selected by OCFS2_FS fs/ocfs2/Kconfig:1: symbol OCFS2_FS depends on SYSFS Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org> Cc: Joel Becker <jlbec@evilplan.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: James Bottomley <James.Bottomley@suse.de>	2011-01-16 21:22:40 +00:00
Nicholas Bellinger	86c747d2a4	dlm: Make DLM depend on CONFIGFS_FS This patch fixes the following kconfig error after changing CONFIGFS_FS -> select SYSFS: fs/sysfs/Kconfig:1:error: recursive dependency detected! fs/sysfs/Kconfig:1: symbol SYSFS is selected by CONFIGFS_FS fs/configfs/Kconfig:1: symbol CONFIGFS_FS is selected by DLM fs/dlm/Kconfig:1: symbol DLM depends on SYSFS Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org> Cc: Joel Becker <jlbec@evilplan.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: James Bottomley <James.Bottomley@suse.de>	2011-01-16 21:22:37 +00:00
Nicholas Bellinger	e205117285	configfs: change depends -> select SYSFS This patch changes configfs to select SYSFS to fix the following: warning: (TARGET_CORE && GFS2_FS) selects CONFIGFS_FS which has unmet direct dependencies (SYSFS) Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org> Acked-by: Joel Becker <jlbec@evilplan.org>	2011-01-16 21:22:29 +00:00
Stefan Schmidt	f8b18087fd	fs/btrfs: Fix build of ctree Fix the build failure in some configurations: CC [M] fs/btrfs/ctree.o In file included from fs/btrfs/ctree.c:21:0: fs/btrfs/ctree.h:1003:17: error: field 'super_kobj' has incomplete type fs/btrfs/ctree.h:1074:17: error: field 'root_kobj' has incomplete type make[2]: * [fs/btrfs/ctree.o] Error 1 make[1]: * [fs/btrfs] Error 2 make: *** [fs] Error 2 caused by commit `57cc7215b7` ("headers: kobject.h redux") We need to include kobject.h here. Reported-by: Jeff Garzik <jeff@garzik.org> Fix-suggested-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-16 12:59:42 -08:00
Linus Torvalds	5520ebd308	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus * git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus: Squashfs: simplify CONFIG_SQUASHFS_LZO handling Squashfs: move squashfs_i() definition from squashfs.h Squashfs: get rid of default n in Kconfig Squashfs: add missing check in zlib_wrapper Squashfs: remove unnecessary variable in zlib_wrapper Squashfs: Add XZ compression configuration option Squashfs: add XZ compression support	2011-01-16 12:11:13 -08:00
Linus Torvalds	f8206b925f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (23 commits) sanitize vfsmount refcounting changes fix old umount_tree() breakage autofs4: Merge the remaining dentry ops tables Unexport do_add_mount() and add in follow_automount(), not ->d_automount() Allow d_manage() to be used in RCU-walk mode Remove a further kludge from __do_follow_link() autofs4: Bump version autofs4: Add v4 pseudo direct mount support autofs4: Fix wait validation autofs4: Clean up autofs4_free_ino() autofs4: Clean up dentry operations autofs4: Clean up inode operations autofs4: Remove unused code autofs4: Add d_manage() dentry operation autofs4: Add d_automount() dentry operation Remove the automount through follow_link() kludge code from pathwalk CIFS: Use d_automount() rather than abusing follow_link() NFS: Use d_automount() rather than abusing follow_link() AFS: Use d_automount() rather than abusing follow_link() Add an AT_NO_AUTOMOUNT flag to suppress terminal automount ...	2011-01-16 11:31:50 -08:00
Al Viro	f03c65993b	sanitize vfsmount refcounting changes Instead of splitting refcount between (per-cpu) mnt_count and (SMP-only) mnt_longrefs, make all references contribute to mnt_count again and keep track of how many are longterm ones. Accounting rules for longterm count: * 1 for each fs_struct.root.mnt * 1 for each fs_struct.pwd.mnt * 1 for having non-NULL ->mnt_ns * decrement to 0 happens only under vfsmount lock exclusive That allows nice common case for mntput() - since we can't drop the final reference until after mnt_longterm has reached 0 due to the rules above, mntput() can grab vfsmount lock shared and check mnt_longterm. If it turns out to be non-zero (which is the common case), we know that this is not the final mntput() and can just blindly decrement percpu mnt_count. Otherwise we grab vfsmount lock exclusive and do usual decrement-and-check of percpu mnt_count. For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm(); namespace.c uses the latter in places where we don't already hold vfsmount lock exclusive and opencodes a few remaining spots where we need to manipulate mnt_longterm. Note that we mostly revert the code outside of fs/namespace.c back to what we used to have; in particular, normal code doesn't need to care about two kinds of references, etc. And we get to keep the optimization Nick's variant had bought us... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-16 13:47:07 -05:00
Al Viro	7b8a53fd81	fix old umount_tree() breakage Expiry-related code calls umount_tree() several times with the same list to collect vfsmounts to. Which is fine, except that umount_tree() implicitly assumed that the list would be empty on each call - it moves the victims over there and then iterates through the list kicking them out. It's almost idempotent, so everything nearly worked. However, mnt->ghosts handling (and thus expirability checks) had been broken - that part was not idempotent... The fix is trivial - use local temporary list, splice it to the the collector list when we are through. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-16 13:47:01 -05:00
Ben Hutchings	6f88a4403d	btrfs: Require CAP_SYS_ADMIN for filesystem rebalance Filesystem rebalancing (BTRFS_IOC_BALANCE) affects the entire filesystem and may run uninterruptibly for a long time. This does not seem to be something that an unprivileged user should be able to do. Reported-by: Aron Xu <happyaron.xu@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:20 -05:00
Josef Bacik	f690efb1aa	Btrfs: don't warn if we get ENOSPC in btrfs_block_rsv_check If we run low on space we could get a bunch of warnings out of btrfs_block_rsv_check, but this is mostly just called via the transaction code to see if we need to end the transaction, it expects to see failures, so let's not WARN and freak everybody out for no reason. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:20 -05:00
Tsutomu Itoh	5e540f7715	btrfs: Fix memory leak in btrfs_read_fs_root_no_radix() In btrfs_read_fs_root_no_radix(), 'root' is not freed if btrfs_search_slot() returns error. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:20 -05:00
Tsutomu Itoh	91ca338d77	btrfs: check NULL or not Should check if functions returns NULL or not. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:20 -05:00
Jesper Juhl	ff175d57f0	btrfs: Don't pass NULL ptr to func that may deref it. Hi, In fs/btrfs/inode.c::fixup_tree_root_location() we have this code: ... if (!path) { err = -ENOMEM; goto out; } ... out: btrfs_free_path(path); return err; btrfs_free_path() passes its argument on to other functions and some of them end up dereferencing the pointer. In the code above that pointer is clearly NULL, so btrfs_free_path() will eventually cause a NULL dereference. There are many ways to cut this cake (fix the bug). The one I chose was to make btrfs_free_path() deal gracefully with NULL pointers. If you disagree, feel free to come up with an alternative patch. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:20 -05:00
Dave Young	20b450773d	btrfs: mount failure return value fix I happened to pass swap partition as root partition in cmdline, then kernel panic and tell me about "Cannot open root device". It is not correct, in fact it is a fs type mismatch instead of 'no device'. Eventually I found btrfs mounting failed with -EIO, it should be -EINVAL. The logic in init/do_mounts.c: for (p = fs_names; *p; p += strlen(p)+1) { int err = do_mount_root(name, p, flags, root_mount_data); switch (err) { case 0: goto out; case -EACCES: flags \|= MS_RDONLY; goto retry; case -EINVAL: continue; } print "Cannot open root device" panic } SO fs type after btrfs will have no chance to mount Here fix the return value as -EINVAL Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:19 -05:00
Jesper Juhl	42838bb265	btrfs: Mem leak in btrfs_get_acl() It seems to me that we leak the memory allocated to 'value' in btrfs_get_acl() if the call to posix_acl_from_xattr() fails. Here's a patch that attempts to correct that problem. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:19 -05:00
Miao Xie	6d07bcec96	btrfs: fix wrong free space information of btrfs When we store data by raid profile in btrfs with two or more different size disks, df command shows there is some free space in the filesystem, but the user can not write any data in fact, df command shows the wrong free space information of btrfs. # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10 # btrfs-show Label: none uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64 Total devices 2 FS bytes used 28.00KB devid 1 size 5.01GB used 2.03GB path /dev/sda9 devid 2 size 10.00GB used 2.01GB path /dev/sda10 # btrfs device scan /dev/sda9 /dev/sda10 # mount /dev/sda9 /mnt # dd if=/dev/zero of=tmpfile0 bs=4K count=9999999999 (fill the filesystem) # sync # df -TH Filesystem Type Size Used Avail Use% Mounted on /dev/sda9 btrfs 17G 8.6G 5.4G 62% /mnt # btrfs-show Label: none uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64 Total devices 2 FS bytes used 3.99GB devid 1 size 5.01GB used 5.01GB path /dev/sda9 devid 2 size 10.00GB used 4.99GB path /dev/sda10 It is because btrfs cannot allocate chunks when one of the pairing disks has no space, the free space on the other disks can not be used for ever, and should be subtracted from the total space, but btrfs doesn't subtract this space from the total. It is strange to the user. This patch fixes it by calcing the free space that can be used to allocate chunks. Implementation: 1. get all the devices free space, and align them by stripe length. 2. sort the devices by the free space. 3. check the free space of the devices, 3.1. if it is not zero, and then check the number of the devices that has more free space than this device, if the number of the devices is beyond the min stripe number, the free space can be used, and add into total free space. if the number of the devices is below the min stripe number, we can not use the free space, the check ends. 3.2. if the free space is zero, check the next devices, goto 3.1 This implementation is just likely fake chunk allocation. After appling this patch, df can show correct space information: # df -TH Filesystem Type Size Used Avail Use% Mounted on /dev/sda9 btrfs 17G 8.6G 0 100% /mnt Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:19 -05:00
Miao Xie	b2117a39fa	btrfs: make the chunk allocator utilize the devices better With this patch, we change the handling method when we can not get enough free extents with default size. Implementation: 1. Look up the suitable free extent on each device and keep the search result. If not find a suitable free extent, keep the max free extent 2. If we get enough suitable free extents with default size, chunk allocation succeeds. 3. If we can not get enough free extents, but the number of the extent with default size is >= min_stripes, we just change the mapping information (reduce the number of stripes in the extent map), and chunk allocation succeeds. 4. If the number of the extent with default size is < min_stripes, sort the devices by its max free extent's size descending 5. Use the size of the max free extent on the (num_stripes - 1)th device as the stripe size to allocate the device space By this way, the chunk allocator can allocate chunks as large as possible when the devices' space is not enough and make full use of the devices. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:19 -05:00
Miao Xie	7bfc837df9	btrfs: restructure find_free_dev_extent() - make it return the start position and length of the max free space when it can not find a suitable free space. - make it more readability Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:19 -05:00
Miao Xie	1974a3b42d	btrfs: fix wrong calculation of stripe size There are two tiny problem: - One is When we check the chunk size is greater than the max chunk size or not, we should take mirrors into account, but the original code didn't. - The other is btrfs shouldn't use the size of the residual free space as the length of of a dup chunk when doing chunk allocation. It is because the device space that a dup chunk needs is twice as large as the chunk size, if we use the size of the residual free space as the length of a dup chunk, we can not get enough free space. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:19 -05:00
Miao Xie	d52a5b5f1f	btrfs: try to reclaim some space when chunk allocation fails We cannot write data into files when when there is tiny space in the filesystem. Reproduce steps: # mkfs.btrfs /dev/sda1 # mount /dev/sda1 /mnt # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=1 # dd if=/dev/zero of=/mnt/tmpfile1 bs=4K count=99999999999999 (fill the filesystem) # umount /mnt # mount /dev/sda1 /mnt # rm -f /mnt/tmpfile0 # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=1 (failed with nospec) But if we do the last step again, we can write data successfully. The reason of the problem is that btrfs didn't try to commit the current transaction and reclaim some space when chunk allocation failed. This patch fixes it by committing the current transaction to reclaim some space when chunk allocation fails. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:19 -05:00
Miao Xie	299a08b1c3	btrfs: fix wrong data space statistics Josef has implemented mixed data/metadata chunks, we must add those chunks' space just like data chunks. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:19 -05:00
Stefan Schmidt	f580eb0931	fs/btrfs: Fix build of ctree CC [M] fs/btrfs/ctree.o In file included from fs/btrfs/ctree.c:21:0: fs/btrfs/ctree.h:1003:17: error: field <91>super_kobj<92> has incomplete type fs/btrfs/ctree.h:1074:17: error: field <91>root_kobj<92> has incomplete type make[2]: * [fs/btrfs/ctree.o] Error 1 make[1]: * [fs/btrfs] Error 2 make: *** [fs] Error 2 We need to include kobject.h here. Reported-by: Jeff Garzik <jeff@garzik.org> Fix-suggested-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:19 -05:00
Chris Mason	f892436eb2	Merge branch 'lzo-support' of git://repo.or.cz/linux-btrfs-devel into btrfs-38	2011-01-16 11:25:54 -05:00
Chris Mason	26c79f6ba0	Merge branch 'readonly-snapshots' of git://repo.or.cz/linux-btrfs-devel into btrfs-38	2011-01-16 11:24:45 -05:00
David Howells	b650c858c2	autofs4: Merge the remaining dentry ops tables Merge the remaining autofs4 dentry ops tables. It doesn't matter if d_automount and d_manage are present on something that's not mountable or holdable as these ops are only used if the appropriate flags are set in dentry->d_flags. [AV] switch to ->s_d_op, since now _everything_ on autofs4 is using the same dentry_operations. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:49 -05:00
David Howells	ea5b778a8b	Unexport do_add_mount() and add in follow_automount(), not ->d_automount() Unexport do_add_mount() and make ->d_automount() return the vfsmount to be added rather than calling do_add_mount() itself. follow_automount() will then do the addition. This slightly complicates things as ->d_automount() normally wants to add the new vfsmount to an expiration list and start an expiration timer. The problem with that is that the vfsmount will be deleted if it has a refcount of 1 and the timer will not repeat if the expiration list is empty. To this end, we require the vfsmount to be returned from d_automount() with a refcount of (at least) 2. One of these refs will be dropped unconditionally. In addition, follow_automount() must get a 3rd ref around the call to do_add_mount() lest it eat a ref and return an error, leaving the mount we have open to being expired as we would otherwise have only 1 ref on it. d_automount() should also add the the vfsmount to the expiration list (by calling mnt_set_expiry()) and start the expiration timer before returning, if this mechanism is to be used. The vfsmount will be unlinked from the expiration list by follow_automount() if do_add_mount() fails. This patch also fixes the call to do_add_mount() for AFS to propagate the mount flags from the parent vfsmount. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:48 -05:00
David Howells	ab90911ff9	Allow d_manage() to be used in RCU-walk mode Allow d_manage() to be called from pathwalk when it is in RCU-walk mode as well as when it is in Ref-walk mode. This permits __follow_mount_rcu() to call d_manage() directly. d_manage() needs a parameter to indicate that it is in RCU-walk mode as it isn't allowed to sleep if in that mode (but should return -ECHILD instead). autofs4_d_manage() can then be set to retain RCU-walk mode if the daemon accesses it and otherwise request dropping back to ref-walk mode. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:47 -05:00
David Howells	87556ef199	Remove a further kludge from __do_follow_link() Remove a further kludge from __do_follow_link() as it's no longer required with the automount code. This reverts the non-helper-function parts of `051d381259`, which breaks union mounts. Reported-by: vaurora@redhat.com Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:46 -05:00
Ian Kent	dd89f90d2d	autofs4: Add v4 pseudo direct mount support Version 4 of autofs provides a pseudo direct mount implementation that relies on directories at the leaves of a directory tree under an indirect mount to trigger mounts. This patch adds support for that functionality. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:44 -05:00
Ian Kent	9e3fea16ba	autofs4: Fix wait validation It is possible for the check in wait.c:validate_request() to return an incorrect result if the dentry that was mounted upon has changed during the callback. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:43 -05:00
Ian Kent	6651149371	autofs4: Clean up autofs4_free_ino() When this function is called the local reference count does't need to be updated since the dentry is going away and dput definitely must not be called here. Also the autofs info struct field inode isn't used so remove it. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:42 -05:00
Ian Kent	71e469db24	autofs4: Clean up dentry operations There are now two distinct dentry operations uses. One for dentrys that trigger mounts and one for dentrys that do not. Rationalize the use of these dentry operations and rename them to reflect their function. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:41 -05:00
Ian Kent	e61da20a50	autofs4: Clean up inode operations Since the use of ->follow_link() has been eliminated there is no need to separate the indirect and direct inode operations. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:40 -05:00
Ian Kent	8c13a676d5	autofs4: Remove unused code Remove code that is not used due to the use of ->d_automount() and ->d_manage(). Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:39 -05:00
Ian Kent	b5b801779d	autofs4: Add d_manage() dentry operation This patch required a previous patch to add the ->d_automount() dentry operation. Add a function to use the newly defined ->d_manage() dentry operation for blocking during mount and expire. Whether the VFS calls the dentry operations d_automount() and d_manage() is controled by the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags. autofs uses the d_automount() operation to callback to user space to request mount operations and the d_manage() operation to block walks into mounts that are under construction or destruction. In order to prevent these functions from being called unnecessarily the DMANAGED_* flags are cleared for cases which would cause this. In the common case the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags are both set for dentrys waiting to be mounted. The DMANAGED_TRANSIT flag is cleared upon successful mount request completion and set during expire runs, both during the dentry expire check, and if selected for expire, is left set until a subsequent successful mount request completes. The exception to this is the so-called rootless multi-mount which has no actual mount at its base. In this case the DMANAGED_AUTOMOUNT flag is cleared upon successful mount request completion as well and set again after a successful expire. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:38 -05:00
Ian Kent	10584211e4	autofs4: Add d_automount() dentry operation Add a function to use the newly defined ->d_automount() dentry operation for triggering mounts instead of doing the user space callback in ->lookup() and ->d_revalidate(). Note, to be useful the subsequent patch to add the ->d_manage() dentry operation is also needed so the discussion of functionality is deferred to that patch. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:37 -05:00
David Howells	db3729153e	Remove the automount through follow_link() kludge code from pathwalk Remove the automount through follow_link() kludge code from pathwalk in favour of using d_automount(). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:36 -05:00
David Howells	01c64feac4	CIFS: Use d_automount() rather than abusing follow_link() Make CIFS use the new d_automount() dentry operation rather than abusing follow_link() on directories. [NOTE: THIS IS UNTESTED!] Signed-off-by: David Howells <dhowells@redhat.com> Cc: Steve French <sfrench@samba.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:35 -05:00
David Howells	36d43a4376	NFS: Use d_automount() rather than abusing follow_link() Make NFS use the new d_automount() dentry operation rather than abusing follow_link() on directories. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:34 -05:00
David Howells	d18610b0ce	AFS: Use d_automount() rather than abusing follow_link() Make AFS use the new d_automount() dentry operation rather than abusing follow_link() on directories. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:33 -05:00
David Howells	6f45b65672	Add an AT_NO_AUTOMOUNT flag to suppress terminal automount Add an AT_NO_AUTOMOUNT flag to suppress terminal automounting of automount point directories. This can be used by fstatat() users to permit the gathering of attributes on an automount point and also prevent mass-automounting of a directory of automount points by ls. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:33 -05:00
David Howells	cc53ce53c8	Add a dentry op to allow processes to be held during pathwalk transit Add a dentry op (d_manage) to permit a filesystem to hold a process and make it sleep when it tries to transit away from one of that filesystem's directories during a pathwalk. The operation is keyed off a new dentry flag (DCACHE_MANAGE_TRANSIT). The filesystem is allowed to be selective about which processes it holds and which it permits to continue on or prohibits from transiting from each flagged directory. This will allow autofs to hold up client processes whilst letting its userspace daemon through to maintain the directory or the stuff behind it or mounted upon it. The ->d_manage() dentry operation: int (d_manage)(struct path path, bool mounting_here); takes a pointer to the directory about to be transited away from and a flag indicating whether the transit is undertaken by do_add_mount() or do_move_mount() skipping through a pile of filesystems mounted on a mountpoint. It should return 0 if successful and to let the process continue on its way; -EISDIR to prohibit the caller from skipping to overmounted filesystems or automounting, and to use this directory; or some other error code to return to the user. ->d_manage() is called with namespace_sem writelocked if mounting_here is true and no other locks held, so it may sleep. However, if mounting_here is true, it may not initiate or wait for a mount or unmount upon the parameter directory, even if the act is actually performed by userspace. Within fs/namei.c, follow_managed() is extended to check with d_manage() first on each managed directory, before transiting away from it or attempting to automount upon it. follow_down() is renamed follow_down_one() and should only be used where the filesystem deliberately intends to avoid management steps (e.g. autofs). A new follow_down() is added that incorporates the loop done by all other callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS and CIFS do use it, their use is removed by converting them to use d_automount()). The new follow_down() calls d_manage() as appropriate. It also takes an extra parameter to indicate if it is being called from mount code (with namespace_sem writelocked) which it passes to d_manage(). follow_down() ignores automount points so that it can be used to mount on them. __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have that determine whether to abort or not itself. That would allow the autofs daemon to continue on in rcu-walk mode. Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't required as every tranist from that directory will cause d_manage() to be invoked. It can always be set again when necessary. ========================== WHAT THIS MEANS FOR AUTOFS ========================== Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to trigger the automounting of indirect mounts, and both of these can be called with i_mutex held. autofs knows that the i_mutex will be held by the caller in lookup(), and so can drop it before invoking the daemon - but this isn't so for d_revalidate(), since the lock is only held on _some_ of the code paths that call it. This means that autofs can't risk dropping i_mutex from its d_revalidate() function before it calls the daemon. The bug could manifest itself as, for example, a process that's trying to validate an automount dentry that gets made to wait because that dentry is expired and needs cleaning up: mkdir S ffffffff8014e05a 0 32580 24956 Call Trace: [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f [<ffffffff80057a2f>] lookup_create+0x46/0x80 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4 versus the automount daemon which wants to remove that dentry, but can't because the normal process is holding the i_mutex lock: automount D ffffffff8014e05a 0 32581 1 32561 Call Trace: [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14 [<ffffffff800e6d55>] do_rmdir+0x77/0xde [<ffffffff8005d229>] tracesys+0x71/0xe0 [<ffffffff8005d28d>] tracesys+0xd5/0xe0 which means that the system is deadlocked. This patch allows autofs to hold up normal processes whilst the daemon goes ahead and does things to the dentry tree behind the automouter point without risking a deadlock as almost no locks are held in d_manage() and none in d_automount(). Signed-off-by: David Howells <dhowells@redhat.com> Was-Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:31 -05:00
David Howells	9875cf8064	Add a dentry op to handle automounting rather than abusing follow_link() Add a dentry op (d_automount) to handle automounting directories rather than abusing the follow_link() inode operation. The operation is keyed off a new dentry flag (DCACHE_NEED_AUTOMOUNT). This also makes it easier to add an AT_ flag to suppress terminal segment automount during pathwalk and removes the need for the kludge code in the pathwalk algorithm to handle directories with follow_link() semantics. The ->d_automount() dentry operation: struct vfsmount (d_automount)(struct path mountpoint); takes a pointer to the directory to be mounted upon, which is expected to provide sufficient data to determine what should be mounted. If successful, it should return the vfsmount struct it creates (which it should also have added to the namespace using do_add_mount() or similar). If there's a collision with another automount attempt, NULL should be returned. If the directory specified by the parameter should be used directly rather than being mounted upon, -EISDIR should be returned. In any other case, an error code should be returned. The ->d_automount() operation is called with no locks held and may sleep. At this point the pathwalk algorithm will be in ref-walk mode. Within fs/namei.c itself, a new pathwalk subroutine (follow_automount()) is added to handle mountpoints. It will return -EREMOTE if the automount flag was set, but no d_automount() op was supplied, -ELOOP if we've encountered too many symlinks or mountpoints, -EISDIR if the walk point should be used without mounting and 0 if successful. The path will be updated to point to the mounted filesystem if a successful automount took place. __follow_mount() is replaced by follow_managed() which is more generic (especially with the patch that adds ->d_manage()). This handles transits from directories during pathwalk, including automounting and skipping over mountpoints (and holding processes with the next patch). __follow_mount_rcu() will jump out of RCU-walk mode if it encounters an automount point with nothing mounted on it. follow_dotdot() does not handle automounts as you don't want to trigger them whilst following "..". I've also extracted the mount/don't-mount logic from autofs4 and included it here. It makes the mount go ahead anyway if someone calls open() or creat(), tries to traverse the directory, tries to chdir/chroot/etc. into the directory, or sticks a '/' on the end of the pathname. If they do a stat(), however, they'll only trigger the automount if they didn't also say O_NOFOLLOW. I've also added an inode flag (S_AUTOMOUNT) so that filesystems can mark their inodes as automount points. This flag is automatically propagated to the dentry as DCACHE_NEED_AUTOMOUNT by __d_instantiate(). This saves NFS and could save AFS a private flag bit apiece, but is not strictly necessary. It would be preferable to do the propagation in d_set_d_op(), but that doesn't normally have access to the inode. [AV: fixed breakage in case if __follow_mount_rcu() fails and nameidata_drop_rcu() succeeds in RCU case of do_lookup(); we need to fall through to non-RCU case after that, rather than just returning with ungrabbed *path] Signed-off-by: David Howells <dhowells@redhat.com> Was-Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:05:03 -05:00
Al Viro	1a8edf40e7	do_lookup() fix do_lookup() has a path leading from LOOKUP_RCU case to non-RCU crossing of mountpoints, which breaks things badly. If we hit need_revalidate: and do nothing in there, we need to come back into LOOKUP_RCU half of things, not to done: in non-RCU one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:03:39 -05:00
Linus Torvalds	7cb3920a65	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: prevent NMI timeouts in cmn_err xfs: Add log level to assertion printk xfs: fix an assignment within an ASSERT() xfs: fix error handling for synchronous writes xfs: add FITRIM support xfs: ensure log covering transactions are synchronous xfs: serialise unaligned direct IOs xfs: factor common write setup code xfs: split buffered IO write path from xfs_file_aio_write xfs: split direct IO write path from xfs_file_aio_write xfs: introduce xfs_rw_lock() helpers for locking the inode xfs: factor post-write newsize updates xfs: factor common post-write isize handling code xfs: ensure sync write errors are returned	2011-01-14 15:24:17 -08:00

... 5 6 7 8 9 ...

21764 Commits