linux

Commit Graph

Author	SHA1	Message	Date
Sage Weil	088b3f5e9e	ceph: fix flushing of caps vs cap import If we are mid-flush and a cap is migrated to another node, we need to resend the cap flush message to the new MDS, and do so with the original flush_seq to avoid leaking across a sync boundary. Previously we didn't redo the flush (we only flushed newly dirty data), which would cause a later sync to hang forever. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-19 09:23:25 -08:00
Sage Weil	24be0c4810	ceph: fix erroneous cap flush to non-auth mds The int flushing is global and not clear on each iteration of the loop, which can cause a second flush of caps to any MDSs with ids greater than the auth. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-19 09:23:24 -08:00
Sage Weil	50aac4fec5	ceph: fix cap_wanted_delay_{min,max} mount option initialization These were initialized to 0 instead of the default, fallout from the RBD refactor in `3d14c5d2b6`. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-19 09:23:22 -08:00
Steven Whitehouse	24d9765fc1	GFS2: Fix error path in gfs2_lookup_by_inum() In the (impossible, except if there is fs corruption) error path in gfs2_lookup_by_inum() if the call to gfs2_inode_refresh() fails, it was leaving the function by calling iput() rather than iget_failed(). This would cause future lookups of the same inode to block forever. This patch fixes the problem by moving the call to gfs2_inode_refresh() into gfs2_inode_lookup() where iget_failed() is part of the error path already. Also this cleans up some unreachable code and makes gfs2_set_iop() static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-01-18 14:49:08 +00:00
Benjamin Marzinski	23c3010808	GFS2: remove iopen glocks from cache on failed deletes When a file gets deleted on GFS2, if a node can't get an exclusive lock on the file's iopen glock, it punts on actually freeing up the space, because another node is using the file. When it does this, it needs to drop the iopen glock from its cache so that the other node can get an exclusive lock on it. Now, gfs2_delete_inode() sets GL_NOCACHE before dropping the shared lock on the iopen glock in preparation for grabbing it in the exclusive state. Since the node needs the glock in the exclusive state, dropping the shared lock from the cache doesn't slow down the case where no other nodes are using the file. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-01-18 14:28:29 +00:00
Al Viro	b89b12b462	autofs4: clean ->d_release() and autofs4_free_ino() up The latter is called only when both ino and dentry are about to be freed, so cleaning ->d_fsdata and ->dentry is pointless. Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:29 -05:00
Al Viro	26e6c91067	autofs4: split autofs4_init_ino() split init_ino into new_ino and clean_ino; the former is what used to be init_ino(NULL, sbi), the latter is for cases where we passed non-NULL ino. Lose unused arguments. Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:28 -05:00
Al Viro	5a37db302e	autofs4: mkdir and symlink always get a dentry that had passed lookup ... so ->d_fsdata will have been set up before we get there Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:28 -05:00
Al Viro	726a5e0688	autofs4: autofs4_get_inode() doesn't need autofs_info * argument anymore Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:28 -05:00
Al Viro	0bf71d4d00	autofs4: kill ->size in autofs_info It's used only to pass the length of symlink body to autofs4_get_inode() in autofs4_dir_symlink(). We can bloody well set inode->i_size in autofs4_dir_symlink() directly and be done with that. Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:28 -05:00
Al Viro	09f12c03fa	autofs4: pass mode to autofs4_get_inode() explicitly In all cases we'd set inf->mode to know value just before passing it to autofs4_get_inode(). That kills the need to store it in autofs_info and pass it to autofs_init_ino() Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:27 -05:00
Al Viro	14a2f00bde	autofs4: autofs4_mkroot() is not different from autofs4_init_ino() Kill it. Mind you, it's been an obfuscated call of autofs4_init_ino() ever since 2.3.99pre6-4... Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:27 -05:00
Al Viro	292c5ee802	autofs4: keep symlink body in inode->i_private gets rid of all ->free()/->u.symlink machinery in autofs; we simply keep symlink bodies in inode->i_private and free them in ->evict_inode(). Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:27 -05:00
Ian Kent	c0bcc9d552	autofs4 - fix debug print in autofs4_lookup() oz_mode isn't defined any more, use autofs4_oz_mode(sbi) instead. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:27 -05:00
Ian Kent	8931221411	vfs - fix dentry ref count in do_lookup() There is a ref count problem in fs/namei.c:do_lookup(). When walking in ref-walk mode, if follow_managed() returns a fail we need to drop dentry and possibly vfsmount. Clean up properly, as we do in the other caller of follow_managed(). Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:26 -05:00
Ian Kent	c14cc63a63	autofs4 - fix get_next_positive_dentry() The initialization condition in fs/autofs4/expire.c:get_next_positive_dentry() appears to be incorrect. If prev == NULL I believe that root should be returned. Further down, at the current dentry check for it being simple_positive() it looks like the d_lock for dentry p should be dropped instead of dentry ret, otherwise when p is assinged to ret we end up with no lock on p and a lost lock on ret, which leads to a deadlock. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-18 01:21:26 -05:00
Linus Torvalds	eee2a817df	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits) Btrfs: forced readonly mounts on errors btrfs: Require CAP_SYS_ADMIN for filesystem rebalance Btrfs: don't warn if we get ENOSPC in btrfs_block_rsv_check btrfs: Fix memory leak in btrfs_read_fs_root_no_radix() btrfs: check NULL or not btrfs: Don't pass NULL ptr to func that may deref it. btrfs: mount failure return value fix btrfs: Mem leak in btrfs_get_acl() btrfs: fix wrong free space information of btrfs btrfs: make the chunk allocator utilize the devices better btrfs: restructure find_free_dev_extent() btrfs: fix wrong calculation of stripe size btrfs: try to reclaim some space when chunk allocation fails btrfs: fix wrong data space statistics fs/btrfs: Fix build of ctree Btrfs: fix off by one while setting block groups readonly Btrfs: Add BTRFS_IOC_SUBVOL_GETFLAGS/SETFLAGS ioctls Btrfs: Add readonly snapshots support Btrfs: Refactor btrfs_ioctl_snap_create() btrfs: Extract duplicate decompress code ...	2011-01-17 14:43:43 -08:00
Linus Torvalds	9e8a462a01	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6: ecryptfs: remove unnecessary decrypt when extending a file ecryptfs: Fix ecryptfs_printk() size_t warnings fs/ecryptfs: Add printf format/argument verification and fix fallout ecryptfs: fixed testing of file descriptor flags ecryptfs: test lower_file pointer when lower_file_mutex is locked ecryptfs: missing initialization of the superblock 'magic' field ecryptfs: moved ECRYPTFS_SUPER_MAGIC definition to linux/magic.h ecryptfs: fix truncation error in ecryptfs_read_update_atime	2011-01-17 12:39:57 -08:00
Geert Uytterhoeven	cf78859f52	xfs: Do not name variables "panic" On platforms that call panic() inside their BUG() macro (m68k/sun3, and all platforms that don't set HAVE_ARCH_BUG), compilation fails with: \| fs/xfs/support/debug.c: In function ‘xfs_cmn_err’: \| fs/xfs/support/debug.c:92: error: called object ‘panic’ is not a function as the local variable "panic" conflicts with the "panic()" function. Rename the local variable to resolve this. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-17 12:39:07 -08:00
liubo	acce952b02	Btrfs: forced readonly mounts on errors This patch comes from "Forced readonly mounts on errors" ideas. As we know, this is the first step in being more fault tolerant of disk corruptions instead of just using BUG() statements. The major content: - add a framework for generating errors that should result in filesystems going readonly. - keep FS state in disk super block. - make sure that all of resource will be freed and released at umount time. - make sure that fter FS is forced readonly on error, there will be no more disk change before FS is corrected. For this, we should stop write operation. After this patch is applied, the conversion from BUG() to such a framework can happen incrementally. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-17 15:13:08 -05:00
Linus Torvalds	1a47f7a84e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: add cruid= mount option cifs: cFYI the entire error code in map_smb_to_linux_error	2011-01-17 11:17:51 -08:00
Linus Torvalds	ab2020f2f1	Merge git://git.infradead.org/mtd-2.6 * git://git.infradead.org/mtd-2.6: (59 commits) mtd: mtdpart: disallow reading OOB past the end of the partition mtd: pxa3xx_nand: NULL dereference in pxa3xx_nand_probe UBI: use mtd->writebufsize to set minimal I/O unit size mtd: initialize writebufsize in the MTD object of a partition mtd: onenand: add mtd->writebufsize initialization mtd: nand: add mtd->writebufsize initialization mtd: cfi: add writebufsize initialization mtd: add writebufsize field to mtd_info struct mtd: OneNAND: OMAP2/3: prevent regulator sleeping while OneNAND is in use mtd: OneNAND: add enable / disable methods to onenand_chip mtd: m25p80: Fix JEDEC ID for AT26DF321 mtd: txx9ndfmc: limit transfer bytes to 512 (ECC provides 6 bytes max) mtd: cfi_cmdset_0002: add support for Samsung K8D3x16UxC NOR chips mtd: cfi_cmdset_0002: add support for Samsung K8D6x16UxM NOR chips mtd: nand: ams-delta: drop omap_read/write, use ioremap mtd: m25p80: add debugging trace in sst_write mtd: nand: ams-delta: select for built-in by default mtd: OneNAND: lighten scary initial bad block messages mtd: OneNAND: OMAP2/3: add support for command line partitioning mtd: nand: rearrange ONFI revision checking, add ONFI 2.3 ... Fix up trivial conflict in drivers/mtd/Kconfig as per DavidW.	2011-01-17 11:15:30 -08:00
Frank Swiderski	24562486be	ecryptfs: remove unnecessary decrypt when extending a file Removes an unecessary page decrypt from ecryptfs_begin_write when the page is beyond the current file size. Previously, the call to ecryptfs_decrypt_page would result in a read of 0 bytes, but still attempt to decrypt an entire page. This patch detects that case and merely zeros the page before marking it up-to-date. Signed-off-by: Frank Swiderski <fes@chromium.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-01-17 13:01:25 -06:00
Tyler Hicks	f24b38874e	ecryptfs: Fix ecryptfs_printk() size_t warnings Commit cb55d21f6fa19d8c6c2680d90317ce88c1f57269 revealed a number of missing 'z' length modifiers in calls to ecryptfs_printk() when printing variables of type size_t. This patch fixes those compiler warnings. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-01-17 13:01:24 -06:00
Joe Perches	888d57bbc9	fs/ecryptfs: Add printf format/argument verification and fix fallout Add __attribute__((format... to __ecryptfs_printk Make formats and arguments match. Add casts to (unsigned long long) for %llu. Signed-off-by: Joe Perches <joe@perches.com> [tyhicks: 80 columns cleanup and fixed typo] Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-01-17 13:01:23 -06:00
Roberto Sassu	0abe116947	ecryptfs: fixed testing of file descriptor flags This patch replaces the check (lower_file->f_flags & O_RDONLY) with ((lower_file & O_ACCMODE) == O_RDONLY). Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-01-17 11:24:43 -06:00
Roberto Sassu	27992890b0	ecryptfs: test lower_file pointer when lower_file_mutex is locked This patch prevents the lower_file pointer in the 'ecryptfs_inode_info' structure to be checked when the mutex 'lower_file_mutex' is not locked. Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-01-17 11:24:42 -06:00
Roberto Sassu	070baa5128	ecryptfs: missing initialization of the superblock 'magic' field This patch initializes the 'magic' field of ecryptfs filesystems to ECRYPTFS_SUPER_MAGIC. Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> [tyhicks: merge with `66cb76666d`] Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-01-17 11:23:01 -06:00
Roberto Sassu	2a8652f4e0	ecryptfs: moved ECRYPTFS_SUPER_MAGIC definition to linux/magic.h The definition of ECRYPTFS_SUPER_MAGIC has been moved to the include file 'linux/magic.h' to become available to other kernel subsystems. Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-01-17 10:44:31 -06:00
Edward Shishkin	38a708d775	ecryptfs: fix truncation error in ecryptfs_read_update_atime This is similar to the bug found in direct-io not so long ago. Fix up truncation (ssize_t->int). This only matters with >2G reads/writes, which the kernel doesn't permit. Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Eric Sandeen <esandeen@redhat.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-01-17 10:44:30 -06:00
Namhyung Kim	ecf5632dd1	fs: fix address space warnings in ioctl_fiemap() The fi_extents_start field of struct fiemap_extent_info is a user pointer but was not marked as __user. This makes sparse emit following warnings: CHECK fs/ioctl.c fs/ioctl.c:114:26: warning: incorrect type in argument 1 (different address spaces) fs/ioctl.c:114:26: expected void [noderef] <asn:1>dst fs/ioctl.c:114:26: got struct fiemap_extent [assigned] dest fs/ioctl.c:202:14: warning: incorrect type in argument 1 (different address spaces) fs/ioctl.c:202:14: expected void const volatile [noderef] <asn:1><noident> fs/ioctl.c:202:14: got struct fiemap_extent [assigned] fi_extents_start fs/ioctl.c:212:27: warning: incorrect type in argument 1 (different address spaces) fs/ioctl.c:212:27: expected void [noderef] <asn:1>dst fs/ioctl.c:212:27: got char <noident> Also add 'ufiemap' variable to eliminate unnecessary casts. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 08:21:42 -05:00
Namhyung Kim	27eaa1c90c	aio: check return value of create_workqueue() Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 05:12:44 -05:00
Dr. David Alan Gilbert	274052ef0b	hpfs_setattr error case avoids unlock_kernel This fixed a case that 'sparse' spotted where hpfs_setattr has an error return that didn't go through it's path that unlocks. This is against git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git version `6313e3c217`. Build tested only, I don't have an hpfs file system to test. Dave Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 05:11:37 -05:00
Namhyung Kim	e0bb6bda43	compat: copy missing fields in compat_statfs64 to user f_flags and f_spare fields were not copied to userspace when compat_sys_[f]statfs64 called. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 04:54:38 -05:00
Namhyung Kim	974d879e80	compat: update comment of compat statfs syscalls The commit `7ed1ee6118` ("Take statfs variants to fs/statfs.c") separates out statfs syscalls from fs/open.c. Thus the comment should be changed also. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Jiri Kosina <trivial@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 04:54:38 -05:00
Namhyung Kim	6a5640f102	compat: remove unnecessary assignment in compat_rw_copy_check_uvector() *@ret_pointer is initialized to @fast_pointer thus the assignment is redundant. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 04:54:38 -05:00
Randy Dunlap	16ebe911eb	fs: FS_POSIX_ACL does not depend on BLOCK - Fix a kconfig unmet dependency warning. - Remove the comment that identifies which filesystems use POSIX ACL utility routines. - Move the FS_POSIX_ACL symbol outside of the BLOCK symbol if/endif block because its functions do not depend on BLOCK and some of the filesystems that use it do not depend on BLOCK. warning: (GENERIC_ACL && JFFS2_FS_POSIX_ACL && NFSD_V4 && NFS_ACL_SUPPORT && 9P_FS_POSIX_ACL) selects FS_POSIX_ACL which has unmet direct dependencies (BLOCK) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 03:30:37 -05:00
Steven Rostedt	3bc0ba4305	fs: Remove unlikely() from fget_light() There's an unlikely() in fget_light() that assumes the file ref count will be 1. Running the annotate branch profiler on a desktop that is performing daily tasks (running firefox, evolution, xchat and is also part of a distcc farm), it shows that the ref count is not 1 that often. correct incorrect % Function File Line ------- --------- - -------- ---- ---- 1035099358 6209599193 85 fget_light file_table.c 315 Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 03:26:27 -05:00
Christoph Hellwig	2fe17c1075	fallocate should be a file operation Currently all filesystems except XFS implement fallocate asynchronously, while XFS forced a commit. Both of these are suboptimal - in case of O_SYNC I/O we really want our allocation on disk, especially for the !KEEP_SIZE case where we actually grow the file with user-visible zeroes. On the other hand always commiting the transaction is a bad idea for fast-path uses of fallocate like for example in recent Samba versions. Given that block allocation is a data plane operation anyway change it from an inode operation to a file operation so that we have the file structure available that lets us check for O_SYNC. This also includes moving the code around for a few of the filesystems, and remove the already unnedded S_ISDIR checks given that we only wire up fallocate for regular files. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 02:25:31 -05:00
Christoph Hellwig	64c23e8687	make the feature checks in ->fallocate future proof Instead of various home grown checks that might need updates for new flags just check for any bit outside the mask of the features supported by the filesystem. This makes the check future proof for any newly added flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 02:25:30 -05:00
Al Viro	b1e75df45a	tidy up around finish_automount() do_add_mount() and mnt_clear_expiry() are not needed outside of namespace.c anymore, now that namei has finish_automount() to use. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 01:47:59 -05:00
Al Viro	15f9a3f3e1	don't drop newmnt on error in do_add_mount() That gets rid of the kludge in finish_automount() - we need to keep refcount on the vfsmount as-is until we evict it from expiry list. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 01:41:58 -05:00
Al Viro	19a167af7c	Take the completion of automount into new helper ... and shift it from namei.c to namespace.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 01:35:23 -05:00
Linus Torvalds	8a335bc631	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/scsi-post-merge-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/scsi-post-merge-2.6: ocfs2: Make OCFS2_FS depend on CONFIGFS_FS dlm: Make DLM depend on CONFIGFS_FS net: Make NETCONSOLE_DYNAMIC depend on CONFIGFS_FS configfs: change depends -> select SYSFS [SCSI] sd,sr: kill compat SDEV_MEDIA_CHANGE event [SCSI] sd: implement sd_check_events()	2011-01-16 15:06:43 -08:00
Al Viro	7e3d0eb0b0	VFS: Fix UP compile error in fs/namespace.c mnt_longterm is there only on SMP Reported-and-tested-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-16 14:59:45 -08:00
Nicholas Bellinger	7b1fff7e4f	ocfs2: Make OCFS2_FS depend on CONFIGFS_FS This patch fixes the following kconfig error after changing CONFIGFS_FS -> select SYSFS: fs/sysfs/Kconfig:1:error: recursive dependency detected! fs/sysfs/Kconfig:1: symbol SYSFS is selected by CONFIGFS_FS fs/configfs/Kconfig:1: symbol CONFIGFS_FS is selected by OCFS2_FS fs/ocfs2/Kconfig:1: symbol OCFS2_FS depends on SYSFS Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org> Cc: Joel Becker <jlbec@evilplan.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: James Bottomley <James.Bottomley@suse.de>	2011-01-16 21:22:40 +00:00
Nicholas Bellinger	86c747d2a4	dlm: Make DLM depend on CONFIGFS_FS This patch fixes the following kconfig error after changing CONFIGFS_FS -> select SYSFS: fs/sysfs/Kconfig:1:error: recursive dependency detected! fs/sysfs/Kconfig:1: symbol SYSFS is selected by CONFIGFS_FS fs/configfs/Kconfig:1: symbol CONFIGFS_FS is selected by DLM fs/dlm/Kconfig:1: symbol DLM depends on SYSFS Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org> Cc: Joel Becker <jlbec@evilplan.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: James Bottomley <James.Bottomley@suse.de>	2011-01-16 21:22:37 +00:00
Nicholas Bellinger	e205117285	configfs: change depends -> select SYSFS This patch changes configfs to select SYSFS to fix the following: warning: (TARGET_CORE && GFS2_FS) selects CONFIGFS_FS which has unmet direct dependencies (SYSFS) Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org> Acked-by: Joel Becker <jlbec@evilplan.org>	2011-01-16 21:22:29 +00:00
Stefan Schmidt	f8b18087fd	fs/btrfs: Fix build of ctree Fix the build failure in some configurations: CC [M] fs/btrfs/ctree.o In file included from fs/btrfs/ctree.c:21:0: fs/btrfs/ctree.h:1003:17: error: field 'super_kobj' has incomplete type fs/btrfs/ctree.h:1074:17: error: field 'root_kobj' has incomplete type make[2]: * [fs/btrfs/ctree.o] Error 1 make[1]: * [fs/btrfs] Error 2 make: *** [fs] Error 2 caused by commit `57cc7215b7` ("headers: kobject.h redux") We need to include kobject.h here. Reported-by: Jeff Garzik <jeff@garzik.org> Fix-suggested-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-16 12:59:42 -08:00
Linus Torvalds	5520ebd308	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus * git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus: Squashfs: simplify CONFIG_SQUASHFS_LZO handling Squashfs: move squashfs_i() definition from squashfs.h Squashfs: get rid of default n in Kconfig Squashfs: add missing check in zlib_wrapper Squashfs: remove unnecessary variable in zlib_wrapper Squashfs: Add XZ compression configuration option Squashfs: add XZ compression support	2011-01-16 12:11:13 -08:00
Linus Torvalds	f8206b925f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (23 commits) sanitize vfsmount refcounting changes fix old umount_tree() breakage autofs4: Merge the remaining dentry ops tables Unexport do_add_mount() and add in follow_automount(), not ->d_automount() Allow d_manage() to be used in RCU-walk mode Remove a further kludge from __do_follow_link() autofs4: Bump version autofs4: Add v4 pseudo direct mount support autofs4: Fix wait validation autofs4: Clean up autofs4_free_ino() autofs4: Clean up dentry operations autofs4: Clean up inode operations autofs4: Remove unused code autofs4: Add d_manage() dentry operation autofs4: Add d_automount() dentry operation Remove the automount through follow_link() kludge code from pathwalk CIFS: Use d_automount() rather than abusing follow_link() NFS: Use d_automount() rather than abusing follow_link() AFS: Use d_automount() rather than abusing follow_link() Add an AT_NO_AUTOMOUNT flag to suppress terminal automount ...	2011-01-16 11:31:50 -08:00
Al Viro	f03c65993b	sanitize vfsmount refcounting changes Instead of splitting refcount between (per-cpu) mnt_count and (SMP-only) mnt_longrefs, make all references contribute to mnt_count again and keep track of how many are longterm ones. Accounting rules for longterm count: * 1 for each fs_struct.root.mnt * 1 for each fs_struct.pwd.mnt * 1 for having non-NULL ->mnt_ns * decrement to 0 happens only under vfsmount lock exclusive That allows nice common case for mntput() - since we can't drop the final reference until after mnt_longterm has reached 0 due to the rules above, mntput() can grab vfsmount lock shared and check mnt_longterm. If it turns out to be non-zero (which is the common case), we know that this is not the final mntput() and can just blindly decrement percpu mnt_count. Otherwise we grab vfsmount lock exclusive and do usual decrement-and-check of percpu mnt_count. For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm(); namespace.c uses the latter in places where we don't already hold vfsmount lock exclusive and opencodes a few remaining spots where we need to manipulate mnt_longterm. Note that we mostly revert the code outside of fs/namespace.c back to what we used to have; in particular, normal code doesn't need to care about two kinds of references, etc. And we get to keep the optimization Nick's variant had bought us... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-16 13:47:07 -05:00
Al Viro	7b8a53fd81	fix old umount_tree() breakage Expiry-related code calls umount_tree() several times with the same list to collect vfsmounts to. Which is fine, except that umount_tree() implicitly assumed that the list would be empty on each call - it moves the victims over there and then iterates through the list kicking them out. It's almost idempotent, so everything nearly worked. However, mnt->ghosts handling (and thus expirability checks) had been broken - that part was not idempotent... The fix is trivial - use local temporary list, splice it to the the collector list when we are through. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-16 13:47:01 -05:00
Ben Hutchings	6f88a4403d	btrfs: Require CAP_SYS_ADMIN for filesystem rebalance Filesystem rebalancing (BTRFS_IOC_BALANCE) affects the entire filesystem and may run uninterruptibly for a long time. This does not seem to be something that an unprivileged user should be able to do. Reported-by: Aron Xu <happyaron.xu@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:20 -05:00
Josef Bacik	f690efb1aa	Btrfs: don't warn if we get ENOSPC in btrfs_block_rsv_check If we run low on space we could get a bunch of warnings out of btrfs_block_rsv_check, but this is mostly just called via the transaction code to see if we need to end the transaction, it expects to see failures, so let's not WARN and freak everybody out for no reason. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:20 -05:00
Tsutomu Itoh	5e540f7715	btrfs: Fix memory leak in btrfs_read_fs_root_no_radix() In btrfs_read_fs_root_no_radix(), 'root' is not freed if btrfs_search_slot() returns error. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:20 -05:00
Tsutomu Itoh	91ca338d77	btrfs: check NULL or not Should check if functions returns NULL or not. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:20 -05:00
Jesper Juhl	ff175d57f0	btrfs: Don't pass NULL ptr to func that may deref it. Hi, In fs/btrfs/inode.c::fixup_tree_root_location() we have this code: ... if (!path) { err = -ENOMEM; goto out; } ... out: btrfs_free_path(path); return err; btrfs_free_path() passes its argument on to other functions and some of them end up dereferencing the pointer. In the code above that pointer is clearly NULL, so btrfs_free_path() will eventually cause a NULL dereference. There are many ways to cut this cake (fix the bug). The one I chose was to make btrfs_free_path() deal gracefully with NULL pointers. If you disagree, feel free to come up with an alternative patch. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:20 -05:00
Dave Young	20b450773d	btrfs: mount failure return value fix I happened to pass swap partition as root partition in cmdline, then kernel panic and tell me about "Cannot open root device". It is not correct, in fact it is a fs type mismatch instead of 'no device'. Eventually I found btrfs mounting failed with -EIO, it should be -EINVAL. The logic in init/do_mounts.c: for (p = fs_names; *p; p += strlen(p)+1) { int err = do_mount_root(name, p, flags, root_mount_data); switch (err) { case 0: goto out; case -EACCES: flags \|= MS_RDONLY; goto retry; case -EINVAL: continue; } print "Cannot open root device" panic } SO fs type after btrfs will have no chance to mount Here fix the return value as -EINVAL Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:19 -05:00
Jesper Juhl	42838bb265	btrfs: Mem leak in btrfs_get_acl() It seems to me that we leak the memory allocated to 'value' in btrfs_get_acl() if the call to posix_acl_from_xattr() fails. Here's a patch that attempts to correct that problem. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:19 -05:00
Miao Xie	6d07bcec96	btrfs: fix wrong free space information of btrfs When we store data by raid profile in btrfs with two or more different size disks, df command shows there is some free space in the filesystem, but the user can not write any data in fact, df command shows the wrong free space information of btrfs. # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10 # btrfs-show Label: none uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64 Total devices 2 FS bytes used 28.00KB devid 1 size 5.01GB used 2.03GB path /dev/sda9 devid 2 size 10.00GB used 2.01GB path /dev/sda10 # btrfs device scan /dev/sda9 /dev/sda10 # mount /dev/sda9 /mnt # dd if=/dev/zero of=tmpfile0 bs=4K count=9999999999 (fill the filesystem) # sync # df -TH Filesystem Type Size Used Avail Use% Mounted on /dev/sda9 btrfs 17G 8.6G 5.4G 62% /mnt # btrfs-show Label: none uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64 Total devices 2 FS bytes used 3.99GB devid 1 size 5.01GB used 5.01GB path /dev/sda9 devid 2 size 10.00GB used 4.99GB path /dev/sda10 It is because btrfs cannot allocate chunks when one of the pairing disks has no space, the free space on the other disks can not be used for ever, and should be subtracted from the total space, but btrfs doesn't subtract this space from the total. It is strange to the user. This patch fixes it by calcing the free space that can be used to allocate chunks. Implementation: 1. get all the devices free space, and align them by stripe length. 2. sort the devices by the free space. 3. check the free space of the devices, 3.1. if it is not zero, and then check the number of the devices that has more free space than this device, if the number of the devices is beyond the min stripe number, the free space can be used, and add into total free space. if the number of the devices is below the min stripe number, we can not use the free space, the check ends. 3.2. if the free space is zero, check the next devices, goto 3.1 This implementation is just likely fake chunk allocation. After appling this patch, df can show correct space information: # df -TH Filesystem Type Size Used Avail Use% Mounted on /dev/sda9 btrfs 17G 8.6G 0 100% /mnt Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:19 -05:00
Miao Xie	b2117a39fa	btrfs: make the chunk allocator utilize the devices better With this patch, we change the handling method when we can not get enough free extents with default size. Implementation: 1. Look up the suitable free extent on each device and keep the search result. If not find a suitable free extent, keep the max free extent 2. If we get enough suitable free extents with default size, chunk allocation succeeds. 3. If we can not get enough free extents, but the number of the extent with default size is >= min_stripes, we just change the mapping information (reduce the number of stripes in the extent map), and chunk allocation succeeds. 4. If the number of the extent with default size is < min_stripes, sort the devices by its max free extent's size descending 5. Use the size of the max free extent on the (num_stripes - 1)th device as the stripe size to allocate the device space By this way, the chunk allocator can allocate chunks as large as possible when the devices' space is not enough and make full use of the devices. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:19 -05:00
Miao Xie	7bfc837df9	btrfs: restructure find_free_dev_extent() - make it return the start position and length of the max free space when it can not find a suitable free space. - make it more readability Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:19 -05:00
Miao Xie	1974a3b42d	btrfs: fix wrong calculation of stripe size There are two tiny problem: - One is When we check the chunk size is greater than the max chunk size or not, we should take mirrors into account, but the original code didn't. - The other is btrfs shouldn't use the size of the residual free space as the length of of a dup chunk when doing chunk allocation. It is because the device space that a dup chunk needs is twice as large as the chunk size, if we use the size of the residual free space as the length of a dup chunk, we can not get enough free space. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:19 -05:00
Miao Xie	d52a5b5f1f	btrfs: try to reclaim some space when chunk allocation fails We cannot write data into files when when there is tiny space in the filesystem. Reproduce steps: # mkfs.btrfs /dev/sda1 # mount /dev/sda1 /mnt # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=1 # dd if=/dev/zero of=/mnt/tmpfile1 bs=4K count=99999999999999 (fill the filesystem) # umount /mnt # mount /dev/sda1 /mnt # rm -f /mnt/tmpfile0 # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=1 (failed with nospec) But if we do the last step again, we can write data successfully. The reason of the problem is that btrfs didn't try to commit the current transaction and reclaim some space when chunk allocation failed. This patch fixes it by committing the current transaction to reclaim some space when chunk allocation fails. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:19 -05:00
Miao Xie	299a08b1c3	btrfs: fix wrong data space statistics Josef has implemented mixed data/metadata chunks, we must add those chunks' space just like data chunks. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:19 -05:00
Stefan Schmidt	f580eb0931	fs/btrfs: Fix build of ctree CC [M] fs/btrfs/ctree.o In file included from fs/btrfs/ctree.c:21:0: fs/btrfs/ctree.h:1003:17: error: field <91>super_kobj<92> has incomplete type fs/btrfs/ctree.h:1074:17: error: field <91>root_kobj<92> has incomplete type make[2]: * [fs/btrfs/ctree.o] Error 1 make[1]: * [fs/btrfs] Error 2 make: *** [fs] Error 2 We need to include kobject.h here. Reported-by: Jeff Garzik <jeff@garzik.org> Fix-suggested-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-01-16 11:30:19 -05:00
Chris Mason	f892436eb2	Merge branch 'lzo-support' of git://repo.or.cz/linux-btrfs-devel into btrfs-38	2011-01-16 11:25:54 -05:00
Chris Mason	26c79f6ba0	Merge branch 'readonly-snapshots' of git://repo.or.cz/linux-btrfs-devel into btrfs-38	2011-01-16 11:24:45 -05:00
David Howells	b650c858c2	autofs4: Merge the remaining dentry ops tables Merge the remaining autofs4 dentry ops tables. It doesn't matter if d_automount and d_manage are present on something that's not mountable or holdable as these ops are only used if the appropriate flags are set in dentry->d_flags. [AV] switch to ->s_d_op, since now _everything_ on autofs4 is using the same dentry_operations. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:49 -05:00
David Howells	ea5b778a8b	Unexport do_add_mount() and add in follow_automount(), not ->d_automount() Unexport do_add_mount() and make ->d_automount() return the vfsmount to be added rather than calling do_add_mount() itself. follow_automount() will then do the addition. This slightly complicates things as ->d_automount() normally wants to add the new vfsmount to an expiration list and start an expiration timer. The problem with that is that the vfsmount will be deleted if it has a refcount of 1 and the timer will not repeat if the expiration list is empty. To this end, we require the vfsmount to be returned from d_automount() with a refcount of (at least) 2. One of these refs will be dropped unconditionally. In addition, follow_automount() must get a 3rd ref around the call to do_add_mount() lest it eat a ref and return an error, leaving the mount we have open to being expired as we would otherwise have only 1 ref on it. d_automount() should also add the the vfsmount to the expiration list (by calling mnt_set_expiry()) and start the expiration timer before returning, if this mechanism is to be used. The vfsmount will be unlinked from the expiration list by follow_automount() if do_add_mount() fails. This patch also fixes the call to do_add_mount() for AFS to propagate the mount flags from the parent vfsmount. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:48 -05:00
David Howells	ab90911ff9	Allow d_manage() to be used in RCU-walk mode Allow d_manage() to be called from pathwalk when it is in RCU-walk mode as well as when it is in Ref-walk mode. This permits __follow_mount_rcu() to call d_manage() directly. d_manage() needs a parameter to indicate that it is in RCU-walk mode as it isn't allowed to sleep if in that mode (but should return -ECHILD instead). autofs4_d_manage() can then be set to retain RCU-walk mode if the daemon accesses it and otherwise request dropping back to ref-walk mode. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:47 -05:00
David Howells	87556ef199	Remove a further kludge from __do_follow_link() Remove a further kludge from __do_follow_link() as it's no longer required with the automount code. This reverts the non-helper-function parts of `051d381259`, which breaks union mounts. Reported-by: vaurora@redhat.com Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:46 -05:00
Ian Kent	dd89f90d2d	autofs4: Add v4 pseudo direct mount support Version 4 of autofs provides a pseudo direct mount implementation that relies on directories at the leaves of a directory tree under an indirect mount to trigger mounts. This patch adds support for that functionality. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:44 -05:00
Ian Kent	9e3fea16ba	autofs4: Fix wait validation It is possible for the check in wait.c:validate_request() to return an incorrect result if the dentry that was mounted upon has changed during the callback. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:43 -05:00
Ian Kent	6651149371	autofs4: Clean up autofs4_free_ino() When this function is called the local reference count does't need to be updated since the dentry is going away and dput definitely must not be called here. Also the autofs info struct field inode isn't used so remove it. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:42 -05:00
Ian Kent	71e469db24	autofs4: Clean up dentry operations There are now two distinct dentry operations uses. One for dentrys that trigger mounts and one for dentrys that do not. Rationalize the use of these dentry operations and rename them to reflect their function. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:41 -05:00
Ian Kent	e61da20a50	autofs4: Clean up inode operations Since the use of ->follow_link() has been eliminated there is no need to separate the indirect and direct inode operations. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:40 -05:00
Ian Kent	8c13a676d5	autofs4: Remove unused code Remove code that is not used due to the use of ->d_automount() and ->d_manage(). Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:39 -05:00
Ian Kent	b5b801779d	autofs4: Add d_manage() dentry operation This patch required a previous patch to add the ->d_automount() dentry operation. Add a function to use the newly defined ->d_manage() dentry operation for blocking during mount and expire. Whether the VFS calls the dentry operations d_automount() and d_manage() is controled by the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags. autofs uses the d_automount() operation to callback to user space to request mount operations and the d_manage() operation to block walks into mounts that are under construction or destruction. In order to prevent these functions from being called unnecessarily the DMANAGED_* flags are cleared for cases which would cause this. In the common case the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags are both set for dentrys waiting to be mounted. The DMANAGED_TRANSIT flag is cleared upon successful mount request completion and set during expire runs, both during the dentry expire check, and if selected for expire, is left set until a subsequent successful mount request completes. The exception to this is the so-called rootless multi-mount which has no actual mount at its base. In this case the DMANAGED_AUTOMOUNT flag is cleared upon successful mount request completion as well and set again after a successful expire. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:38 -05:00
Ian Kent	10584211e4	autofs4: Add d_automount() dentry operation Add a function to use the newly defined ->d_automount() dentry operation for triggering mounts instead of doing the user space callback in ->lookup() and ->d_revalidate(). Note, to be useful the subsequent patch to add the ->d_manage() dentry operation is also needed so the discussion of functionality is deferred to that patch. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:37 -05:00
David Howells	db3729153e	Remove the automount through follow_link() kludge code from pathwalk Remove the automount through follow_link() kludge code from pathwalk in favour of using d_automount(). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:36 -05:00
David Howells	01c64feac4	CIFS: Use d_automount() rather than abusing follow_link() Make CIFS use the new d_automount() dentry operation rather than abusing follow_link() on directories. [NOTE: THIS IS UNTESTED!] Signed-off-by: David Howells <dhowells@redhat.com> Cc: Steve French <sfrench@samba.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:35 -05:00
David Howells	36d43a4376	NFS: Use d_automount() rather than abusing follow_link() Make NFS use the new d_automount() dentry operation rather than abusing follow_link() on directories. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:34 -05:00
David Howells	d18610b0ce	AFS: Use d_automount() rather than abusing follow_link() Make AFS use the new d_automount() dentry operation rather than abusing follow_link() on directories. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:33 -05:00
David Howells	6f45b65672	Add an AT_NO_AUTOMOUNT flag to suppress terminal automount Add an AT_NO_AUTOMOUNT flag to suppress terminal automounting of automount point directories. This can be used by fstatat() users to permit the gathering of attributes on an automount point and also prevent mass-automounting of a directory of automount points by ls. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:33 -05:00
David Howells	cc53ce53c8	Add a dentry op to allow processes to be held during pathwalk transit Add a dentry op (d_manage) to permit a filesystem to hold a process and make it sleep when it tries to transit away from one of that filesystem's directories during a pathwalk. The operation is keyed off a new dentry flag (DCACHE_MANAGE_TRANSIT). The filesystem is allowed to be selective about which processes it holds and which it permits to continue on or prohibits from transiting from each flagged directory. This will allow autofs to hold up client processes whilst letting its userspace daemon through to maintain the directory or the stuff behind it or mounted upon it. The ->d_manage() dentry operation: int (d_manage)(struct path path, bool mounting_here); takes a pointer to the directory about to be transited away from and a flag indicating whether the transit is undertaken by do_add_mount() or do_move_mount() skipping through a pile of filesystems mounted on a mountpoint. It should return 0 if successful and to let the process continue on its way; -EISDIR to prohibit the caller from skipping to overmounted filesystems or automounting, and to use this directory; or some other error code to return to the user. ->d_manage() is called with namespace_sem writelocked if mounting_here is true and no other locks held, so it may sleep. However, if mounting_here is true, it may not initiate or wait for a mount or unmount upon the parameter directory, even if the act is actually performed by userspace. Within fs/namei.c, follow_managed() is extended to check with d_manage() first on each managed directory, before transiting away from it or attempting to automount upon it. follow_down() is renamed follow_down_one() and should only be used where the filesystem deliberately intends to avoid management steps (e.g. autofs). A new follow_down() is added that incorporates the loop done by all other callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS and CIFS do use it, their use is removed by converting them to use d_automount()). The new follow_down() calls d_manage() as appropriate. It also takes an extra parameter to indicate if it is being called from mount code (with namespace_sem writelocked) which it passes to d_manage(). follow_down() ignores automount points so that it can be used to mount on them. __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have that determine whether to abort or not itself. That would allow the autofs daemon to continue on in rcu-walk mode. Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't required as every tranist from that directory will cause d_manage() to be invoked. It can always be set again when necessary. ========================== WHAT THIS MEANS FOR AUTOFS ========================== Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to trigger the automounting of indirect mounts, and both of these can be called with i_mutex held. autofs knows that the i_mutex will be held by the caller in lookup(), and so can drop it before invoking the daemon - but this isn't so for d_revalidate(), since the lock is only held on _some_ of the code paths that call it. This means that autofs can't risk dropping i_mutex from its d_revalidate() function before it calls the daemon. The bug could manifest itself as, for example, a process that's trying to validate an automount dentry that gets made to wait because that dentry is expired and needs cleaning up: mkdir S ffffffff8014e05a 0 32580 24956 Call Trace: [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f [<ffffffff80057a2f>] lookup_create+0x46/0x80 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4 versus the automount daemon which wants to remove that dentry, but can't because the normal process is holding the i_mutex lock: automount D ffffffff8014e05a 0 32581 1 32561 Call Trace: [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14 [<ffffffff800e6d55>] do_rmdir+0x77/0xde [<ffffffff8005d229>] tracesys+0x71/0xe0 [<ffffffff8005d28d>] tracesys+0xd5/0xe0 which means that the system is deadlocked. This patch allows autofs to hold up normal processes whilst the daemon goes ahead and does things to the dentry tree behind the automouter point without risking a deadlock as almost no locks are held in d_manage() and none in d_automount(). Signed-off-by: David Howells <dhowells@redhat.com> Was-Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:31 -05:00
David Howells	9875cf8064	Add a dentry op to handle automounting rather than abusing follow_link() Add a dentry op (d_automount) to handle automounting directories rather than abusing the follow_link() inode operation. The operation is keyed off a new dentry flag (DCACHE_NEED_AUTOMOUNT). This also makes it easier to add an AT_ flag to suppress terminal segment automount during pathwalk and removes the need for the kludge code in the pathwalk algorithm to handle directories with follow_link() semantics. The ->d_automount() dentry operation: struct vfsmount (d_automount)(struct path mountpoint); takes a pointer to the directory to be mounted upon, which is expected to provide sufficient data to determine what should be mounted. If successful, it should return the vfsmount struct it creates (which it should also have added to the namespace using do_add_mount() or similar). If there's a collision with another automount attempt, NULL should be returned. If the directory specified by the parameter should be used directly rather than being mounted upon, -EISDIR should be returned. In any other case, an error code should be returned. The ->d_automount() operation is called with no locks held and may sleep. At this point the pathwalk algorithm will be in ref-walk mode. Within fs/namei.c itself, a new pathwalk subroutine (follow_automount()) is added to handle mountpoints. It will return -EREMOTE if the automount flag was set, but no d_automount() op was supplied, -ELOOP if we've encountered too many symlinks or mountpoints, -EISDIR if the walk point should be used without mounting and 0 if successful. The path will be updated to point to the mounted filesystem if a successful automount took place. __follow_mount() is replaced by follow_managed() which is more generic (especially with the patch that adds ->d_manage()). This handles transits from directories during pathwalk, including automounting and skipping over mountpoints (and holding processes with the next patch). __follow_mount_rcu() will jump out of RCU-walk mode if it encounters an automount point with nothing mounted on it. follow_dotdot() does not handle automounts as you don't want to trigger them whilst following "..". I've also extracted the mount/don't-mount logic from autofs4 and included it here. It makes the mount go ahead anyway if someone calls open() or creat(), tries to traverse the directory, tries to chdir/chroot/etc. into the directory, or sticks a '/' on the end of the pathname. If they do a stat(), however, they'll only trigger the automount if they didn't also say O_NOFOLLOW. I've also added an inode flag (S_AUTOMOUNT) so that filesystems can mark their inodes as automount points. This flag is automatically propagated to the dentry as DCACHE_NEED_AUTOMOUNT by __d_instantiate(). This saves NFS and could save AFS a private flag bit apiece, but is not strictly necessary. It would be preferable to do the propagation in d_set_d_op(), but that doesn't normally have access to the inode. [AV: fixed breakage in case if __follow_mount_rcu() fails and nameidata_drop_rcu() succeeds in RCU case of do_lookup(); we need to fall through to non-RCU case after that, rather than just returning with ungrabbed *path] Signed-off-by: David Howells <dhowells@redhat.com> Was-Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:05:03 -05:00
Al Viro	1a8edf40e7	do_lookup() fix do_lookup() has a path leading from LOOKUP_RCU case to non-RCU crossing of mountpoints, which breaks things badly. If we hit need_revalidate: and do nothing in there, we need to come back into LOOKUP_RCU half of things, not to done: in non-RCU one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:03:39 -05:00
Linus Torvalds	7cb3920a65	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: prevent NMI timeouts in cmn_err xfs: Add log level to assertion printk xfs: fix an assignment within an ASSERT() xfs: fix error handling for synchronous writes xfs: add FITRIM support xfs: ensure log covering transactions are synchronous xfs: serialise unaligned direct IOs xfs: factor common write setup code xfs: split buffered IO write path from xfs_file_aio_write xfs: split direct IO write path from xfs_file_aio_write xfs: introduce xfs_rw_lock() helpers for locking the inode xfs: factor post-write newsize updates xfs: factor common post-write isize handling code xfs: ensure sync write errors are returned	2011-01-14 15:24:17 -08:00
Linus Torvalds	6ab8219649	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: restore multiple bd_link_disk_holder() support block cfq: compensate preempted queue even if it has no slice assigned block cfq: make queue preempt work for queues from different workload	2011-01-14 13:32:07 -08:00
Linus Torvalds	6f7f7caab2	Turn d_set_d_op() BUG_ON() into WARN_ON_ONCE() It's indicative of a real problem, and it actually triggers with autofs4, but the BUG_ON() is excessive. The autofs4 case is being fixed (to only set d_op in the ->lookup method) but not merged yet. In the meantime this gets the code limping along. Reported-by: Alex Elder <aelder@sgi.com> Cc: Ian Kent <raven@themaw.net> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-14 13:26:18 -08:00
Linus Torvalds	18bce371ae	Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux * 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: (62 commits) nfsd4: fix callback restarting nfsd: break lease on unlink, link, and rename nfsd4: break lease on nfsd setattr nfsd: don't support msnfs export option nfsd4: initialize cb_per_client nfsd4: allow restarting callbacks nfsd4: simplify nfsd4_cb_prepare nfsd4: give out delegations more quickly in 4.1 case nfsd4: add helper function to run callbacks nfsd4: make sure sequence flags are set after destroy_session nfsd4: re-probe callback on connection loss nfsd4: set sequence flag when backchannel is down nfsd4: keep finer-grained callback status rpc: allow xprt_class->setup to return a preexisting xprt rpc: keep backchannel xprt as long as server connection rpc: move sk_bc_xprt to svc_xprt nfsd4: allow backchannel recovery nfsd4: support BIND_CONN_TO_SESSION nfsd4: modify session list under cl_lock Documentation: fl_mylease no longer exists ... Fix up conflicts in fs/nfsd/vfs.c with the vfs-scale work. The vfs-scale work touched some msnfs cases, and this merge removes support for that entirely, so the conflict was trivial to resolve.	2011-01-14 13:17:26 -08:00
J. Bruce Fields	a8f2800b4f	nfsd4: fix callback restarting Ensure a new callback is added to the client's list of callbacks at most once. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-14 14:51:31 -05:00
Jeff Layton	bd76331955	cifs: add cruid= mount option In commit `3e4b3e1f` we separated the "uid" mount option such that it no longer determined the owner of the credential cache by default. When we did this, we added a new option to cifs.upcall (--legacy-uid) to try to make it so that it would behave the same was as it did before. This ignored a rather important point -- the kernel has no way to know what options are being passed to cifs.upcall, so it doesn't know what uid it should use to determine whether to match an existing krb5 session. The simplest solution is to simply add a new "cruid=" mount option that only governs the uid owner of the credential cache for the mount. Unfortunately, this means that the --legacy-uid option in cifs.upcall was ill-considered and is now useless, but I don't see a better way to deal with this. A patch for the mount.cifs manpage will follow once this patch has been accepted. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-14 18:51:11 +00:00
Jeff Layton	56c24305d1	cifs: cFYI the entire error code in map_smb_to_linux_error We currently only print the DOS error part. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-01-14 18:51:11 +00:00
Tejun Heo	49731baa41	block: restore multiple bd_link_disk_holder() support Commit `e09b457b` (block: simplify holder symlink handling) incorrectly assumed that there is only one link at maximum. dm may use multiple links and expects block layer to track reference count for each link, which is different from and unrelated to the exclusive device holder identified by @holder when the device is opened. Remove the single holder assumption and automatic removal of the link and revive the per-link reference count tracking. The code essentially behaves the same as before commit `e09b457b` sans the unnecessary kobject reference count dancing. While at it, note that this facility should not be used by anyone else than the current ones. Sysfs symlinks shouldn't be abused like this and the whole thing doesn't belong in the block layer at all. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Milan Broz <mbroz@redhat.com> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-01-14 18:44:22 +01:00
Tejun Heo	0ad53eeefc	afs: add afs_wq and use it instead of the system workqueue flush_scheduled_work() is going away. afs needs to make sure all the works it has queued have finished before being unloaded and there can be arbitrary number of pending works. Add afs_wq and use it as the flush domain instead of the system workqueue. Also, convert cancel_delayed_work() + flush_scheduled_work() to cancel_delayed_work_sync() in afs_mntpt_kill_timer(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Cc: linux-afs@lists.infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-14 09:25:11 -08:00
Akshat Aranya	ba28b93a52	FS-Cache: Fix operation handling fscache_submit_exclusive_op() adds an operation to the pending list if other operations are pending. Fix the check for pending ops as n_ops must be greater than 0 at the point it is checked as it is incremented immediately before under lock. Signed-off-by: Akshat Aranya <aranya@nec-labs.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-14 09:23:36 -08:00
Linus Torvalds	acda4721ae	Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: kernel: fix hlist_bl again cgroups: Fix a lockdep warning at cgroup removal fs: namei fix ->put_link on wrong inode in do_filp_open	2011-01-14 09:08:29 -08:00
Nick Piggin	7b9337aaf9	fs: namei fix ->put_link on wrong inode in do_filp_open J. R. Okajima noticed that ->put_link is being attempted on the wrong inode, and suggested the way to fix it. I changed it a bit according to Al's suggestion to keep an explicit link path around. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-14 08:42:43 +00:00
Linus Torvalds	db9effe99a	Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: fs: fix do_last error case when need_reval_dot nfs: add missing rcu-walk check fs: hlist UP debug fixup fs: fix dropping of rcu-walk from force_reval_path fs: force_reval_path drop rcu-walk before d_invalidate fs: small rcu-walk documentation fixes Fixed up trivial conflicts in Documentation/filesystems/porting	2011-01-13 20:14:13 -08:00
J. R. Okajima	f20877d94a	fs: fix do_last error case when need_reval_dot When open(2) without O_DIRECTORY opens an existing dir, it should return EISDIR. In do_last(), the variable 'error' is initialized EISDIR, but it is changed by d_revalidate() which returns any positive to represent 'the target dir is valid.' Should we keep and return the initialized 'error' in this case. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-14 03:56:04 +00:00
Nick Piggin	657e94b673	nfs: add missing rcu-walk check Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-14 02:48:39 +00:00
Nick Piggin	90dbb77ba4	fs: fix dropping of rcu-walk from force_reval_path As J. R. Okajima noted, force_reval_path passes in the same dentry to d_revalidate as the one in the nameidata structure (other callers pass in a child), so the locking breaks. This can oops with a chrooted nfs mount, for example. Similarly there can be other problems with revalidating a dentry which is already in nameidata of the path walk. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-14 02:36:19 +00:00
Nick Piggin	bb20c18db6	fs: force_reval_path drop rcu-walk before d_invalidate d_revalidate can return in rcu-walk mode even when it returns 0. We can't just call any old dcache function on rcu-walk dentry (the dentry is unstable, so even through d_lock can safely be taken, the result may no longer be what we expect -- careful re-checks would be required). So just drop rcu in this case. (I missed this conversion when switching to the rcu-walk convention that Linus suggested) Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-14 02:35:53 +00:00
J. Bruce Fields	4795bb37ef	nfsd: break lease on unlink, link, and rename Any change to any of the links pointing to an entry should also break delegations. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-13 21:04:09 -05:00
J. Bruce Fields	6a76bebefe	nfsd4: break lease on nfsd setattr Leases (delegations) should really be broken on any metadata change, not just on size change. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-13 21:04:08 -05:00
J. Bruce Fields	9ce137eee4	nfsd: don't support msnfs export option We've long had these pointless #ifdef MSNFS's sprinkled throughout the code--pointless because MSNFS is always defined (and we give no config option to make that easy to change). So we could just remove the ifdef's and compile the resulting code unconditionally. But as long as we're there: why not just rip out this code entirely? The only purpose is to implement the "msnfs" export option which turns on Windows-like behavior in some cases, and: - the export option isn't documented anywhere; - the userland utilities (which would need to be able to parse "msnfs" in an export file) don't support it; - I don't know how to maintain this, as I don't know what the proper behavior is; and - google shows no evidence that anyone has ever used this. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-13 21:04:07 -05:00
J. Bruce Fields	9ee1ba5402	nfsd4: initialize cb_per_client Otherwise a callback that is aborted before it runs will result in a list_del on an uninitialized list head. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-13 21:04:06 -05:00
Stefan Hajnoczi	cb9ef8d5e3	fs/fs-writeback.c: fix sync_inodes_sb() return value kernel-doc The sync_inodes_sb() function does not have a return value. Remove the outdated documentation comment. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:48 -08:00
Andrea Arcangeli	5f24ce5fd3	thp: remove PG_buddy PG_buddy can be converted to _mapcount == -2. So the PG_compound_lock can be added to page->flags without overflowing (because of the sparse section bits increasing) with CONFIG_X86_PAE=y and CONFIG_X86_PAT=y. This also has to move the memory hotplug code from _mapcount to lru.next to avoid any risk of clashes. We can't use lru.next for PG_buddy removal, but memory hotplug can use lru.next even more easily than the mapcount instead. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:43 -08:00
Andrea Arcangeli	79134171df	thp: transparent hugepage vmstat Add hugepage stat information to /proc/vmstat and /proc/meminfo. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:43 -08:00
Mandeep Singh Baines	dabb16f639	oom: allow a non-CAP_SYS_RESOURCE proces to oom_score_adj down We'd like to be able to oom_score_adj a process up/down as it enters/leaves the foreground. Currently, it is not possible to oom_adj down without CAP_SYS_RESOURCE. This patch allows a task to decrease its oom_score_adj back to the value that a CAP_SYS_RESOURCE thread set it to or its inherited value at fork. Assuming the thread that has forked it has oom_score_adj of 0, each process could decrease it back from 0 upon activation unless a CAP_SYS_RESOURCE thread elevated it to something higher. Alternative considered: * a setuid binary * a daemon with CAP_SYS_RESOURCE Since you don't wan't all processes to be able to reduce their oom_adj, a setuid or daemon implementation would be complex. The alternatives also have much higher overhead. This patch updated from original patch based on feedback from David Rientjes. Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Acked-by: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:35 -08:00
Nikanth Karthikesan	2d90508f63	mm: smaps: export mlock information Currently there is no way to find whether a process has locked its pages in memory or not. And which of the memory regions are locked in memory. Add a new field "Locked" to export this information via the smaps file. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:33 -08:00
Hai Shan	c32b0d4b3f	fs/mpage.c: consolidate code Merge mpage_end_io_read() and mpage_end_io_write() into mpage_end_io() to eliminate code duplication. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Hai Shan <shan.hai@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:32 -08:00
Andrew Morton	c691b9d983	sync_inode_metadata: fix comment Use correct function name, remove incorrect apostrophe Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:32 -08:00
Jan Kara	b9543dac5b	writeback: avoid livelocking WB_SYNC_ALL writeback When wb_writeback() is called in WB_SYNC_ALL mode, work->nr_to_write is usually set to LONG_MAX. The logic in wb_writeback() then calls __writeback_inodes_sb() with nr_to_write == MAX_WRITEBACK_PAGES and we easily end up with non-positive nr_to_write after the function returns, if the inode has more than MAX_WRITEBACK_PAGES dirty pages at the moment. When nr_to_write is <= 0 wb_writeback() decides we need another round of writeback but this is wrong in some cases! For example when a single large file is continuously dirtied, we would never finish syncing it because each pass would be able to write MAX_WRITEBACK_PAGES and inode dirty timestamp never gets updated (as inode is never completely clean). Thus __writeback_inodes_sb() would write the redirtied inode again and again. Fix the issue by setting nr_to_write to LONG_MAX in WB_SYNC_ALL mode. We do not need nr_to_write in WB_SYNC_ALL mode anyway since write_cache_pages() does livelock avoidance using page tagging in WB_SYNC_ALL mode. This makes wb_writeback() call __writeback_inodes_sb() only once on WB_SYNC_ALL. The latter function won't livelock because it works on - a finite set of files by doing queue_io() once at the beginning - a finite set of pages by PAGECACHE_TAG_TOWRITE page tagging After this patch, program from http://lkml.org/lkml/2010/10/24/154 is no longer able to stall sync forever. [fengguang.wu@intel.com: fix locking comment] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Engelhardt <jengelh@medozas.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:32 -08:00
Jan Kara	aa373cf550	writeback: stop background/kupdate works from livelocking other works Background writeback is easily livelockable in a loop in wb_writeback() by a process continuously re-dirtying pages (or continuously appending to a file). This is in fact intended as the target of background writeback is to write dirty pages it can find as long as we are over dirty_background_threshold. But the above behavior gets inconvenient at times because no other work queued in the flusher thread's queue gets processed. In particular, since e.g. sync(1) relies on flusher thread to do all the IO for it, sync(1) can hang forever waiting for flusher thread to do the work. Generally, when a flusher thread has some work queued, someone submitted the work to achieve a goal more specific than what background writeback does. Moreover by working on the specific work, we also reduce amount of dirty pages which is exactly the target of background writeout. So it makes sense to give specific work a priority over a generic page cleaning. Thus we interrupt background writeback if there is some other work to do. We return to the background writeback after completing all the queued work. This may delay the writeback of expired inodes for a while, however the expired inodes will eventually be flushed to disk as long as the other works won't livelock. [fengguang.wu@intel.com: update comment] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Engelhardt <jengelh@medozas.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:32 -08:00
Wu Fengguang	71927e84e0	writeback: trace wakeup event for background writeback This tracks when balance_dirty_pages() tries to wakeup the flusher thread for background writeback (if it was not started already). Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Engelhardt <jengelh@medozas.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:32 -08:00
Jan Kara	6585027a5e	writeback: integrated background writeback work Check whether background writeback is needed after finishing each work. When bdi flusher thread finishes doing some work check whether any kind of background writeback needs to be done (either because dirty_background_ratio is exceeded or because we need to start flushing old inodes). If so, just do background write back. This way, bdi_start_background_writeback() just needs to wake up the flusher thread. It will do background writeback as soon as there is no other work. This is a preparatory patch for the next patch which stops background writeback as soon as there is other work to do. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Engelhardt <jengelh@medozas.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:32 -08:00
Linus Torvalds	6254b32b57	ecryptfs: fix broken build Stephen Rothwell reports that the vfs merge broke the build of ecryptfs. The breakage comes from commit `66cb76666d` ("sanitize ecryptfs ->mount()") which was obviously not even build tested. Tssk, tssk, Al. This is the minimal build fixup for the situation, although I don't have a filesystem to actually test it with. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:19:38 -08:00
Sage Weil	17db143fc0	ceph: fix xattr rbtree search Fix xattr name comparison in rbtree search for strings that share a prefix. The name argument is null terminated, but the xattr name is not, so we need to use strncmp, but that means adjusting for the case where name is a prefix of xattr->name. The corresponding case in __set_xattr() already handles this properly (although in that case name is also not null terminated). Reported-by: Sergiy Kibrik <sakib@meta.ua> Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-13 15:50:11 -08:00
Yehuda Sadeh	1c1266bb91	ceph: fix getattr on directory when using norbytes The norbytes mount option was broken, and when doing getattr on a directory it return the rbytes instead of the number of entities. This commit fixes it. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-13 15:50:06 -08:00
Phillip Lougher	01a678c5a2	Squashfs: simplify CONFIG_SQUASHFS_LZO handling Get rid of messy repeated #if(n)def CONFIG_SQUASHFS_LZO code in decompressor.c Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2011-01-13 21:38:46 +00:00
Phillip Lougher	8fcd97216f	Squashfs: move squashfs_i() definition from squashfs.h Move squashfs_i() definition out of squashfs.h, this eliminates the need to #include squashfs_fs_i.h from numerous files. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2011-01-13 21:24:15 +00:00
Phillip Lougher	6197fd8678	Squashfs: get rid of default n in Kconfig As pointed out by Geert Uytterhoeven, "default n" is the default, no reason to specify it. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2011-01-13 21:21:52 +00:00
Phillip Lougher	e7ee11f0ec	Squashfs: add missing check in zlib_wrapper On file system corruption zlib can return Z_STREAM_OK with input buffers remaining, which will not be released. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2011-01-13 21:21:00 +00:00
Phillip Lougher	170cf02165	Squashfs: remove unnecessary variable in zlib_wrapper Get rid of unnecessary bytes variable, and remove redundant initialisation of zlib_err. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2011-01-13 21:20:52 +00:00
Phillip Lougher	7a43ae5237	Squashfs: Add XZ compression configuration option Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2011-01-13 21:16:52 +00:00
Phillip Lougher	81bb8debd0	Squashfs: add XZ compression support Add support for reading file systems compressed with the XZ compression algorithm. This patch adds the XZ decompressor wrapper code. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2011-01-13 20:51:20 +00:00
Trond Myklebust	8a0eebf66e	NFS: Fix NFSv3 exclusive open semantics Commit `c0204fd2b8` (NFS: Clean up nfs4_proc_create()) broke NFSv3 exclusive open by removing the code that passes the O_EXCL flag down to nfs3_proc_create(). This patch reverts that offending hunk from the original commit. Reported-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org [2.6.37] Tested-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 12:06:29 -08:00
Linus Torvalds	275220f0fc	Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits) block: ensure that completion error gets properly traced blktrace: add missing probe argument to block_bio_complete block cfq: don't use atomic_t for cfq_group block cfq: don't use atomic_t for cfq_queue block: trace event block fix unassigned field block: add internal hd part table references block: fix accounting bug on cross partition merges kref: add kref_test_and_get bio-integrity: mark kintegrityd_wq highpri and CPU intensive block: make kblockd_workqueue smarter Revert "sd: implement sd_check_events()" block: Clean up exit_io_context() source code. Fix compile warnings due to missing removal of a 'ret' variable fs/block: type signature of major_to_index(int) to major_to_index(unsigned) block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p) cfq-iosched: don't check cfqg in choose_service_tree() fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors cdrom: export cdrom_check_events() sd: implement sd_check_events() sr: implement sr_check_events() ...	2011-01-13 10:45:01 -08:00
Linus Torvalds	b2034d474b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (41 commits) fs: add documentation on fallocate hole punching Gfs2: fail if we try to use hole punch Btrfs: fail if we try to use hole punch Ext4: fail if we try to use hole punch Ocfs2: handle hole punching via fallocate properly XFS: handle hole punching via fallocate properly fs: add hole punching to fallocate vfs: pass struct file to do_truncate on O_TRUNC opens (try #2) fix signedness mess in rw_verify_area() on 64bit architectures fs: fix kernel-doc for dcache::prepend_path fs: fix kernel-doc for dcache::d_validate sanitize ecryptfs ->mount() switch afs move internal-only parts of ncpfs headers to fs/ncpfs switch ncpfs switch 9p pass default dentry_operations to mount_pseudo() switch hostfs switch affs switch configfs ...	2011-01-13 10:27:28 -08:00
Linus Torvalds	a170315420	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix cleanup when trying to mount inexistent image net/ceph: make ceph_msgr_wq non-reentrant ceph: fsc->*_wq's aren't used in memory reclaim path ceph: Always free allocated memory in osdmap_decode() ceph: Makefile: Remove unnessary code ceph: associate requests with opening sessions ceph: drop redundant r_mds field ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS ceph: add dir_layout to inode	2011-01-13 10:25:24 -08:00
Linus Torvalds	008d23e485	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) Documentation/trace/events.txt: Remove obsolete sched_signal_send. writeback: fix global_dirty_limits comment runtime -> real-time ppc: fix comment typo singal -> signal drivers: fix comment typo diable -> disable. m68k: fix comment typo diable -> disable. wireless: comment typo fix diable -> disable. media: comment typo fix diable -> disable. remove doc for obsolete dynamic-printk kernel-parameter remove extraneous 'is' from Documentation/iostats.txt Fix spelling milisec -> ms in snd_ps3 module parameter description Fix spelling mistakes in comments Revert conflicting V4L changes i7core_edac: fix typos in comments mm/rmap.c: fix comment sound, ca0106: Fix assignment to 'channel'. hrtimer: fix a typo in comment init/Kconfig: fix typo anon_inodes: fix wrong function name in comment fix comment typos concerning "consistent" poll: fix a typo in comment ... Fix up trivial conflicts in: - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c) - fs/ext4/ext4.h Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.	2011-01-13 10:05:56 -08:00
Stefani Seibold	6f772fe65c	cramfs: generate unique inode number for better inode cache usage Generate a unique inode numbers for any entries in the cram file system. For files which did not contain data's (device nodes, fifos and sockets) the offset of the directory entry inside the cramfs plus 1 will be used as inode number. The + 1 for the inode will it make possible to distinguish between a file which contains no data and files which has data, the later one has a inode value where the lower two bits are always 0. It also reimplements the behavior to set the size and the number of block to 0 for special file, which is the right value for empty files, devices, fifos and sockets As a little benefit it will be also more compatible which older mkcramfs, because it will never use the cramfs_inode->offset for creating a inode number for special files. [akpm@linux-foundation.org: trivial comment fix] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Stefani Seibold <stefani@seibold.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:23 -08:00
Jeff Moyer	d3486f8b9e	aio: remove unused aio_run_iocbs() aio_run_iocbs() is not used at all, so get rid of it. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:22 -08:00
Namhyung Kim	2e41025598	aio: remove unnecessary check 'nr >= min_nr >= 0' always satisfies 'nr >= 0' so the check is unnecesary. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:22 -08:00
Namhyung Kim	e6d7202b66	fs/char_dev.c: remove unused cdev_index() Commit `66fa12c571` ("ieee1394: remove the old IEEE 1394 driver stack") eliminated the only user of cdev_index(). So it can be removed too. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:17 -08:00
Dave Anderson	ceff1a7709	/proc/kcore: fix seeking Commit `34aacb2920` ("procfs: Use generic_file_llseek in /proc/kcore") broke seeking on /proc/kcore. This changes it back to use default_llseek in order to restore the original behavior. The problem with generic_file_llseek is that it only allows seeks up to inode->i_sb->s_maxbytes, which is 2GB-1 on procfs, where the memory file offset values in the /proc/kcore PT_LOAD segments may exceed or start beyond that offset value. A similar revert was made for /proc/vmcore. Signed-off-by: Dave Anderson <anderson@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:17 -08:00
Alexey Dobriyan	bf33cbdf8a	proc: move proc_console.c to fs/proc/consoles.c Filename is supposed to match procfile name for random junk. Add __init while I'm at it. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:17 -08:00
Alexey Dobriyan	3740a20c4f	proc: less LOCK/UNLOCK in remove_proc_entry() For the common case where a proc entry is being removed and nobody is in the process of using it, save a LOCK/UNLOCK pair. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:17 -08:00
Petr Holasek	a6fc86d2b4	kpagecount: add slab page checking because _mapcount is in a union Add a PageSlab() check before adding the _mapcount value to /kpagecount. page->_mapcount is in a union with the SLAB structure so for pages controlled by SLAB, page_mapcount() returns nonsense. Signed-off-by: Petr Holasek <pholasek@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:17 -08:00
Jovi Zhang	c6a3405846	proc: use single_open() correctly single_open()'s third argument is for copying into seq_file->private. Use that, rather than open-coding it. Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:16 -08:00
Alexey Dobriyan	6d1b6e4eff	proc: ->low_ino cleanup - ->low_ino is write-once field -- reading it under locks is unnecessary. - /proc/$PID stuff never reaches pde_put()/free_proc_entry() -- PROC_DYNAMIC_FIRST check never triggers. - in proc_get_inode(), inode number always matches proc dir entry, so save one parameter. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:16 -08:00
Alexey Dobriyan	9d6de12f70	proc: use seq_puts()/seq_putc() where possible For string without format specifiers, use seq_puts(). For seq_printf("\n"), use seq_putc('\n'). text data bss dec hex filename 61866 488 112 62466 f402 fs/proc/proc.o 61729 488 112 62329 f379 fs/proc/proc.o ---------------------------------------------------- -139 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:16 -08:00
Alexey Dobriyan	a2ade7b6ca	proc: use unsigned long inside /proc//statm /proc//statm code needlessly truncates data from unsigned long to int. One needs only 8+ TB of RAM to make truncation visible. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:16 -08:00
Joe Perches	34e49d4f63	fs/proc/base.c, kernel/latencytop.c: convert sprintf_symbol() to %ps Use temporary lr for struct latency_record for improved readability and fewer columns used. Removed trailing space from output. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Joe Perches <joe@perches.com> Cc: Jiri Kosina <trivial@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:16 -08:00
Jesper Juhl	566538a6cf	reiserfs: make sure va_end() is always called after va_start(). A call to va_start() must always be followed by a call to va_end() in the same function. In fs/reiserfs/prints.c::print_block() this is not always the case. If 'bh' is NULL we'll return without calling va_end(). One could add a call to va_end() before the 'return' statement, but it's nicer to just move the call to va_start() after the test for 'bh' being NULL. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Acked-by: Edward Shishkin <edward.shishkin@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:15 -08:00
Jesper Juhl	e0e3d32bb4	befs: don't pass huge structs by value 'struct befs_disk_data_stream' is huge (~144 bytes) and it's being passed by value in fs/befs/endian.h::cpu_to_fsrun(). It would be better to pass a pointer. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Cc: Will Dyson <will_dyson@pobox.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:15 -08:00
Davide Libenzi	e462c448fd	pipe: use event aware wakeups Send the events the wakeup refers to, so that epoll, and even the new poll code in fs/select.c can avoid wakeups if the events do not match the requested set. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:15 -08:00
Mikael Pettersson	f670d0ecda	binfmt_elf: cleanups This cleans up a few bits in binfmt_elf.c and binfmts.h: - the hasvdso field in struct linux_binfmt is unused, so remove it and the only initialization of it - the elf_map CPP symbol is not defined anywhere in the kernel, so remove an unnecessary #ifndef elf_map - reduce excessive indentation in elf_format's initializer - add missing spaces, remove extraneous spaces No functional changes, but tested on x86 (32 and 64 bit), powerpc (32 and 64 bit), sparc64, arm, and alpha. Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:12 -08:00
Robin Holt	52bd19f769	epoll: convert max_user_watches to long On a 16TB machine, max_user_watches has an integer overflow. Convert it to use a long and handle the associated fallout. Signed-off-by: Robin Holt <holt@sgi.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:12 -08:00
Vasiliy Kulikov	65329bf46b	fs/select.c: fix information leak to userspace On some architectures __kernel_suseconds_t is int. On these archs struct timeval has padding bytes at the end. This struct is copied to userspace with these padding bytes uninitialized. This leads to leaking of contents of kernel stack memory. This bug was added with v2.6.27-rc5-286-gb773ad4. [akpm@linux-foundation.org: avoid the memset on architectures which don't need it] Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:12 -08:00
Andrew Morton	6db26ffc91	fs/ext4/inode.c: use pr_warn_ratelimited() pr_warning_ratelimited() doesn't exist. Also include printk.h, which defines these things. Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 08:03:05 -08:00
Jens Axboe	81c5e2ae33	Merge branch 'for-2.6.38/event-handling' into for-2.6.38/core	2011-01-13 14:47:54 +01:00
Josef Bacik	9ecf639a96	Gfs2: fail if we try to use hole punch Gfs2 doesn't have the ability to punch holes yet, so make sure we return EOPNOTSUPP if we try to use hole punching through fallocate. This support can be added later. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:16:44 -05:00
Josef Bacik	23a8519b55	Btrfs: fail if we try to use hole punch Btrfs doesn't have the ability to punch holes yet, so make sure we return EOPNOTSUPP if we try to use hole punching through fallocate. This support can be added later. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:16:44 -05:00
Josef Bacik	d6dc8462f4	Ext4: fail if we try to use hole punch Ext4 doesn't have the ability to punch holes yet, so make sure we return EOPNOTSUPP if we try to use hole punching through fallocate. This support can be added later. Thanks, Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:16:44 -05:00
Josef Bacik	db47fef2cd	Ocfs2: handle hole punching via fallocate properly This patch just makes ocfs2 use its UNRESERVP ioctl when we get the hole punch flag in fallocate. I didn't test it, but it seems simple enough. Thanks, Acked-by: Jan Kara <jack@suse.cz> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:16:43 -05:00
Josef Bacik	c25d246715	XFS: handle hole punching via fallocate properly This patch simply allows XFS to handle the hole punching flag in fallocate properly. I've tested this with a little program that does a bunch of random hole punching with FL_KEEP_SIZE and without it to make sure it does the right thing. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:16:43 -05:00
Josef Bacik	79124f18b3	fs: add hole punching to fallocate Hole punching has already been implemented by XFS and OCFS2, and has the potential to be implemented on both BTRFS and EXT4 so we need a generic way to get to this feature. The simplest way in my mind is to add FALLOC_FL_PUNCH_HOLE to fallocate() since it already looks like the normal fallocate() operation. I've tested this patch with XFS and BTRFS to make sure XFS did what it's supposed to do and that BTRFS failed like it was supposed to. Thank you, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:16:43 -05:00
Jeff Layton	e1181ee657	vfs: pass struct file to do_truncate on O_TRUNC opens (try #2 ) When a file is opened with O_TRUNC, the truncate processing is handled by handle_truncate(). This function however doesn't receive any info about the newly instantiated filp, and therefore can't pass that info along so that the setattr can use it. This makes NFSv4 misbehave. The client does an open and gets a valid stateid, and then doesn't use that stateid on the subsequent truncate. It uses the zero-stateid instead. Most servers ignore this fact and just do the truncate anyway, but some don't like it (notably, RHEL4). It seems more correct that since we have a fully instantiated file at the time that handle_truncate is called, that we pass that along so that the truncate operation can properly use it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:06:59 -05:00
Al Viro	cccb5a1e69	fix signedness mess in rw_verify_area() on 64bit architectures ... and clean the unsigned-f_pos code, while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:06:58 -05:00
Randy Dunlap	208898c17a	fs: fix kernel-doc for dcache::prepend_path Fix function kernel-doc warning for prepend_path(): Warning(fs/dcache.c:1924): missing initial short description on line: Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:06:57 -05:00
Randy Dunlap	1c977540fd	fs: fix kernel-doc for dcache::d_validate Fix function parameter kernel-doc for d_validate(): Warning(fs/dcache.c:1495): No description found for parameter 'parent' Warning(fs/dcache.c:1495): Excess function parameter 'dparent' description in 'd_validate' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:06:55 -05:00
Al Viro	66cb76666d	sanitize ecryptfs ->mount() kill ecryptfs_read_super(), reorder code allowing to use normal d_alloc_root() instead of opencoding it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:04:37 -05:00
Al Viro	d61dcce297	switch afs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:04:20 -05:00
Al Viro	32c419d95f	move internal-only parts of ncpfs headers to fs/ncpfs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:03:43 -05:00
Al Viro	0378c4051a	switch ncpfs merge dentry_operations for root and non-root Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:03:43 -05:00
Al Viro	98cd3fb0a2	switch 9p here we actually want ->d_op for root; setting it allows to get rid of kludge in v9fs_kill_super() since now we have proper ->d_release() for root and don't need to call it manually. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:03:43 -05:00
Al Viro	c74a1cbb3c	pass default dentry_operations to mount_pseudo() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:03:43 -05:00
Al Viro	f772c4a6a3	switch hostfs ->d_delete() doesn't matter for s_root anyway Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:03:42 -05:00
Al Viro	a129880daf	switch affs either d_op instance would work for root, actually... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:03:42 -05:00
Al Viro	d463a0c4b5	switch configfs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:03:12 -05:00
Al Viro	31a203df9c	take coda-private headers out of include/linux Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:48 -05:00
Al Viro	9501e4c48e	switch coda Coda ->d_revalidate() actually checks for root, ->d_delete() is irrelevant. So we can use the same d_op for all coda dentries Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:48 -05:00
Al Viro	43d344d772	switch hpfs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:47 -05:00
Al Viro	af53d29ac1	switch btrfs, close races Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:47 -05:00
Al Viro	ba87167c06	switch ocfs2, close races Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:46 -05:00
Al Viro	41ced6dcf3	switch gfs2, close races Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:46 -05:00
Al Viro	1c929cfe6d	switch cifs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:46 -05:00
Al Viro	8b244ff2fa	switch nfs to ->s_d_op Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:45 -05:00
Al Viro	96e1391414	switch adfs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:45 -05:00
Al Viro	eddf790bd4	switch hfsplus Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:45 -05:00
Al Viro	518c79d28e	switch hfs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:45 -05:00
Al Viro	c6cb412366	minixfs: kill dead code ->d_op of root stays NULL these days on minixfs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:44 -05:00
Al Viro	30304aba6a	switch sysv Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:44 -05:00
Al Viro	c35eebe993	switch fuse Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:44 -05:00
Al Viro	94b77bd86f	switch jfs to ->s_d_op, close exportfs races Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:43 -05:00
Al Viro	3d23985d6c	switch fat to ->s_d_op, close exportfs races there don't bother with lock_super() in fat_fill_super() callers, while we are at it - there won't be any concurrency anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:43 -05:00
Al Viro	6cc9c1d2c1	fix isofs d_op handling switch to ->s_d_op; d_obtain_alias() will DTRT now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:43 -05:00
Al Viro	c8aebb0c9f	per-superblock default ->d_op Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:34 -05:00
Tejun Heo	01e6acc4ea	ceph: fsc->_wq's aren't used in memory reclaim path fsc->_wq's aren't depended upon during memory reclaim. Convert to alloc_workqueue() w/o WQ_MEM_RECLAIM. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Sage Weil <sage@newdream.net> Cc: ceph-devel@vger.kernel.org Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:14 -08:00
Tracey Dent	582c86e690	ceph: Makefile: Remove unnessary code Remove the if and else conditional because the code is in mainline and there is no need in it being there. Also, Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:13 -08:00
Sage Weil	dc69e2e9fc	ceph: associate requests with opening sessions Associate request with sessions that aren't yep open. This makes the debugfs mdsc request list more informative. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:13 -08:00
Sage Weil	4af25fdda6	ceph: drop redundant r_mds field The r_mds field is redundant, since we can find the same information at r_session->s_mds, and when r_session is NULL then r_mds is meaningless. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:13 -08:00
Sage Weil	14303d20f3	ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS This implements the DIRLAYOUTHASH protocol feature, which passes the dir layout over the wire from the MDS. This gives the client knowledge of the correct hash function to use for mapping dentries among dir fragments. Note that if this feature is _not_ present on the client but is on the MDS, the client may misdirect requests. This will result in a forward and degrade performance. It may also result in inaccurate NFS filehandle generation, which will prevent fh resolution when the inode is not present in the client cache and the parent directories have been fragmented. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:13 -08:00
Sage Weil	6c0f3af72c	ceph: add dir_layout to inode Add a ceph_dir_layout to the inode, and calculate dentry hash values based on the parent directory's specified dir_hash function. This is needed because the old default Linux dcache hash function is extremely week and leads to a poor distribution of files among dir fragments. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:12 -08:00

... 2 3 4 5 6 ...

21352 Commits