linux

Commit Graph

Author	SHA1	Message	Date
Trond Myklebust	cf8d2c11cb	VFS: Add VFS helper functions for setting up private namespaces The purpose of this patch is to improve the remote mount path lookup support for distributed filesystems such as the NFSv4 client. When given a mount command of the form "mount server:/foo/bar /mnt", the NFSv4 client is required to look up the filehandle for "server:/", and then look up each component of the remote mount path "foo/bar" in order to find the directory that is actually going to be mounted on /mnt. Following that remote mount path may involve following symlinks, crossing server-side mount points and even following referrals to filesystem volumes on other servers. Since the standard VFS path lookup code already supports walking paths that contain all these features (using in-kernel automounts for following referrals) we would like to be able to reuse that rather than duplicate the full path traversal functionality in the NFSv4 client code. This patch therefore defines a VFS helper function create_mnt_ns(), that sets up a temporary filesystem namespace and attaches a root filesystem to it. It exports the create_mnt_ns() and put_mnt_ns() function for use by filesystem modules. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-22 21:28:25 -07:00
Trond Myklebust	616511d039	VFS: Uninline the function put_mnt_ns() In order to allow modules to use it without having to export vfsmount_lock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-22 21:28:25 -07:00
Al Viro	4aa98cf768	Push BKL down into do_remount_sb() [folded fix from Jiri Slaby] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:08 -04:00
Al Viro	7f78d4cd4c	Push BKL down beyond VFS-only parts of do_mount() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:08 -04:00
Al Viro	6fac98dd21	Push BKL into do_mount() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:08 -04:00
Alexey Dobriyan	f3da392e9f	dcache: extrace and use d_unlinked() d_unlinked() will be used in middle-term to ban checkpointing when opened but unlinked file is detected, and in long term, to detect such situation and special case on it. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:06 -04:00
npiggin@suse.de	96029c4e09	fs: introduce mnt_clone_write This patch speeds up lmbench lat_mmap test by about another 2% after the first patch. Before: avg = 462.286 std = 5.46106 After: avg = 453.12 std = 9.58257 (50 runs of each, stddev gives a reasonable confidence) It does this by introducing mnt_clone_write, which avoids some heavyweight operations of mnt_want_write if called on a vfsmount which we know already has a write count; and mnt_want_write_file, which can call mnt_clone_write if the file is open for write. After these two patches, mnt_want_write and mnt_drop_write go from 7% on the profile down to 1.3% (including mnt_clone_write). [AV: mnt_want_write_file() should take file alone and derive mnt from it; not only all callers have that form, but that's the only mnt about which we know that it's already held for write if file is opened for write] Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:02 -04:00
npiggin@suse.de	d3ef3d7351	fs: mnt_want_write speedup This patch speeds up lmbench lat_mmap test by about 8%. lat_mmap is set up basically to mmap a 64MB file on tmpfs, fault in its pages, then unmap it. A microbenchmark yes, but it exercises some important paths in the mm. Before: avg = 501.9 std = 14.7773 After: avg = 462.286 std = 5.46106 (50 runs of each, stddev gives a reasonable confidence, but there is quite a bit of variation there still) It does this by removing the complex per-cpu locking and counter-cache and replaces it with a percpu counter in struct vfsmount. This makes the code much simpler, and avoids spinlocks (although the msync is still pretty costly, unfortunately). It results in about 900 bytes smaller code too. It does increase the size of a vfsmount, however. It should also give a speedup on large systems if CPUs are frequently operating on different mounts (because the existing scheme has to operate on an atomic in the struct vfsmount when switching between mounts). But I'm most interested in the single threaded path performance for the moment. [AV: minor cleanup] Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:02 -04:00
Al Viro	1c755af4df	switch lookup_mnt() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:01 -04:00
Al Viro	9393bd07cf	switch follow_down() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:01 -04:00
Al Viro	589ff870ed	Switch collect_mounts() to struct path Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:01 -04:00
Al Viro	dd5cae6e97	Don't bother with check_mnt() in do_add_mount() on shrinkable ones These guys are what we add as submounts; checks for "is that attached in our namespace" are simply irrelevant for those and counterproductive for use of private vfsmount trees a-la what NFS folks want. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:35:59 -04:00
Al Viro	2a32cebd6c	Fix races around the access to ->s_options Put generic_show_options read access to s_options under rcu_read_lock, split save_mount_options() into "we are setting it the first time" (uses in foo_fill_super()) and "we are relacing and freeing the old one", synchronize_rcu() before kfree() in the latter. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-05-09 10:51:34 -04:00
Alessio Igor Bogani	67e55205ec	vfs: umount_begin BKL pushdown Push BKL down into ->umount_begin() Signed-off-by: Alessio Igor Bogani <abogani@texware.it> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-05-09 10:49:38 -04:00
Al Viro	e5d67f0715	Touch all affected namespaces on propagation of mount We shouldn't just touch the namespace of current process Caught-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-04-20 23:01:15 -04:00
Andi Kleen	613cbe3d48	Don't set relatime when noatime is specified Since commit `0a1c01c947` ("Make relatime default") when a file system is mounted explicitely with noatime it gets both the MNT_RELATIME and MNT_NOATIME bits set. This shows up like this in /proc/mounts: /dev/xxx /yyy ext3 rw,noatime,relatime,errors=continue,data=writeback 0 0 That looks strange. The VFS uses noatime in this case, but both flags are set. So it's more a cosmetic issue, but still better to fix. Cc: mjg@redhat.com Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-19 10:46:47 -07:00
Al Viro	5ad4e53bd5	Get rid of indirect include of fs_struct.h Don't pull it in sched.h; very few files actually need it and those can include directly. sched.h itself only needs forward declaration of struct fs_struct; Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:27 -04:00
Al Viro	3e93cd6718	Take fs_struct handling to new file (fs/fs_struct.c) Pure code move; two new helper functions for nfsd and daemonize (unshare_fs_struct() and daemonize_fs_struct() resp.; for now - the same code as used to be in callers). unshare_fs_struct() exported (for nfsd, as copy_fs_struct()/exit_fs() used to be), copy_fs_struct() and exit_fs() don't need exports anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:26 -04:00
Al Viro	f8ef3ed2be	Get rid of bumping fs_struct refcount in pivot_root(2) Not because execve races with _that_ are serious - we really need a situation when final drop of fs_struct refcount is done by something that used to have it as current->fs. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:25 -04:00
Linus Torvalds	3ae5080f4c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits) fs: avoid I_NEW inodes Merge code for single and multiple-instance mounts Remove get_init_pts_sb() Move common mknod_ptmx() calls into caller Parse mount options just once and copy them to super block Unroll essentials of do_remount_sb() into devpts vfs: simple_set_mnt() should return void fs: move bdev code out of buffer.c constify dentry_operations: rest constify dentry_operations: configfs constify dentry_operations: sysfs constify dentry_operations: JFS constify dentry_operations: OCFS2 constify dentry_operations: GFS2 constify dentry_operations: FAT constify dentry_operations: FUSE constify dentry_operations: procfs constify dentry_operations: ecryptfs constify dentry_operations: CIFS constify dentry_operations: AFS ...	2009-03-27 16:23:12 -07:00
Sukadev Bhattiprolu	a3ec947c85	vfs: simple_set_mnt() should return void simple_set_mnt() is defined as returning 'int' but always returns 0. Callers assume simple_set_mnt() never fails and don't properly cleanup if it were to _ever_ fail. For instance, get_sb_single() and get_sb_nodev() should: up_write(sb->s_unmount); deactivate_super(sb); if simple_set_mnt() fails. Since simple_set_mnt() never fails, would be cleaner if it did not return anything. [akpm@linux-foundation.org: fix build] Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:03 -04:00
Matthew Garrett	0a1c01c947	Make relatime default Change the default behaviour of the kernel to use relatime for all filesystems. This can be overridden with the "strictatime" mount option. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-26 11:01:10 -07:00
Matthew Garrett	d0adde574b	Add a strictatime mount option Add support for explicitly requesting full atime updates. This makes it possible for kernels to default to relatime but still allow userspace to override it. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-26 10:56:35 -07:00
Al Viro	1a88b5364b	Fix incomplete __mntput locking Getting this wrong caused WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() due to optimistically checking cpu_writer->mnt outside the spinlock. Here's what we really want: * we know that nobody will set cpu_writer->mnt to mnt from now on * all changes to that sucker are done under cpu_writer->lock * we want the laziest equivalent of spin_lock(&cpu_writer->lock); if (likely(cpu_writer->mnt != mnt)) { spin_unlock(&cpu_writer->lock); continue; } /* do stuff */ that would make sure we won't miss earlier setting of ->mnt done by another CPU. Anyway, for now we just move the spin_lock() earlier and move the test into the properly locked region. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reported-and-tested-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-17 14:02:08 -08:00
Heiko Carstens	3480b25743	[CVE-2009-0029] System call wrappers part 14 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2009-01-14 14:15:24 +01:00
Heiko Carstens	bdc480e3be	[CVE-2009-0029] System call wrappers part 10 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2009-01-14 14:15:22 +01:00
Julia Lawall	5cc4a0341a	fs/namespace.c: drop code after return The extra semicolon serves no purpose. Signed-off-by: Julia Lawall <julia@diku.dk> Reviewed-by: Richard Genoud <richard.genoud@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-12-31 18:07:38 -05:00
James Morris	2b82892565	Merge branch 'master' into next Conflicts: security/keys/internal.h security/keys/process_keys.c security/keys/request_key.c Fixed conflicts above by using the non 'tsk' versions. Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 11:29:12 +11:00
David Howells	da9592edeb	CRED: Wrap task credential accesses in the filesystem subsystem Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(\|e\|s\|fs)[ug]id to current_(\|e\|s\|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:05 +11:00
Eric W. Biederman	afef80b3d8	vfs: fix shrink_submounts In the last refactoring of shrink_submounts a variable was not completely renamed. So finish the renaming of mnt to m now. Without this if you attempt to mount an nfs mount that has both automatic nfs sub mounts on it, and has normal mounts on it. The unmount will succeed when it should not. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-11-12 17:17:17 -08:00
Dan Williams	0e55a7cca4	[RFC PATCH] touch_mnt_namespace when the mount flags change Daemons that need to be launched while the rootfs is read-only can now poll /proc/mounts to be notified when their O_RDWR requests may no longer end in EROFS. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-10-23 05:13:23 -04:00
Al Viro	0a0d8a4675	[PATCH] no need for noinline stuff in fs/namespace.c anymore Stack footprint from hell had been due to many struct nameidata in there. No more. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-23 03:34:22 -04:00
Al Viro	2d92ab3c62	[PATCH] finally get rid of nameidata in namespace.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-23 03:34:20 -04:00
Al Viro	8d66bf5481	[PATCH] pass struct path * to do_add_mount() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-08-01 11:25:32 -04:00
Al Viro	2d8f30380a	[PATCH] sanitize __user_walk_fd() et.al. * do not pass nameidata; struct path is all the callers want. * switch to new helpers: user_path_at(dfd, pathname, flags, &path) user_path(pathname, &path) user_lpath(pathname, &path) user_path_dir(pathname, &path) (fail if not a directory) The last 3 are trivial macro wrappers for the first one. * remove nameidata in callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:34 -04:00
Li Zefan	88b387824f	[PATCH] vfs: use kstrdup() and check failing allocation - use kstrdup() instead of kmalloc() + memcpy() - return NULL if allocating ->mnt_devname failed - mnt_devname should be const Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:24 -04:00
Al Viro	7f2da1e7d0	[PATCH] kill altroot long overdue... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:20 -04:00
Arjan van de Ven	5c752ad9f3	Use WARN() in fs/ Use WARN() instead of a printk+WARN_ON() pair; this way the message becomes part of the warning section for better reporting/collection. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:07 -07:00
Eric Paris	2069f45784	LSM/SELinux: show LSM mount options in /proc/mounts This patch causes SELinux mount options to show up in /proc/mounts. As with other code in the area seq_put errors are ignored. Other LSM's will not have their mount options displayed until they fill in their own security_sb_show_options() function. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: James Morris <jmorris@namei.org>	2008-07-14 15:02:05 +10:00
Harvey Harrison	8e24eea728	fs: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-30 08:29:54 -07:00
Jan Blunck	7ec02ef159	vfs: remove lives_below_in_same_fs() Remove lives_below_in_same_fs() since is_subdir() from fs/dcache.c is providing the same functionality. Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:06 -07:00
Jan Kara	8794b5b246	quota: remove superfluous DQUOT_OFF() in fs/namespace.c We don't need to turn quotas off before remounting root ro, because do_remount_sb() already handles this. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:33 -07:00
Al Viro	42faad9965	[PATCH] restore sane ->umount_begin() API Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-25 09:23:25 -04:00
Miklos Szeredi	97e7e0f71d	[patch 7/7] vfs: mountinfo: show dominating group id Show peer group ID of nearest dominating group that has intersection with the mount's namespace. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-23 00:05:09 -04:00
Ram Pai	2d4d4864ac	[patch 6/7] vfs: mountinfo: add /proc/<pid>/mountinfo [mszeredi@suse.cz] rewrite and split big patch into managable chunks /proc/mounts in its current form lacks important information: - propagation state - root of mount for bind mounts - the st_dev value used within the filesystem - identifier for each mount and it's parent It also suffers from the following problems: - not easily extendable - ambiguity of mountpoints within a chrooted environment - doesn't distinguish between filesystem dependent and independent options - doesn't distinguish between per mount and per super block options This patch introduces /proc/<pid>/mountinfo which attempts to address all these deficiencies. Code shared between /proc/<pid>/mounts and /proc/<pid>/mountinfo is extracted into separate functions. Thanks to Al Viro for the help in getting the design right. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-23 00:05:03 -04:00
Miklos Szeredi	a1a2c409b6	[patch 5/7] vfs: mountinfo: allow using process root Allow /proc/<pid>/mountinfo to use the root of <pid> to calculate mountpoints. - move definition of 'struct proc_mounts' to <linux/mnt_namespace.h> - add the process's namespace and root to this structure - pass a pointer to 'struct proc_mounts' into seq_operations In addition the following cleanups are made: - use a common open function for /proc/<pid>/{mounts,mountstat} - surround namespace.c part of these proc files with #ifdef CONFIG_PROC_FS - make the seq_operations structures const Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-23 00:04:57 -04:00
Miklos Szeredi	719f5d7f0b	[patch 4/7] vfs: mountinfo: add mount peer group ID Add a unique ID to each peer group using the IDR infrastructure. The identifiers are reused after the peer group dissolves. The IDR structures are protected by holding namepspace_sem for write while allocating or deallocating IDs. IDs are allocated when a previously unshared vfsmount becomes the first member of a peer group. When a new member is added to an existing group, the ID is copied from one of the old members. IDs are freed when the last member of a peer group is unshared. Setting the MNT_SHARED flag on members of a subtree is done as a separate step, after all the IDs have been allocated. This way an allocation failure can be cleaned up easilty, without affecting the propagation state. Based on design sketch by Al Viro. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-23 00:04:51 -04:00
Miklos Szeredi	73cd49ecdd	[patch 3/7] vfs: mountinfo: add mount ID Add a unique ID to each vfsmount using the IDR infrastructure. The identifiers are reused after the vfsmount is freed. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-23 00:04:45 -04:00
Al Viro	8c3ee42e80	[PATCH] get rid of more nameidata passing in namespace.c Further reduction of stack footprint (sys_pivot_root()); lose useless BKL in there, while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-21 23:13:47 -04:00
Al Viro	b5266eb4c8	[PATCH] switch a bunch of LSM hooks from nameidata to path Namely, ones from namespace.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-21 23:13:23 -04:00

1 2 3

141 Commits