linux/fs/autofs4/expire.c

546 lines
13 KiB
C
Raw Normal View History

/* -*- c -*- --------------------------------------------------------------- *
*
* linux/fs/autofs/expire.c
*
* Copyright 1997-1998 Transmeta Corporation -- All Rights Reserved
* Copyright 1999-2000 Jeremy Fitzhardinge <jeremy@goop.org>
* Copyright 2001-2006 Ian Kent <raven@themaw.net>
*
* This file is part of the Linux kernel and is made available under
* the terms of the GNU General Public License, version 2, or at your
* option, any later version, incorporated herein by reference.
*
* ------------------------------------------------------------------------- */
#include "autofs_i.h"
static unsigned long now;
/* Check if a dentry can be expired */
static inline int autofs4_can_expire(struct dentry *dentry,
unsigned long timeout, int do_now)
{
struct autofs_info *ino = autofs4_dentry_ino(dentry);
/* dentry in the process of being deleted */
if (ino == NULL)
return 0;
/* No point expiring a pending mount */
if (ino->flags & AUTOFS_INF_PENDING)
return 0;
if (!do_now) {
/* Too young to die */
if (!timeout || time_after(ino->last_used + timeout, now))
return 0;
/* update last_used here :-
- obviously makes sense if it is in use now
- less obviously, prevents rapid-fire expire
attempts if expire fails the first time */
ino->last_used = now;
}
return 1;
}
/* Check a mount point for busyness */
static int autofs4_mount_busy(struct vfsmount *mnt, struct dentry *dentry)
{
struct dentry *top = dentry;
struct path path = {.mnt = mnt, .dentry = dentry};
int status = 1;
DPRINTK("dentry %p %.*s",
dentry, (int)dentry->d_name.len, dentry->d_name.name);
path_get(&path);
Add a dentry op to allow processes to be held during pathwalk transit Add a dentry op (d_manage) to permit a filesystem to hold a process and make it sleep when it tries to transit away from one of that filesystem's directories during a pathwalk. The operation is keyed off a new dentry flag (DCACHE_MANAGE_TRANSIT). The filesystem is allowed to be selective about which processes it holds and which it permits to continue on or prohibits from transiting from each flagged directory. This will allow autofs to hold up client processes whilst letting its userspace daemon through to maintain the directory or the stuff behind it or mounted upon it. The ->d_manage() dentry operation: int (*d_manage)(struct path *path, bool mounting_here); takes a pointer to the directory about to be transited away from and a flag indicating whether the transit is undertaken by do_add_mount() or do_move_mount() skipping through a pile of filesystems mounted on a mountpoint. It should return 0 if successful and to let the process continue on its way; -EISDIR to prohibit the caller from skipping to overmounted filesystems or automounting, and to use this directory; or some other error code to return to the user. ->d_manage() is called with namespace_sem writelocked if mounting_here is true and no other locks held, so it may sleep. However, if mounting_here is true, it may not initiate or wait for a mount or unmount upon the parameter directory, even if the act is actually performed by userspace. Within fs/namei.c, follow_managed() is extended to check with d_manage() first on each managed directory, before transiting away from it or attempting to automount upon it. follow_down() is renamed follow_down_one() and should only be used where the filesystem deliberately intends to avoid management steps (e.g. autofs). A new follow_down() is added that incorporates the loop done by all other callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS and CIFS do use it, their use is removed by converting them to use d_automount()). The new follow_down() calls d_manage() as appropriate. It also takes an extra parameter to indicate if it is being called from mount code (with namespace_sem writelocked) which it passes to d_manage(). follow_down() ignores automount points so that it can be used to mount on them. __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have that determine whether to abort or not itself. That would allow the autofs daemon to continue on in rcu-walk mode. Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't required as every tranist from that directory will cause d_manage() to be invoked. It can always be set again when necessary. ========================== WHAT THIS MEANS FOR AUTOFS ========================== Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to trigger the automounting of indirect mounts, and both of these can be called with i_mutex held. autofs knows that the i_mutex will be held by the caller in lookup(), and so can drop it before invoking the daemon - but this isn't so for d_revalidate(), since the lock is only held on _some_ of the code paths that call it. This means that autofs can't risk dropping i_mutex from its d_revalidate() function before it calls the daemon. The bug could manifest itself as, for example, a process that's trying to validate an automount dentry that gets made to wait because that dentry is expired and needs cleaning up: mkdir S ffffffff8014e05a 0 32580 24956 Call Trace: [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f [<ffffffff80057a2f>] lookup_create+0x46/0x80 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4 versus the automount daemon which wants to remove that dentry, but can't because the normal process is holding the i_mutex lock: automount D ffffffff8014e05a 0 32581 1 32561 Call Trace: [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14 [<ffffffff800e6d55>] do_rmdir+0x77/0xde [<ffffffff8005d229>] tracesys+0x71/0xe0 [<ffffffff8005d28d>] tracesys+0xd5/0xe0 which means that the system is deadlocked. This patch allows autofs to hold up normal processes whilst the daemon goes ahead and does things to the dentry tree behind the automouter point without risking a deadlock as almost no locks are held in d_manage() and none in d_automount(). Signed-off-by: David Howells <dhowells@redhat.com> Was-Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 02:45:26 +08:00
if (!follow_down_one(&path))
goto done;
if (is_autofs4_dentry(path.dentry)) {
struct autofs_sb_info *sbi = autofs4_sbi(path.dentry->d_sb);
/* This is an autofs submount, we can't expire it */
if (autofs_type_indirect(sbi->type))
goto done;
/*
* Otherwise it's an offset mount and we need to check
* if we can umount its mount, if there is one.
*/
if (!d_mountpoint(path.dentry)) {
status = 0;
goto done;
}
}
/* Update the expiry counter if fs is busy */
if (!may_umount_tree(path.mnt)) {
struct autofs_info *ino = autofs4_dentry_ino(top);
ino->last_used = jiffies;
goto done;
}
status = 0;
done:
DPRINTK("returning = %d", status);
path_put(&path);
return status;
}
/*
* Calculate and dget next entry in top down tree traversal.
*/
static struct dentry *get_next_positive_dentry(struct dentry *prev,
struct dentry *root)
{
struct list_head *next;
struct dentry *p, *ret;
if (prev == NULL)
return dget(prev);
spin_lock(&autofs4_lock);
relock:
p = prev;
spin_lock(&p->d_lock);
again:
next = p->d_subdirs.next;
if (next == &p->d_subdirs) {
while (1) {
struct dentry *parent;
if (p == root) {
spin_unlock(&p->d_lock);
spin_unlock(&autofs4_lock);
dput(prev);
return NULL;
}
parent = p->d_parent;
if (!spin_trylock(&parent->d_lock)) {
spin_unlock(&p->d_lock);
cpu_relax();
goto relock;
}
spin_unlock(&p->d_lock);
next = p->d_u.d_child.next;
p = parent;
if (next != &parent->d_subdirs)
break;
}
}
ret = list_entry(next, struct dentry, d_u.d_child);
spin_lock_nested(&ret->d_lock, DENTRY_D_LOCK_NESTED);
/* Negative dentry - try next */
if (!simple_positive(ret)) {
spin_unlock(&ret->d_lock);
p = ret;
goto again;
}
dget_dlock(ret);
spin_unlock(&ret->d_lock);
spin_unlock(&p->d_lock);
spin_unlock(&autofs4_lock);
dput(prev);
return ret;
}
/*
* Check a direct mount point for busyness.
* Direct mounts have similar expiry semantics to tree mounts.
* The tree is not busy iff no mountpoints are busy and there are no
* autofs submounts.
*/
static int autofs4_direct_busy(struct vfsmount *mnt,
struct dentry *top,
unsigned long timeout,
int do_now)
{
DPRINTK("top %p %.*s",
top, (int) top->d_name.len, top->d_name.name);
/* If it's busy update the expiry counters */
if (!may_umount_tree(mnt)) {
struct autofs_info *ino = autofs4_dentry_ino(top);
if (ino)
ino->last_used = jiffies;
return 1;
}
/* Timeout of a direct mount is determined by its top dentry */
if (!autofs4_can_expire(top, timeout, do_now))
return 1;
return 0;
}
/* Check a directory tree of mount points for busyness
* The tree is not busy iff no mountpoints are busy
*/
static int autofs4_tree_busy(struct vfsmount *mnt,
struct dentry *top,
unsigned long timeout,
int do_now)
{
struct autofs_info *top_ino = autofs4_dentry_ino(top);
struct dentry *p;
DPRINTK("top %p %.*s",
top, (int)top->d_name.len, top->d_name.name);
/* Negative dentry - give up */
if (!simple_positive(top))
return 1;
p = NULL;
while ((p = get_next_positive_dentry(p, top))) {
DPRINTK("dentry %p %.*s",
p, (int) p->d_name.len, p->d_name.name);
/*
* Is someone visiting anywhere in the subtree ?
* If there's no mount we need to check the usage
* count for the autofs dentry.
* If the fs is busy update the expiry counter.
*/
if (d_mountpoint(p)) {
if (autofs4_mount_busy(mnt, p)) {
top_ino->last_used = jiffies;
dput(p);
return 1;
}
} else {
struct autofs_info *ino = autofs4_dentry_ino(p);
unsigned int ino_count = atomic_read(&ino->count);
/*
* Clean stale dentries below that have not been
* invalidated after a mount fail during lookup
*/
d_invalidate(p);
/* allow for dget above and top is already dgot */
if (p == top)
ino_count += 2;
else
ino_count++;
if (p->d_count > ino_count) {
top_ino->last_used = jiffies;
dput(p);
return 1;
}
}
}
/* Timeout of a tree mount is ultimately determined by its top dentry */
if (!autofs4_can_expire(top, timeout, do_now))
return 1;
return 0;
}
static struct dentry *autofs4_check_leaves(struct vfsmount *mnt,
struct dentry *parent,
unsigned long timeout,
int do_now)
{
struct dentry *p;
DPRINTK("parent %p %.*s",
parent, (int)parent->d_name.len, parent->d_name.name);
p = NULL;
while ((p = get_next_positive_dentry(p, parent))) {
DPRINTK("dentry %p %.*s",
p, (int) p->d_name.len, p->d_name.name);
if (d_mountpoint(p)) {
/* Can we umount this guy */
if (autofs4_mount_busy(mnt, p))
continue;
/* Can we expire this guy */
if (autofs4_can_expire(p, timeout, do_now))
return p;
}
}
return NULL;
}
/* Check if we can expire a direct mount (possibly a tree) */
struct dentry *autofs4_expire_direct(struct super_block *sb,
struct vfsmount *mnt,
struct autofs_sb_info *sbi,
int how)
{
unsigned long timeout;
struct dentry *root = dget(sb->s_root);
int do_now = how & AUTOFS_EXP_IMMEDIATE;
if (!root)
return NULL;
now = jiffies;
timeout = sbi->exp_timeout;
spin_lock(&sbi->fs_lock);
if (!autofs4_direct_busy(mnt, root, timeout, do_now)) {
struct autofs_info *ino = autofs4_dentry_ino(root);
if (d_mountpoint(root)) {
ino->flags |= AUTOFS_INF_MOUNTPOINT;
fs: dcache remove d_mounted Rather than keep a d_mounted count in the dentry, set a dentry flag instead. The flag can be cleared by checking the hash table to see if there are any mounts left, which is not time critical because it is performed at detach time. The mounted state of a dentry is only used to speculatively take a look in the mount hash table if it is set -- before following the mount, vfsmount lock is taken and mount re-checked without races. This saves 4 bytes on 32-bit, nothing on 64-bit but it does provide a hole I might use later (and some configs have larger than 32-bit spinlocks which might make use of the hole). Autofs4 conversion and changelog by Ian Kent <raven@themaw.net>: In autofs4, when expring direct (or offset) mounts we need to ensure that we block user path walks into the autofs mount, which is covered by another mount. To do this we clear the mounted status so that follows stop before walking into the mount and are essentially blocked until the expire is completed. The automount daemon still finds the correct dentry for the umount due to the follow mount logic in fs/autofs4/root.c:autofs4_follow_link(), which is set as an inode operation for direct and offset mounts only and is called following the lookup that stopped at the covered mount. At the end of the expire the covering mount probably has gone away so the mounted status need not be restored. But we need to check this and only restore the mounted status if the expire failed. XXX: autofs may not work right if we have other mounts go over the top of it? Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 14:49:54 +08:00
spin_lock(&root->d_lock);
root->d_flags &= ~DCACHE_MOUNTED;
spin_unlock(&root->d_lock);
}
ino->flags |= AUTOFS_INF_EXPIRING;
managed_dentry_set_automount(root);
init_completion(&ino->expire_complete);
spin_unlock(&sbi->fs_lock);
return root;
}
spin_unlock(&sbi->fs_lock);
dput(root);
return NULL;
}
/*
* Find an eligible tree to time-out
* A tree is eligible if :-
* - it is unused by any user process
* - it has been unused for exp_timeout time
*/
struct dentry *autofs4_expire_indirect(struct super_block *sb,
struct vfsmount *mnt,
struct autofs_sb_info *sbi,
int how)
{
unsigned long timeout;
struct dentry *root = sb->s_root;
struct dentry *dentry;
struct dentry *expired = NULL;
int do_now = how & AUTOFS_EXP_IMMEDIATE;
int exp_leaves = how & AUTOFS_EXP_LEAVES;
struct autofs_info *ino;
unsigned int ino_count;
if (!root)
return NULL;
now = jiffies;
timeout = sbi->exp_timeout;
dentry = NULL;
while ((dentry = get_next_positive_dentry(dentry, root))) {
spin_lock(&sbi->fs_lock);
ino = autofs4_dentry_ino(dentry);
/*
* Case 1: (i) indirect mount or top level pseudo direct mount
* (autofs-4.1).
* (ii) indirect mount with offset mount, check the "/"
* offset (autofs-5.0+).
*/
if (d_mountpoint(dentry)) {
DPRINTK("checking mountpoint %p %.*s",
dentry, (int)dentry->d_name.len, dentry->d_name.name);
/* Path walk currently on this dentry? */
ino_count = atomic_read(&ino->count) + 2;
if (dentry->d_count > ino_count)
goto next;
/* Can we umount this guy */
if (autofs4_mount_busy(mnt, dentry))
goto next;
/* Can we expire this guy */
if (autofs4_can_expire(dentry, timeout, do_now)) {
expired = dentry;
goto found;
}
goto next;
}
if (simple_empty(dentry))
goto next;
/* Case 2: tree mount, expire iff entire tree is not busy */
if (!exp_leaves) {
/* Path walk currently on this dentry? */
ino_count = atomic_read(&ino->count) + 1;
if (dentry->d_count > ino_count)
goto next;
if (!autofs4_tree_busy(mnt, dentry, timeout, do_now)) {
expired = dentry;
goto found;
}
/*
* Case 3: pseudo direct mount, expire individual leaves
* (autofs-4.1).
*/
} else {
/* Path walk currently on this dentry? */
ino_count = atomic_read(&ino->count) + 1;
if (dentry->d_count > ino_count)
goto next;
expired = autofs4_check_leaves(mnt, dentry, timeout, do_now);
if (expired) {
dput(dentry);
goto found;
}
}
next:
spin_unlock(&sbi->fs_lock);
}
return NULL;
found:
DPRINTK("returning %p %.*s",
expired, (int)expired->d_name.len, expired->d_name.name);
ino = autofs4_dentry_ino(expired);
ino->flags |= AUTOFS_INF_EXPIRING;
managed_dentry_set_automount(expired);
init_completion(&ino->expire_complete);
spin_unlock(&sbi->fs_lock);
spin_lock(&autofs4_lock);
spin_lock(&expired->d_parent->d_lock);
spin_lock_nested(&expired->d_lock, DENTRY_D_LOCK_NESTED);
list_move(&expired->d_parent->d_subdirs, &expired->d_u.d_child);
spin_unlock(&expired->d_lock);
spin_unlock(&expired->d_parent->d_lock);
spin_unlock(&autofs4_lock);
return expired;
}
int autofs4_expire_wait(struct dentry *dentry)
{
struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
struct autofs_info *ino = autofs4_dentry_ino(dentry);
int status;
/* Block on any pending expire */
spin_lock(&sbi->fs_lock);
if (ino->flags & AUTOFS_INF_EXPIRING) {
spin_unlock(&sbi->fs_lock);
DPRINTK("waiting for expire %p name=%.*s",
dentry, dentry->d_name.len, dentry->d_name.name);
status = autofs4_wait(sbi, dentry, NFY_NONE);
wait_for_completion(&ino->expire_complete);
DPRINTK("expire done status=%d", status);
if (d_unhashed(dentry))
return -EAGAIN;
return status;
}
spin_unlock(&sbi->fs_lock);
return 0;
}
/* Perform an expiry operation */
int autofs4_expire_run(struct super_block *sb,
struct vfsmount *mnt,
struct autofs_sb_info *sbi,
struct autofs_packet_expire __user *pkt_p)
{
struct autofs_packet_expire pkt;
struct autofs_info *ino;
struct dentry *dentry;
int ret = 0;
memset(&pkt,0,sizeof pkt);
pkt.hdr.proto_version = sbi->version;
pkt.hdr.type = autofs_ptype_expire;
if ((dentry = autofs4_expire_indirect(sb, mnt, sbi, 0)) == NULL)
return -EAGAIN;
pkt.len = dentry->d_name.len;
memcpy(pkt.name, dentry->d_name.name, pkt.len);
pkt.name[pkt.len] = '\0';
dput(dentry);
if ( copy_to_user(pkt_p, &pkt, sizeof(struct autofs_packet_expire)) )
ret = -EFAULT;
spin_lock(&sbi->fs_lock);
ino = autofs4_dentry_ino(dentry);
ino->flags &= ~AUTOFS_INF_EXPIRING;
if (!d_unhashed(dentry))
managed_dentry_clear_automount(dentry);
complete_all(&ino->expire_complete);
spin_unlock(&sbi->fs_lock);
return ret;
}
int autofs4_do_expire_multi(struct super_block *sb, struct vfsmount *mnt,
struct autofs_sb_info *sbi, int when)
{
struct dentry *dentry;
int ret = -EAGAIN;
if (autofs_type_trigger(sbi->type))
dentry = autofs4_expire_direct(sb, mnt, sbi, when);
else
dentry = autofs4_expire_indirect(sb, mnt, sbi, when);
if (dentry) {
struct autofs_info *ino = autofs4_dentry_ino(dentry);
/* This is synchronous because it makes the daemon a
little easier */
ret = autofs4_wait(sbi, dentry, NFY_EXPIRE);
spin_lock(&sbi->fs_lock);
if (ino->flags & AUTOFS_INF_MOUNTPOINT) {
fs: dcache remove d_mounted Rather than keep a d_mounted count in the dentry, set a dentry flag instead. The flag can be cleared by checking the hash table to see if there are any mounts left, which is not time critical because it is performed at detach time. The mounted state of a dentry is only used to speculatively take a look in the mount hash table if it is set -- before following the mount, vfsmount lock is taken and mount re-checked without races. This saves 4 bytes on 32-bit, nothing on 64-bit but it does provide a hole I might use later (and some configs have larger than 32-bit spinlocks which might make use of the hole). Autofs4 conversion and changelog by Ian Kent <raven@themaw.net>: In autofs4, when expring direct (or offset) mounts we need to ensure that we block user path walks into the autofs mount, which is covered by another mount. To do this we clear the mounted status so that follows stop before walking into the mount and are essentially blocked until the expire is completed. The automount daemon still finds the correct dentry for the umount due to the follow mount logic in fs/autofs4/root.c:autofs4_follow_link(), which is set as an inode operation for direct and offset mounts only and is called following the lookup that stopped at the covered mount. At the end of the expire the covering mount probably has gone away so the mounted status need not be restored. But we need to check this and only restore the mounted status if the expire failed. XXX: autofs may not work right if we have other mounts go over the top of it? Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 14:49:54 +08:00
spin_lock(&sb->s_root->d_lock);
/*
* If we haven't been expired away, then reset
* mounted status.
*/
if (mnt->mnt_parent != mnt)
sb->s_root->d_flags |= DCACHE_MOUNTED;
spin_unlock(&sb->s_root->d_lock);
ino->flags &= ~AUTOFS_INF_MOUNTPOINT;
}
ino->flags &= ~AUTOFS_INF_EXPIRING;
if (ret)
managed_dentry_clear_automount(dentry);
complete_all(&ino->expire_complete);
spin_unlock(&sbi->fs_lock);
dput(dentry);
}
return ret;
}
/* Call repeatedly until it returns -EAGAIN, meaning there's nothing
more to be done */
int autofs4_expire_multi(struct super_block *sb, struct vfsmount *mnt,
struct autofs_sb_info *sbi, int __user *arg)
{
int do_now = 0;
if (arg && get_user(do_now, arg))
return -EFAULT;
return autofs4_do_expire_multi(sb, mnt, sbi, do_now);
}