linux/fs
Filipe Manana bde6c24202 Btrfs: fix stale dir entries after unlink, inode eviction and fsync
If we remove a hard link from an inode, the inode gets evicted, then
we fsync the inode and then power fail/crash, when the log tree is
replayed, the parent directory inode still has entries pointing to
the name that no longer exists, while our inode no longer has the
BTRFS_INODE_REF_KEY item matching the deleted hard link (as expected),
leaving the filesystem in an inconsistent state. The stale directory
entries can not be deleted (an attempt to delete them causes -ESTALE
errors), which makes it impossible to delete the parent directory.

This happens because we track the id of the transaction where the last
unlink operation for the inode happened (last_unlink_trans) in an
in-memory only field of the inode, that is, a value that is never
persisted in the inode item stored on the fs/subvol btree. So if an
inode is evicted and loaded again, the value for last_unlink_trans is
set to 0, which prevents the fsync from logging the parent directory
at btrfs_log_inode_parent(). So fix this by setting last_unlink_trans
to the id of the transaction that last modified the inode when we
load the inode. This is a pessimistic approach but it always ensures
correctness with the trade off of ocassional full transaction commits
when an fsync is done against the inode in the same transaction where
it was evicted and reloaded when our inode is a directory and often
logging its parent unnecessarily when our inode is not a directory.

The following test case for fstests triggers the problem:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1	# failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
      _cleanup_flakey
      rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter
  . ./common/dmflakey

  # real QA test starts here
  _need_to_be_root
  _supported_fs generic
  _supported_os Linux
  _require_scratch
  _require_dm_flakey
  _require_metadata_journaling $SCRATCH_DEV

  rm -f $seqres.full

  _scratch_mkfs >>$seqres.full 2>&1
  _init_flakey
  _mount_flakey

  # Create our test file with 2 hard links.
  mkdir $SCRATCH_MNT/testdir
  touch $SCRATCH_MNT/testdir/foo
  ln $SCRATCH_MNT/testdir/foo $SCRATCH_MNT/testdir/bar

  # Make sure everything done so far is durably persisted.
  sync

  # Now remove one of the links, trigger inode eviction and then fsync
  # our inode.
  unlink $SCRATCH_MNT/testdir/bar
  echo 2 > /proc/sys/vm/drop_caches
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/foo

  # Silently drop all writes on our scratch device to simulate a power failure.
  _load_flakey_table $FLAKEY_DROP_WRITES
  _unmount_flakey

  # Allow writes again and mount the fs to trigger log/journal replay.
  _load_flakey_table $FLAKEY_ALLOW_WRITES
  _mount_flakey

  # Now verify our directory entries.
  echo "Entries in testdir:"
  ls -1 $SCRATCH_MNT/testdir

  # If we remove our inode, its parent should become empty and therefore we should
  # be able to remove the parent.
  rm -f $SCRATCH_MNT/testdir/*
  rmdir $SCRATCH_MNT/testdir

  _unmount_flakey

  # The fstests framework will call fsck against our filesystem which will verify
  # that all metadata is in a consistent state.

  status=0
  exit

The test failed on btrfs with:

  generic/098 4s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/098.out.bad)
    --- tests/generic/098.out	2015-07-23 18:01:12.616175932 +0100
    +++ /home/fdmanana/git/hub/xfstests/results//generic/098.out.bad	2015-07-23 18:04:58.924138308 +0100
    @@ -1,3 +1,6 @@
     QA output created by 098
     Entries in testdir:
    +bar
     foo
    +rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/testdir/foo': Stale file handle
    +rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty
    ...
    (Run 'diff -u tests/generic/098.out /home/fdmanana/git/hub/xfstests/results//generic/098.out.bad'  to see the entire diff)
  _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent (see /home/fdmanana/git/hub/xfstests/results//generic/098.full)

  $ cat /home/fdmanana/git/hub/xfstests/results//generic/098.full
  (...)
  checking fs roots
  root 5 inode 258 errors 2001, no inode item, link count wrong
     unresolved ref dir 257 index 0 namelen 3 name foo filetype 1 errors 6, no dir index, no inode ref
     unresolved ref dir 257 index 3 namelen 3 name bar filetype 1 errors 5, no dir item, no inode ref
  Checking filesystem on /dev/sdc
  (...)

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09 06:16:58 -07:00
..
9p 9p: don't leave a half-initialized inode sitting around 2015-07-12 11:22:05 -04:00
adfs fs/adfs: remove unneeded cast 2015-06-30 19:44:57 -07:00
affs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
afs net: Add a struct net parameter to sock_create_kern 2015-05-11 10:50:17 -04:00
autofs4 make simple_positive() public 2015-06-23 18:02:01 -04:00
befs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
bfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-04-26 17:22:07 -07:00
btrfs Btrfs: fix stale dir entries after unlink, inode eviction and fsync 2015-08-09 06:16:58 -07:00
cachefiles Merge branch 'fscache-fixes' into for-next 2015-06-23 18:01:30 -04:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
cifs cifs: Unset CIFS_MOUNT_POSIX_PATHS flag when following dfs mounts 2015-06-29 14:50:22 -05:00
coda fs: cleanup slight list_entry abuse 2015-06-23 18:01:59 -04:00
configfs configfs: fix kernel infoleak through user-controlled format string 2015-07-17 16:39:53 -07:00
cramfs
debugfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
devpts devpts: if initialization failed, don't crash when opening /dev/ptmx 2015-06-30 19:44:58 -07:00
dlm net: Add a struct net parameter to sock_create_kern 2015-05-11 10:50:17 -04:00
ecryptfs ioctl_compat: handle FITRIM 2015-07-09 11:42:21 -07:00
efivarfs Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-05-06 10:57:37 -07:00
efs fs/efs: femove unneeded cast 2015-06-25 17:00:42 -07:00
exofs pagemap.h: move dir_pages() over there 2015-06-23 18:02:00 -04:00
exportfs VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
ext2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
ext3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2015-06-24 20:07:10 -07:00
ext4 ioctl_compat: handle FITRIM 2015-07-09 11:42:21 -07:00
f2fs f2fs: call set_page_dirty to attach i_wb for cgroup 2015-07-25 08:54:26 -07:00
fat writeback: separate out include/linux/backing-dev-defs.h 2015-06-02 08:33:34 -06:00
freevxfs pagemap.h: move dir_pages() over there 2015-06-23 18:02:00 -04:00
fscache FS-Cache: Retain the netfs context in the retrieval op earlier 2015-04-02 14:28:53 +01:00
fuse Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
gfs2 GFS2: merge window 2015-06-27 09:47:46 -07:00
hfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
hfsplus Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
hostfs Merge branch 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-06-22 12:51:21 -07:00
hpfs hpfs: hpfs_error: Remove static buffer, use vsprintf extension %pV instead 2015-07-09 13:35:31 -07:00
hugetlbfs mm/hugetlb: reduce arch dependent code about hugetlb_prefault_arch_hook 2015-06-24 17:49:41 -07:00
isofs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
jbd
jbd2 Revert "jbd2: speedup jbd2_journal_dirty_metadata()" 2015-06-27 09:41:50 -07:00
jffs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
jfs A couple trivial fixes and an error path fix 2015-07-16 16:28:28 -07:00
kernfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2015-07-03 15:20:57 -07:00
lockd nfsd: eliminate NFSD_DEBUG 2015-04-21 16:16:02 -04:00
logfs logfs: fix a pagecache leak for symlinks 2015-05-10 22:18:28 -04:00
minix Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
ncpfs ncpfs: successful rename() should invalidate caches for parents 2015-06-14 11:31:39 -04:00
nfs NFS client bugfixes for Linux 4.2 2015-07-28 09:37:44 -07:00
nfs_common
nfsd nfsd: wrap too long lines in nfsd4_encode_read 2015-06-22 14:15:05 -04:00
nilfs2 ioctl_compat: handle FITRIM 2015-07-09 11:42:21 -07:00
nls
notify Revert "fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()" 2015-07-21 16:06:53 -07:00
ntfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
ocfs2 ioctl_compat: handle FITRIM 2015-07-09 11:42:21 -07:00
omfs omfs: fix potential integer overflow in allocator 2015-05-28 18:25:19 -07:00
openpromfs
overlayfs fix a braino in ovl_d_select_inode() 2015-07-12 11:22:05 -04:00
proc Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-07-18 10:49:57 -07:00
pstore Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2015-07-03 15:20:57 -07:00
qnx4
qnx6 pagemap.h: move dir_pages() over there 2015-06-23 18:02:00 -04:00
quota Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-04-26 17:22:07 -07:00
ramfs VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
reiserfs Merge branch 'akpm' (patches from Andrew) 2015-06-26 09:52:05 -07:00
romfs make new_sync_{read,write}() static 2015-04-11 22:29:40 -04:00
squashfs fs: cleanup slight list_entry abuse 2015-06-23 18:01:59 -04:00
sysfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2015-07-03 15:20:57 -07:00
sysv pagemap.h: move dir_pages() over there 2015-06-23 18:02:00 -04:00
tracefs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
ubifs This pull request includes the following UBI/UBIFS changes: 2015-06-25 14:11:34 -07:00
udf udf: Don't corrupt unalloc spacetable when writing it 2015-07-09 16:38:57 +02:00
ufs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
xfs xfs: remote attributes need to be considered data 2015-07-29 11:48:02 +10:00
Kconfig f2fs: relocate Kconfig from misc filesystems 2015-04-10 15:08:35 -07:00
Kconfig.binfmt mm: split ET_DYN ASLR from mmap ASLR 2015-04-14 16:49:05 -07:00
Makefile um: Remove hppfs 2015-05-31 13:23:08 +02:00
aio.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-04-16 23:27:56 -04:00
anon_inodes.c
attr.c
bad_inode.c don't bother with most of the bad_file_ops methods 2015-02-20 04:03:58 -05:00
binfmt_aout.c
binfmt_elf.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
binfmt_elf_fdpic.c
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-04-26 17:22:07 -07:00
binfmt_script.c
block_dev.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
buffer.c buffer: remove unusued 'ret' variable 2015-06-02 09:22:34 -06:00
char_dev.c
compat.c
compat_binfmt_elf.c
compat_ioctl.c ioctl_compat: handle FITRIM 2015-07-09 11:42:21 -07:00
coredump.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
dax.c xfs: call dax_fault on read page faults for DAX 2015-07-29 11:48:00 +10:00
dcache.c freeing unlinked file indefinitely delayed 2015-07-12 11:27:04 -04:00
dcookies.c
direct-io.c direct-io: only inc/dec inode->i_dio_count for file systems 2015-04-24 15:45:28 -04:00
drop_caches.c vmscan: per memory cgroup slab shrinkers 2015-02-12 18:54:09 -08:00
eventfd.c eventfd: don't take the spinlock in eventfd_poll 2015-02-17 14:34:52 -08:00
eventpoll.c epoll: optimize setting task running after blocking 2015-02-13 21:21:40 -08:00
exec.c parisc,metag: Fix crashes due to stack randomization on stack-grows-upwards architectures 2015-05-12 22:03:44 +02:00
fcntl.c
fhandle.c vfs: read file_handle only once in handle_to_path 2015-06-02 10:29:07 -07:00
file.c fs/file.c: __fget() and dup2() atomicity rules 2015-07-01 02:31:08 -04:00
file_table.c remove the pointless include of lglock.h 2015-06-23 18:02:00 -04:00
filesystems.c
fs-writeback.c block: export bio_associate_*() and wbc_account_io() 2015-07-23 13:36:44 -06:00
fs_pin.c fs_pin: Allow for the possibility that m_list or s_list go unused. 2015-04-09 11:39:55 -05:00
fs_struct.c
inode.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
internal.h overlayfs: Make f_path always point to the overlay and f_inode to the underlay 2015-06-19 03:19:32 -04:00
ioctl.c fsioctl.c: make generic_block_fiemap() signal-tolerant 2015-02-10 14:30:30 -08:00
libfs.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
locks.c locks: inline posix_lock_file_wait and flock_lock_file_wait 2015-07-13 06:29:11 -04:00
mbcache.c
mount.h fs: use seq_open_private() for proc_mounts 2015-06-30 19:44:56 -07:00
mpage.c writeback: implement foreign cgroup inode detection 2015-06-02 08:40:20 -06:00
namei.c link_path_walk(): be careful when failing with ENOTDIR 2015-08-01 20:18:38 -04:00
namespace.c mnt: In detach_mounts detach the appropriate unmounted mount 2015-07-23 11:31:15 -05:00
no-block.c
nsfs.c VFS: assorted weird filesystems: d_inode() annotations 2015-04-15 15:06:58 -04:00
open.c fs: Call security_ops->inode_killpriv on truncate 2015-06-23 18:01:09 -04:00
pipe.c VFS: assorted weird filesystems: d_inode() annotations 2015-04-15 15:06:58 -04:00
pnode.c mnt: Don't propagate unmounts to locked mounts 2015-04-02 20:34:20 -05:00
pnode.h mnt: Clarify and correct the disconnect logic in umount_tree 2015-07-22 20:33:27 -05:00
posix_acl.c fs/posix_acl.c: make posix_acl_create() safer and cleaner 2015-06-23 18:01:07 -04:00
proc_namespace.c fs: use seq_open_private() for proc_mounts 2015-06-30 19:44:56 -07:00
read_write.c new_sync_write(): discard ->ki_pos unless the return value is positive 2015-04-11 22:29:46 -04:00
readdir.c
select.c locking/arch: Rename set_mb() to smp_store_mb() 2015-05-19 08:32:00 +02:00
seq_file.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
signalfd.c
splice.c Merge branch 'akpm' (patches from Andrew) 2015-06-24 20:47:21 -07:00
stack.c
stat.c VFS: assorted d_backing_inode() annotations 2015-04-15 15:06:59 -04:00
statfs.c
super.c fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation 2015-07-01 01:50:06 -04:00
sync.c vfs: add support for a lazytime mount option 2015-02-05 02:45:00 -05:00
timerfd.c
utimes.c
xattr.c evm: fix potential race when removing xattrs 2015-05-21 13:28:47 -04:00