linux/fs/btrfs
Filipe Manana 471d557afe Btrfs: fix loss of prealloc extents past i_size after fsync log replay
Currently, if we allocate extents beyond an inode's i_size (through the
fallocate system call) and then fsync the file, we log the extents, but
after a power failure we replay them and then immediately drop them.
This behaviour has existed since about 2009, commit c71bf099ab ("Btrfs:
Avoid orphan inodes cleanup while replaying log"). That commit marks
the inode as an orphan instead of dropping any extents beyond i_size
before replaying logged extents. As a result, after the log replay, and
while the mount operation is still ongoing, we find the inode marked as
an orphan and then perform a truncation (dropping extents beyond the
inode's i_size). Because orphan inodes are still processed right after
replaying the log and before the mount operation finishes, the
intention of that commit does not make any sense (at least as of
today). However, reverting that behaviour is not enough: we cannot
simply discard all extents beyond i_size and then replay logged
extents, because we risk dropping extents beyond i_size that were
created in past transactions. For example:

  add prealloc extent beyond i_size
  fsync - clears the flag BTRFS_INODE_NEEDS_FULL_SYNC from the inode
  transaction commit
  add another prealloc extent beyond i_size
  fsync - triggers the fast fsync path
  power failure

In that scenario, we would drop the first extent and then replay only
the second one. To fix this, make sure that all prealloc extents beyond
i_size are logged, and if we find too many (which is far from a common
case), fall back to a full transaction commit (like we do when logging
regular extents in the fast fsync path).
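
The scenario above can be illustrated with a small toy model. This is only a sketch, not the kernel code: extents are plain (offset, length) tuples, and the names `replay`, `prealloc_extents_to_log` and the `limit` value are hypothetical, chosen for illustration. It shows that dropping everything beyond i_size and then replaying a log that holds only the newest prealloc extent loses the older one, while logging all prealloc extents beyond i_size preserves both.

```python
# Toy model only: extents are (offset, length) tuples, not btrfs structures.
# All names and the limit value below are hypothetical.

K = 1024


def replay(on_disk, logged, i_size):
    """Drop extents starting at or beyond i_size, then add the logged ones."""
    kept = {e for e in on_disk if e[0] < i_size}
    return sorted(kept | set(logged))


def prealloc_extents_to_log(extents, i_size, limit):
    """Return every extent at or beyond i_size, or None if there are too many.

    None stands for "fall back to a full transaction commit".
    """
    beyond = [e for e in extents if e[0] >= i_size]
    if len(beyond) > limit:
        return None
    return beyond


i_size = 256 * K
data = (0, 256 * K)
prealloc_a = (256 * K, 1024 * K)    # committed in a past transaction
prealloc_b = (1280 * K, 1024 * K)   # added later, before the second fsync

committed = [data, prealloc_a]

# Fast fsync that logs only the new extent: prealloc_a is lost on replay.
naive = replay(committed, [prealloc_b], i_size)

# Logging all prealloc extents beyond i_size keeps both after replay.
to_log = prealloc_extents_to_log(committed + [prealloc_b], i_size, limit=8)
fixed = replay(committed, to_log, i_size)

# With a very small (hypothetical) limit the model signals a full commit.
fallback = prealloc_extents_to_log(committed + [prealloc_b], i_size, limit=1)

print("naive:", naive)
print("fixed:", fixed)
print("fallback needed:", fallback is None)
```

The `limit` parameter only stands in for the "too many" condition mentioned above; the real threshold and the way the fallback is triggered are decided in tree-log.c and are not modelled here.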

Trivial reproducer:

 $ mkfs.btrfs -f /dev/sdb
 $ mount /dev/sdb /mnt
 $ xfs_io -f -c "pwrite -S 0xab 0 256K" /mnt/foo
 $ sync
 $ xfs_io -c "falloc -k 256K 1M" /mnt/foo
 $ xfs_io -c "fsync" /mnt/foo
 <power failure>

 # mount to replay log
 $ mount /dev/sdb /mnt
 # at this point the file only has one extent, at offset 0, size 256K

A test case for fstests follows soon, covering multiple scenarios that
involve adding prealloc extents with previous shrinking truncates and
without such truncates.

Fixes: c71bf099ab ("Btrfs: Avoid orphan inodes cleanup while replaying log")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-12 14:50:36 +02:00
tests btrfs: tests/qgroup: Fix wrong tree backref level 2018-03-31 02:01:05 +02:00
Kconfig btrfs: Remove custom crc32c init code 2018-03-26 15:09:39 +02:00
Makefile btrfs: Remove custom crc32c init code 2018-03-26 15:09:39 +02:00
acl.c btrfs: remove stale comments about fs_mutex 2018-03-31 02:01:07 +02:00
async-thread.c Btrfs: fix confusing worker helper info in stacktrace 2017-10-30 12:27:57 +01:00
async-thread.h btrfs: constify tracepoint arguments 2017-08-16 14:19:53 +02:00
backref.c btrfs: Validate child tree block's level and first key 2018-03-31 02:01:06 +02:00
backref.h btrfs: add more __cold annotations 2018-03-26 15:09:39 +02:00
btrfs_inode.h btrfs: open code trivial helper btrfs_page_exists_in_range 2018-03-31 01:26:50 +02:00
check-integrity.c btrfs: Remove custom crc32c init code 2018-03-26 15:09:39 +02:00
check-integrity.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
compression.c btrfs: add more __cold annotations 2018-03-26 15:09:39 +02:00
compression.h btrfs: add more __cold annotations 2018-03-26 15:09:39 +02:00
ctree.c btrfs: update barrier in should_cow_block 2018-03-31 02:01:06 +02:00
ctree.h btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved space 2018-03-31 02:01:04 +02:00
dedupe.h btrfs: expand cow_file_range() to support in-band dedup and subpage-blocksize 2016-07-26 13:52:25 +02:00
delayed-inode.c btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item 2018-03-31 02:01:03 +02:00
delayed-inode.h btrfs: add more __cold annotations 2018-03-26 15:09:39 +02:00
delayed-ref.c btrfs: use lockdep_assert_held for spinlocks 2018-03-31 02:01:06 +02:00
delayed-ref.h btrfs: add more __cold annotations 2018-03-26 15:09:39 +02:00
dev-replace.c btrfs: split dev-replace locking helpers for read and write 2018-03-31 02:01:07 +02:00
dev-replace.h btrfs: split dev-replace locking helpers for read and write 2018-03-31 02:01:07 +02:00
dir-item.c btrfs: Remove custom crc32c init code 2018-03-26 15:09:39 +02:00
disk-io.c Btrfs: clean up resources during umount after trans is aborted 2018-04-12 14:49:47 +02:00
disk-io.h btrfs: Validate child tree block's level and first key 2018-03-31 02:01:06 +02:00
export.c btrfs: Cleanup existing name_len checks 2018-01-22 16:08:12 +01:00
export.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
extent-tree.c btrfs: Fix possible softlock on single core machines 2018-04-05 19:22:35 +02:00
extent_io.c btrfs: lift errors from add_extent_changeset to the callers 2018-03-31 02:03:25 +02:00
extent_io.h btrfs: remove unused parameters from extent_submit_bio_done_t 2018-03-31 01:26:55 +02:00
extent_map.c btrfs: add more __cold annotations 2018-03-26 15:09:39 +02:00
extent_map.h btrfs: add more __cold annotations 2018-03-26 15:09:39 +02:00
file-item.c Merge branch 'for-4.13-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux 2017-07-05 16:41:23 -07:00
file.c btrfs: qgroup: Use separate meta reservation type for delalloc 2018-03-31 01:41:14 +02:00
free-space-cache.c btrfs: qgroup: Use separate meta reservation type for delalloc 2018-03-31 01:41:14 +02:00
free-space-cache.h btrfs: free-space-cache, clean up unnecessary root arguments 2017-02-17 12:03:56 +01:00
free-space-tree.c btrfs: use reada direction enum instead of constant value in load_free_space_tree 2018-03-26 15:09:37 +02:00
free-space-tree.h btrfs: expose internal free space tree routine only if sanity tests are enabled 2017-08-18 16:36:29 +02:00
inode-item.c btrfs: Remove custom crc32c init code 2018-03-26 15:09:39 +02:00
inode-map.c btrfs: qgroup: Use separate meta reservation type for delalloc 2018-03-31 01:41:14 +02:00
inode-map.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
inode.c btrfs: qgroup: Use separate meta reservation type for delalloc 2018-03-31 01:41:14 +02:00
ioctl.c btrfs: user proper type for btrfs_mask_flags flags 2018-03-31 02:01:07 +02:00
locking.c btrfs: Relax memory barrier in btrfs_tree_unlock 2018-03-31 01:26:51 +02:00
locking.h
lzo.c btrfs: Remove unused tot_len var from lzo_decompress 2018-03-31 01:26:57 +02:00
math.h
ordered-data.c btrfs: qgroup: Use separate meta reservation type for delalloc 2018-03-31 01:41:14 +02:00
ordered-data.h btrfs: add more __cold annotations 2018-03-26 15:09:39 +02:00
orphan.c
print-tree.c btrfs: Validate child tree block's level and first key 2018-03-31 02:01:06 +02:00
print-tree.h btrfs: get fs_info from eb in btrfs_print_tree, remove argument 2017-08-16 16:12:03 +02:00
props.c btrfs: drop underscores from exported xattr functions 2018-03-26 15:09:41 +02:00
props.h
qgroup.c btrfs: use lockdep_assert_held for spinlocks 2018-03-31 02:01:06 +02:00
qgroup.h btrfs: qgroup: Introduce function to convert META_PREALLOC into META_PERTRANS 2018-03-31 01:41:14 +02:00
raid56.c Btrfs: replace: cache rbio when rebuild data on missing device 2018-03-31 01:41:12 +02:00
raid56.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
rcu-string.h
reada.c btrfs: split dev-replace locking helpers for read and write 2018-03-31 02:01:07 +02:00
ref-verify.c btrfs: Validate child tree block's level and first key 2018-03-31 02:01:06 +02:00
ref-verify.h Btrfs: add a extent ref verify tool 2017-10-30 12:28:00 +01:00
relocation.c btrfs: Validate child tree block's level and first key 2018-03-31 02:01:06 +02:00
root-tree.c btrfs: Cleanup existing name_len checks 2018-01-22 16:08:12 +01:00
scrub.c btrfs: split dev-replace locking helpers for read and write 2018-03-31 02:01:07 +02:00
send.c Btrfs: send: fix typo in TLV_PUT 2018-03-26 15:09:42 +02:00
send.h btrfs: fix send ioctl on 32bit with 64bit kernel 2017-10-30 12:27:59 +01:00
struct-funcs.c btrfs: struct-funcs, constify readers 2017-08-16 14:19:53 +02:00
super.c btrfs: use RCU in btrfs_show_devname for device list traversal 2018-03-31 02:01:06 +02:00
sysfs.c btrfs: defer adding raid type kobject until after chunk relocation 2018-03-31 01:41:12 +02:00
sysfs.h Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux 2017-11-14 13:35:29 -08:00
transaction.c btrfs: qgroup: Split meta rsv type into meta_prealloc and meta_pertrans 2018-03-31 01:41:14 +02:00
transaction.h btrfs: Remove code referencing unused TRANS_USERSPACE 2018-03-31 01:26:51 +02:00
tree-checker.c btrfs: add more __cold annotations 2018-03-26 15:09:39 +02:00
tree-checker.h btrfs: tree-checker: Replace root parameter with fs_info 2018-03-26 15:09:38 +02:00
tree-defrag.c btrfs: add define for oldest generation 2018-03-31 01:26:50 +02:00
tree-log.c Btrfs: fix loss of prealloc extents past i_size after fsync log replay 2018-04-12 14:50:36 +02:00
tree-log.h btrfs: Remove root argument from btrfs_log_dentry_safe 2018-03-26 15:09:42 +02:00
ulist.c btrfs: ulist: rename ulist_fini to ulist_release 2017-02-17 12:03:50 +01:00
ulist.h btrfs: ulist: rename ulist_fini to ulist_release 2017-02-17 12:03:50 +01:00
uuid-tree.c btrfs: add define for oldest generation 2018-03-31 01:26:50 +02:00
volumes.c btrfs: split dev-replace locking helpers for read and write 2018-03-31 02:01:07 +02:00
volumes.h btrfs: rename btrfs_close_extra_device to btrfs_free_extra_devids 2018-03-26 15:09:42 +02:00
xattr.c btrfs: adjust return type of btrfs_getxattr 2018-03-26 15:09:41 +02:00
xattr.h btrfs: move btrfs_listxattr prototype to xattr.h 2018-03-26 15:09:41 +02:00
zlib.c btrfs: allow to set compression level for zlib 2017-11-01 20:45:29 +01:00
zstd.c btrfs: move some zstd work data from stack to workspace 2018-01-22 16:08:14 +01:00