Commit Graph

2649 Commits

Author SHA1 Message Date
Gao Xiang eecfa42716 f2fs: use xattr_prefix to wrap up
Let's use xattr_prefix instead of open code.
No logic changes.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-15 20:59:46 -08:00
Jaegeuk Kim 812a95977f f2fs: sync filesystem after roll-forward recovery
Some works after roll-forward recovery can get an error which will release
all the data structures. Let's flush them in order to make it clean.

One possible corruption came from:

[   90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
[   90.675349] Call trace:
[   90.677869]  __list_del_entry_valid+0x94/0xb4
[   90.682351]  remove_dirty_inode+0xac/0x114
[   90.686563]  __f2fs_write_data_pages+0x6a8/0x6c8
[   90.691302]  f2fs_write_data_pages+0x40/0x4c
[   90.695695]  do_writepages+0x80/0xf0
[   90.699372]  __writeback_single_inode+0xdc/0x4ac
[   90.704113]  writeback_sb_inodes+0x280/0x440
[   90.708501]  wb_writeback+0x1b8/0x3d0
[   90.712267]  wb_workfn+0x1a8/0x4d4
[   90.715765]  process_one_work+0x1c0/0x3d4
[   90.719883]  worker_thread+0x224/0x344
[   90.723739]  kthread+0x120/0x130
[   90.727055]  ret_from_fork+0x10/0x18

Reported-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-15 20:59:45 -08:00
Jaegeuk Kim 0e0667b625 f2fs: flush quota blocks after turnning it off
After quota_off, we'll get some dirty blocks. If put_super don't have a chance
to flush them by checkpoint, it causes NULL pointer exception in end_io after
iput(node_inode). (e.g., by checkpoint=disable)

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-15 20:59:45 -08:00
Jaegeuk Kim 11ac8ef8d8 f2fs: avoid null pointer exception in dcc_info
If dcc_info is not set yet, we can get null pointer panic.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-15 20:59:45 -08:00
Jaegeuk Kim b460866d27 f2fs: don't wake up too frequently, if there is lots of IOs
Otherwise, it wakes up discard thread which will sleep again by busy IOs
in a loop.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-15 20:59:45 -08:00
Jaegeuk Kim b86232536c f2fs: try to keep CP_TRIMMED_FLAG after successful umount
If every discard were issued successfully, we can avoid further discard.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-15 20:59:45 -08:00
Jaegeuk Kim db610a640e f2fs: add quick mode of checkpoint=disable for QA
This mode returns mount() quickly with EAGAIN. We can trigger this by
shutdown(F2FS_GOING_DOWN_NEED_FSCK).

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-15 20:59:37 -08:00
Ming Lei 6dc4f100c1 block: allow bio_for_each_segment_all() to iterate over multi-page bvec
This patch introduces one extra iterator variable to bio_for_each_segment_all(),
then we can allow bio_for_each_segment_all() to iterate over multi-page bvec.

Given it is just one mechannical & simple change on all bio_for_each_segment_all()
users, this patch does tree-wide change in one single patch, so that we can
avoid to use a temporary helper for this conversion.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15 08:40:11 -07:00
Jaegeuk Kim 03f2c02d8b f2fs: run discard jobs when put_super
When we umount f2fs, we need to avoid long delay due to discard commands, which
is actually taking tens of seconds, if storage is very slow on UNMAP. So, this
patch introduces timeout-based work on it.

By default, let me give 5 seconds for discard.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 08:55:34 -08:00
Chandan Rajendra 643fa9612b fscrypt: remove filesystem specific build config option
In order to have a common code base for fscrypt "post read" processing
for all filesystems which support encryption, this commit removes
filesystem specific build config option (e.g. CONFIG_EXT4_FS_ENCRYPTION)
and replaces it with a build option (i.e. CONFIG_FS_ENCRYPTION) whose
value affects all the filesystems making use of fscrypt.

Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-01-23 23:56:43 -05:00
Chandan Rajendra 62230e0d70 f2fs: use IS_ENCRYPTED() to check encryption status
This commit removes the f2fs specific f2fs_encrypted_inode() and makes
use of the generic IS_ENCRYPTED() macro to check for the encryption
status of an inode.

Acked-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-01-23 23:56:43 -05:00
Chao Yu 2010987365 f2fs: fix to set sbi dirty correctly
In order to record direct IO count, we add two additional type in
enum count_type: F2FS_DIO_{WRITE,READ}, but those IO won't dirty
filesystem metadata, so we don't need to set filesystem dirty in
inc_page_count(), fix it.

Fixes: 02b16d0a34 ("f2fs: add to account direct IO")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-22 15:31:26 -08:00
Chao Yu f9aa52a8cb f2fs: fix to initialize variable to avoid UBSAN/smatch warning
As Dan Carpenter as below:

The patch df634f444ee9: "f2fs: use rb_*_cached friends" from Oct 4,
2018, leads to the following static checker warning:

	fs/f2fs/extent_cache.c:606 f2fs_update_extent_tree_range()
	error: uninitialized symbol 'leftmost'.

And also Eric Biggers, and Kyungtae Kim reported, there is an UBSAN
warning described as below:

We report a bug in linux-4.20.2: "UBSAN: Undefined behaviour in
fs/f2fs/extent_cache.c"

kernel config: https://kt0755.github.io/etc/config_v4.20_stable
repro: https://kt0755.github.io/etc/repro.4a3e7.c (f2fs is mounted on
/mnt/f2fs/)

This arose in f2fs_update_extent_tree_range (fs/f2fs/extent_cache.c:605).
It seems that, for some reason, its last argument became "24"
although that was supposed to be bool type.

=========================================
UBSAN: Undefined behaviour in fs/f2fs/extent_cache.c:605:4
load of value 24 is not a valid value for type '_Bool'
CPU: 0 PID: 6774 Comm: syz-executor5 Not tainted 4.20.2 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xb1/0x118 lib/dump_stack.c:113
 ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
 __ubsan_handle_load_invalid_value+0x17a/0x1be lib/ubsan.c:457
 f2fs_update_extent_tree_range+0x1d4a/0x1d50 fs/f2fs/extent_cache.c:605
 f2fs_update_extent_cache+0x2b6/0x350 fs/f2fs/extent_cache.c:804
 f2fs_update_data_blkaddr+0x61/0x70 fs/f2fs/data.c:656
 f2fs_outplace_write_data+0x1d6/0x4b0 fs/f2fs/segment.c:3140
 f2fs_convert_inline_page+0x86d/0x2060 fs/f2fs/inline.c:163
 f2fs_convert_inline_inode+0x6b5/0xad0 fs/f2fs/inline.c:208
 f2fs_preallocate_blocks+0x78b/0xb00 fs/f2fs/data.c:982
 f2fs_file_write_iter+0x31b/0xf40 fs/f2fs/file.c:3062
 call_write_iter include/linux/fs.h:1857 [inline]
 new_sync_write fs/read_write.c:474 [inline]
 __vfs_write+0x538/0x6e0 fs/read_write.c:487
 vfs_write+0x1b3/0x520 fs/read_write.c:549
 ksys_write+0xde/0x1c0 fs/read_write.c:598
 __do_sys_write fs/read_write.c:610 [inline]
 __se_sys_write fs/read_write.c:607 [inline]
 __x64_sys_write+0x7e/0xc0 fs/read_write.c:607
 do_syscall_64+0xbe/0x4f0 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4497b9
Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f1ea15edc68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007f1ea15ee6cc RCX: 00000000004497b9
RDX: 0000000000001000 RSI: 0000000020000140 RDI: 0000000000000013
RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 000000000000bb50 R14: 00000000006f4bf0 R15: 00007f1ea15ee700
=========================================

As I checked, this uninitialized variable won't cause extent cache
corruption, but in order to avoid such kind of warning of both UBSAN
and smatch, fix to initialize related variable.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Kyungtae Kim <kt0755@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-22 15:31:26 -08:00
Sheng Yong ac92985864 f2fs: UBSAN: set boolean value iostat_enable correctly
When setting /sys/fs/f2fs/<DEV>/iostat_enable with non-bool value, UBSAN
reports the following warning.

[ 7562.295484] ================================================================================
[ 7562.296531] UBSAN: Undefined behaviour in fs/f2fs/f2fs.h:2776:10
[ 7562.297651] load of value 64 is not a valid value for type '_Bool'
[ 7562.298642] CPU: 1 PID: 7487 Comm: dd Not tainted 4.20.0-rc4+ #79
[ 7562.298653] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 7562.298662] Call Trace:
[ 7562.298760]  dump_stack+0x46/0x5b
[ 7562.298811]  ubsan_epilogue+0x9/0x40
[ 7562.298830]  __ubsan_handle_load_invalid_value+0x72/0x90
[ 7562.298863]  f2fs_file_write_iter+0x29f/0x3f0
[ 7562.298905]  __vfs_write+0x115/0x160
[ 7562.298922]  vfs_write+0xa7/0x190
[ 7562.298934]  ksys_write+0x50/0xc0
[ 7562.298973]  do_syscall_64+0x4a/0xe0
[ 7562.298992]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 7562.299001] RIP: 0033:0x7fa45ec19c00
[ 7562.299004] Code: 73 01 c3 48 8b 0d 88 92 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d dd eb 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ce 8f 01 00 48 89 04 24
[ 7562.299044] RSP: 002b:00007ffca52b49e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 7562.299052] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa45ec19c00
[ 7562.299059] RDX: 0000000000000400 RSI: 000000000093f000 RDI: 0000000000000001
[ 7562.299065] RBP: 000000000093f000 R08: 0000000000000004 R09: 0000000000000000
[ 7562.299071] R10: 00007ffca52b47b0 R11: 0000000000000246 R12: 0000000000000400
[ 7562.299077] R13: 000000000093f000 R14: 000000000093f400 R15: 0000000000000000
[ 7562.299091] ================================================================================

So, if iostat_enable is enabled, set its value as true.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-22 15:31:26 -08:00
Sheng Yong 2f84babfe5 f2fs: add brackets for macros
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-22 15:31:26 -08:00
Sheng Yong 720db06863 f2fs: check if file namelen exceeds max value
Dentry bitmap is not enough to detect incorrect dentries. So this patch
also checks the namelen value of a dentry.

Signed-off-by: Gong Chen <gongchen4@huawei.com>
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-22 15:31:26 -08:00
Chao Yu ddf06b753a f2fs: fix to trigger fsck if dirent.name_len is zero
While traversing dirents in f2fs_fill_dentries(), if bitmap is valid,
filename length should not be zero, otherwise, directory structure
consistency could be corrupted, in this case, let's print related
info and set SBI_NEED_FSCK to trigger fsck for repairing.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-22 15:31:25 -08:00
Greg Kroah-Hartman 21acc07d33 f2fs: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22 14:25:25 +01:00
Greg Kroah-Hartman c20e57b32d f2fs: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-08 20:41:09 -08:00
Jaegeuk Kim 5d539245cb f2fs: export FS_NOCOW_FL flag to user
This exports pin_file status to user.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-08 20:41:09 -08:00
Chao Yu 36c5733f95 f2fs: check inject_rate validity during configuring
Type of inject_rate is unsigned int, let's check new value's
validity during configuring.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-08 20:41:09 -08:00
YueHaibing 8e11403876 f2fs: remove set but not used variable 'err'
Fixes gcc '-Wunused-but-set-variable' warning:

fs/f2fs/data.c: In function 'f2fs_dio_submit_bio':
fs/f2fs/data.c:2585:6: warning:
 variable 'err' set but not used [-Wunused-but-set-variable]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-08 09:34:27 -08:00
Chengguang Xu f365c6cc85 f2fs: change error code to -ENOMEM from -EINVAL
The error case of failing allocating memory should
return -ENOMEM.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-08 09:34:27 -08:00
Jaegeuk Kim 7c77bf7de1 f2fs: don't access node/meta inode mapping after iput
This fixes wrong access of address spaces of node and meta inodes after iput.

Fixes: 60aa4d5536 ("f2fs: fix use-after-free issue when accessing sbi->stat_info")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-08 09:34:27 -08:00
Jaegeuk Kim 31867b23d7 f2fs: wait on atomic writes to count F2FS_CP_WB_DATA
Otherwise, we can get wrong counts incurring checkpoint hang.

IO_W (CP:  -24, Data:   24, Flush: (   0    0    1), Discard: (   0    0))

Thread A                        Thread B
- f2fs_write_data_pages
 -  __write_data_page
  - f2fs_submit_page_write
   - inc_page_count(F2FS_WB_DATA)
     type is F2FS_WB_DATA due to file is non-atomic one
- f2fs_ioc_start_atomic_write
 - set_inode_flag(FI_ATOMIC_FILE)
                                - f2fs_write_end_io
                                 - dec_page_count(F2FS_WB_CP_DATA)
                                   type is F2FS_WB_DATA due to file becomes
                                   atomic one

Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-08 09:34:27 -08:00
Linus Torvalds 9ab97aea85 f2fs-for-4.21-rc1
In this round, we've focused on bug fixes since Pixel devices have been
 shipping with f2fs. Some of them were related to hardware encryption support
 which are actually not an issue in mainline, but would be better to merge
 them in order to avoid potential bugs.
 
 Enhancement:
  - do GC sub-sections when the section is large
  - add a flag in ioctl(SHUTDOWN) to trigger fsck for QA
  - use kvmalloc() in order to give another chance to avoid ENOMEM
 
 Bug fix:
  - fix accessing memory boundaries in a malformed iamge
  - GC gives stale unencrypted block
  - GC counts in large sections
  - detect idle time more precisely
  - block allocation of DIO writes
  - race conditions between write_begin and write_checkpoint
  - allow GCs for node segments via ioctl()
 
 There are various clean-ups and minor bug fixes as well.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAlwmkzcACgkQQBSofoJI
 UNLIlw//RUd8rH8VMW/BcXQ3fwB8qFCk296cIVlWQu12BV7vmMKY886Sc+C9uAuR
 9Lze4D6Iyg1k2KLkaWKhnE2E4ZWQ2O5zVf11r4amm1bN9glPJAKfLWbenJhMfeGa
 IdWo1ksntxgjfqLim2QlGSv5cBUjEo8zf32D446BbQURQMD/mm/lndywItS5vTx1
 2eGKJ3pSXX6geFzyf1pD/wmC/7j6roKaTKAIMBagIuINNukeHctT/gRfklmYmOFC
 lRbXlsYhjIfGln6gktY6r4qyZbouOMad2fDl+dkp+UewQM4/2xLUaMqiN2eqd7Hr
 cag9LGe80MGgWwesqS6VOJHQviOlNd78Ie//iO1+ZsaIxKrvgzocRSDdxZswv+Vq
 reZNsEKH0s4/0ik9hiprW1NdQLvtg7e/UWPyrRvVRlDVitW/BXOWRBpGVgkdg01K
 OmAxNrVcr2+Gxp8jZL7M+rQdpApL/ms31x9HE/8u5ABBZfmCy3zhpICTZ6Hdngp1
 3iUi2PvKUg+Y7KgII1nK2HLDpmZZ5UkLm8NhxLI0ntBWExjQjOBasOxPJf4en/UR
 TAtK3ozNKim3fQ01NCBy3B8WMKRtovxLK6PgqNPROJparlKJ3FEHnPIrYKglDUPg
 sxWCm1gFSraRGP86TaeGeD+v+PP6erI9CTaf6F/U3pWFgIpHoPg=
 =NEns
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've focused on bug fixes since Pixel devices have
  been shipping with f2fs. Some of them were related to hardware
  encryption support which are actually not an issue in mainline, but
  would be better to merge them in order to avoid potential bugs.

  Enhancements:
   - do GC sub-sections when the section is large
   - add a flag in ioctl(SHUTDOWN) to trigger fsck for QA
   - use kvmalloc() in order to give another chance to avoid ENOMEM

  Bug fixes:
   - fix accessing memory boundaries in a malformed iamge
   - GC gives stale unencrypted block
   - GC counts in large sections
   - detect idle time more precisely
   - block allocation of DIO writes
   - race conditions between write_begin and write_checkpoint
   - allow GCs for node segments via ioctl()

  There are various clean-ups and minor bug fixes as well"

* tag 'f2fs-for-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (43 commits)
  f2fs: sanity check of xattr entry size
  f2fs: fix use-after-free issue when accessing sbi->stat_info
  f2fs: check PageWriteback flag for ordered case
  f2fs: fix validation of the block count in sanity_check_raw_super
  f2fs: fix missing unlock(sbi->gc_mutex)
  f2fs: fix to dirty inode synchronously
  f2fs: clean up structure extent_node
  f2fs: fix block address for __check_sit_bitmap
  f2fs: fix sbi->extent_list corruption issue
  f2fs: clean up checkpoint flow
  f2fs: flush stale issued discard candidates
  f2fs: correct wrong spelling, issing_*
  f2fs: use kvmalloc, if kmalloc is failed
  f2fs: remove redundant comment of unused wio_mutex
  f2fs: fix to reorder set_page_dirty and wait_on_page_writeback
  f2fs: clear PG_writeback if IPU failed
  f2fs: add an ioctl() to explicitly trigger fsck later
  f2fs: avoid frequent costly fsck triggers
  f2fs: fix m_may_create to make OPU DIO write correctly
  f2fs: fix to update new block address correctly for OPU
  ...
2018-12-31 09:41:37 -08:00
Jan Kara ab41ee6879 mm: migrate: drop unused argument of migrate_page_move_mapping()
All callers of migrate_page_move_mapping() now pass NULL for 'head'
argument.  Drop it.

Link: http://lkml.kernel.org/r/20181211172143.7358-7-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:51 -08:00
Jaegeuk Kim 64beba0558 f2fs: sanity check of xattr entry size
There is a security report where f2fs_getxattr() has a hole to expose wrong
memory region when the image is malformed like this.

f2fs_getxattr: entry->e_name_len: 4, size: 12288, buffer_size: 16384, len: 4

Cc: <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26 19:56:23 -08:00
Sahitya Tummala 60aa4d5536 f2fs: fix use-after-free issue when accessing sbi->stat_info
iput() on sbi->node_inode can update sbi->stat_info
in the below context, if the f2fs_write_checkpoint()
has failed with error.

f2fs_balance_fs_bg+0x1ac/0x1ec
f2fs_write_node_pages+0x4c/0x260
do_writepages+0x80/0xbc
__writeback_single_inode+0xdc/0x4ac
writeback_single_inode+0x9c/0x144
write_inode_now+0xc4/0xec
iput+0x194/0x22c
f2fs_put_super+0x11c/0x1e8
generic_shutdown_super+0x70/0xf4
kill_block_super+0x2c/0x5c
kill_f2fs_super+0x44/0x50
deactivate_locked_super+0x60/0x8c
deactivate_super+0x68/0x74
cleanup_mnt+0x40/0x78

Fix this by moving f2fs_destroy_stats() further below iput() in
both f2fs_put_super() and f2fs_fill_super() paths.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26 15:18:10 -08:00
Chao Yu bae0ee7a76 f2fs: check PageWriteback flag for ordered case
For all ordered cases in f2fs_wait_on_page_writeback(), we need to
check PageWriteback status, so let's clean up to relocate the check
into f2fs_wait_on_page_writeback().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26 15:16:56 -08:00
Martin Blumenstingl 88960068f2 f2fs: fix validation of the block count in sanity_check_raw_super
Treat "block_count" from struct f2fs_super_block as 64-bit little endian
value in sanity_check_raw_super() because struct f2fs_super_block
declares "block_count" as "__le64".

This fixes a bug where the superblock validation fails on big endian
devices with the following error:
  F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0)
  F2FS-fs (sda1): Can't find valid F2FS filesystem in 1th superblock
  F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0)
  F2FS-fs (sda1): Can't find valid F2FS filesystem in 2th superblock
As result of this the partition cannot be mounted.

With this patch applied the superblock validation works fine and the
partition can be mounted again:
  F2FS-fs (sda1): Mounted with checkpoint version = 7c84

My little endian x86-64 hardware was able to mount the partition without
this fix.
To confirm that mounting f2fs filesystems works on big endian machines
again I tested this on a 32-bit MIPS big endian (lantiq) device.

Fixes: 0cfe75c5b0 ("f2fs: enhance sanity_check_raw_super() to avoid potential overflows")
Cc: stable@vger.kernel.org
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26 15:16:55 -08:00
Jaegeuk Kim 8f31b4665c f2fs: fix missing unlock(sbi->gc_mutex)
This fixes missing unlock call.

Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26 15:16:55 -08:00
Chao Yu b32e019049 f2fs: fix to dirty inode synchronously
If user change inode's i_flags via ioctl, let's add it into global
dirty list, so that checkpoint can guarantee its persistence before
fsync, it can make checkpoint keeping strong consistency.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26 15:16:55 -08:00
Chao Yu c0362117c3 f2fs: clean up structure extent_node
The union in struct extent_node wass only to indicate below fields

	struct rb_node rb_node;
	union {
		struct {
			unsigned int fofs;
			unsigned int len;
		...
	...

can be parsed as fields in struct rb_entry, but they were never be
used explicitly before, so let's remove them for cleanup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26 15:16:55 -08:00
Qiuyang Sun 9249dded7b f2fs: fix block address for __check_sit_bitmap
Should use lstart (logical start address) instead of start (in dev) here.
This fixes a bug in multi-device scenarios.

Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26 15:16:55 -08:00
Sahitya Tummala e4589fa545 f2fs: fix sbi->extent_list corruption issue
When there is a failure in f2fs_fill_super() after/during
the recovery of fsync'd nodes, it frees the current sbi and
retries again. This time the mount is successful, but the files
that got recovered before retry, still holds the extent tree,
whose extent nodes list is corrupted since sbi and sbi->extent_list
is freed up. The list_del corruption issue is observed when the
file system is getting unmounted and when those recoverd files extent
node is being freed up in the below context.

list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
<...>
kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
lr : __list_del_entry_valid+0x94/0xb4
pc : __list_del_entry_valid+0x94/0xb4
<...>
Call trace:
__list_del_entry_valid+0x94/0xb4
__release_extent_node+0xb0/0x114
__free_extent_tree+0x58/0x7c
f2fs_shrink_extent_tree+0xdc/0x3b0
f2fs_leave_shrinker+0x28/0x7c
f2fs_put_super+0xfc/0x1e0
generic_shutdown_super+0x70/0xf4
kill_block_super+0x2c/0x5c
kill_f2fs_super+0x44/0x50
deactivate_locked_super+0x60/0x8c
deactivate_super+0x68/0x74
cleanup_mnt+0x40/0x78
__cleanup_mnt+0x1c/0x28
task_work_run+0x48/0xd0
do_notify_resume+0x678/0xe98
work_pending+0x8/0x14

Fix this by not creating extents for those recovered files if shrinker is
not registered yet. Once mount is successful and shrinker is registered,
those files can have extents again.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26 15:16:54 -08:00
Chao Yu 8ec18bff7b f2fs: clean up checkpoint flow
This patch cleans up checkpoint flow a bit:
- remove unneeded circulation of flushing meta pages.
- don't flush nat_bits pages in prior to other checkpoint pages.
- add bug_on to check remained meta pages after flushing.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26 15:16:54 -08:00
Jaegeuk Kim 76c7bfb3a8 f2fs: flush stale issued discard candidates
Sometimes, I could observe # of issuing_discard to be 1 which blocks background
jobs due to is_idle()=false.
The only way to get out of it was to trigger gc_urgent. This patch avoids that
by checking any candidates as done in the list.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26 15:16:54 -08:00
Jaegeuk Kim 72691af6db f2fs: correct wrong spelling, issing_*
Let's use "queued" instead of "issuing".

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26 15:16:54 -08:00
Jaegeuk Kim 5222595d09 f2fs: use kvmalloc, if kmalloc is failed
One report says memalloc failure during mount.

 (unwind_backtrace) from [<c010cd4c>] (show_stack+0x10/0x14)
 (show_stack) from [<c049c6b8>] (dump_stack+0x8c/0xa0)
 (dump_stack) from [<c024fcf0>] (warn_alloc+0xc4/0x160)
 (warn_alloc) from [<c0250218>] (__alloc_pages_nodemask+0x3f4/0x10d0)
 (__alloc_pages_nodemask) from [<c0270450>] (kmalloc_order_trace+0x2c/0x120)
 (kmalloc_order_trace) from [<c03fa748>] (build_node_manager+0x35c/0x688)
 (build_node_manager) from [<c03de494>] (f2fs_fill_super+0xf0c/0x16cc)
 (f2fs_fill_super) from [<c02a5864>] (mount_bdev+0x15c/0x188)
 (mount_bdev) from [<c03da624>] (f2fs_mount+0x18/0x20)
 (f2fs_mount) from [<c02a68b8>] (mount_fs+0x158/0x19c)
 (mount_fs) from [<c02c3c9c>] (vfs_kern_mount+0x78/0x134)
 (vfs_kern_mount) from [<c02c76ac>] (do_mount+0x474/0xca4)
 (do_mount) from [<c02c8264>] (SyS_mount+0x94/0xbc)
 (SyS_mount) from [<c0108180>] (ret_fast_syscall+0x0/0x48)

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26 15:16:53 -08:00
Yunlong Song af56b48708 f2fs: remove redundant comment of unused wio_mutex
Commit 089842de ("f2fs: remove codes of unused wio_mutex") removes codes
of unused wio_mutex, but missing the comment, so delete it.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26 15:16:53 -08:00
Chao Yu 8d64d365ae f2fs: fix to reorder set_page_dirty and wait_on_page_writeback
This patch reorders flow from

- update page
- set_page_dirty
- wait_on_page_writeback

to

- wait_on_page_writeback
- update page
- set_page_dirty

The reason is:
- set_page_dirty will increase reference of dirty page, the reference
should be cleared before wait_on_page_writeback to keep its consistency.
- some devices need stable page during page writebacking, so we
should not change page's data.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-14 06:38:36 -08:00
Sheng Yong 2062e0c3da f2fs: clear PG_writeback if IPU failed
If IPU failed, nothing is commited, we should end page writeback.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-14 06:38:12 -08:00
Jaegeuk Kim 0cd6d9b0d2 f2fs: add an ioctl() to explicitly trigger fsck later
This adds an option in ioctl(F2FS_IOC_SHUTDOWN) in order to trigger fsck by
setting a NEED_FSCK flag.

Generally, shutdown is used for the test to validate filesystem consistency, and
setting NEED_FSCK flag can be used for Android to trigger fsck.f2fs at boot time
explicitly so that we could measure the elapsed time as well as force filesystem
check.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-14 06:38:02 -08:00
Jaegeuk Kim a742fd41c0 f2fs: avoid frequent costly fsck triggers
If we want to re-enable nat_bits, we rely on fsck which requires full scan
of directory tree. Let's do that by regular fsck or unclean shutdown.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-28 00:16:25 -08:00
Jia Zhu f4f0b6777d f2fs: fix m_may_create to make OPU DIO write correctly
Previously, we added a parameter @map.m_may_create to trigger OPU
allocation and call f2fs_balance_fs() correctly.

But in get_more_blocks(), @create has been overwritten by below code.
So the function f2fs_map_blocks() will not allocate new block address
but directly go out. Meanwile,there are several functions calling
f2fs_map_blocks() directly and @map.m_may_create not initialized.
CODE:
create = dio->op == REQ_OP_WRITE;
	if (dio->flags & DIO_SKIP_HOLES) {
		if (fs_startblk <= ((i_size_read(dio->inode) - 1) >>
						i_blkbits))
			create = 0;
	}

This patch fixes it.

Signed-off-by: Jia Zhu <zhujia13@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 19:46:21 -08:00
Jia Zhu 73c0a9272a f2fs: fix to update new block address correctly for OPU
Previously, we allocated a new block address for OPU mode in direct_IO.

But the new address couldn't be assigned to @map->m_pblk correctly.

This patch fix it.

Cc: <stable@vger.kernel.org>
Fixes: 511f52d02f05 ("f2fs: allow out-place-update for direct IO in LFS mode")
Signed-off-by: Jia Zhu <zhujia13@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 16:42:03 -08:00
Sahitya Tummala e3c59108da f2fs: adjust trace print in f2fs_get_victim() to cover all paths
Adjust the trace print in f2fs_get_victim() to cover GC done by
F2FS_IOC_GARBAGE_COLLECT_RANGE.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 16:38:49 -08:00
Sahitya Tummala 08ac9a3870 f2fs: fix to allow node segment for GC by ioctl path
Allow node type segments also to be GC'd via f2fs ioctl
F2FS_IOC_GARBAGE_COLLECT_RANGE.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 16:38:46 -08:00
Alexey Dobriyan 19880e6e5f f2fs: make "f2fs_fault_name[]" const char *
Those strings are immutable.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:54:37 -08:00
Pan Bian 0ea295dd85 f2fs: read page index before freeing
The function truncate_node frees the page with f2fs_put_page. However,
the page index is read after that. So, the patch reads the index before
freeing the page.

Fixes: bf39c00a9a ("f2fs: drop obsolete node page when it is truncated")
Cc: <stable@vger.kernel.org>
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:54:37 -08:00
Tiezhu Yang f6176473a0 f2fs: fix wrong return value of f2fs_acl_create
When call f2fs_acl_create_masq() failed, the caller f2fs_acl_create()
should return -EIO instead of -ENOMEM, this patch makes it consistent
with posix_acl_create() which has been fixed in commit beaf226b86
("posix_acl: don't ignore return value of posix_acl_create_masq()").

Fixes: 83dfe53c18 ("f2fs: fix reference leaks in f2fs_acl_create")
Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:54:37 -08:00
Jaegeuk Kim f5d5510e73 f2fs: avoid build warn of fall_through
After merging the f2fs tree, today's linux-next build
 (x86_64_allmodconfig) produced this warning:

 In file included from fs/f2fs/dir.c:11:
 fs/f2fs/f2fs.h: In function '__mark_inode_dirty_flag':
 fs/f2fs/f2fs.h:2388:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
    if (set)
       ^
 fs/f2fs/f2fs.h:2390:2: note: here
   case FI_DATA_EXIST:
   ^~~~

 Exposed by my use of -Wimplicit-fallthrough

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:57 -08:00
Sheng Yong 2866fb16d6 f2fs: fix race between write_checkpoint and write_begin
The following race could lead to inconsistent SIT bitmap:

Task A                          Task B
======                          ======
f2fs_write_checkpoint
  block_operations
    f2fs_lock_all
      down_write(node_change)
      down_write(node_write)
      ... sync ...
      up_write(node_change)
                                f2fs_file_write_iter
                                  set_inode_flag(FI_NO_PREALLOC)
                                  ......
                                  f2fs_write_begin(index=0, has inline data)
                                    prepare_write_begin
                                      __do_map_lock(AIO) => down_read(node_change)
                                      f2fs_convert_inline_page => update SIT
                                      __do_map_lock(AIO) => up_read(node_change)
  f2fs_flush_sit_entries <= inconsistent SIT
  finish write checkpoint
  sudden-power-off

If SPO occurs after checkpoint is finished, SIT bitmap will be set
incorrectly.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:57 -08:00
Jaegeuk Kim 4e240d1bab f2fs: check memory boundary by insane namelen
If namelen is corrupted to have very long value, fill_dentries can copy
wrong memory area.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:57 -08:00
Yunlong Song 1e771e83ce f2fs: only flush the single temp bio cache which owns the target page
Previously, when f2fs finds which temp bio cache owns the target page,
it will flush all the three temp bio caches, but we only need to flush
one single bio cache indeed, which can help to keep bio merged.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:57 -08:00
Chao Yu f9d6d05976 f2fs: fix out-place-update DIO write
In get_more_blocks(), we may override @create as below code:

	create = dio->op == REQ_OP_WRITE;
	if (dio->flags & DIO_SKIP_HOLES) {
		if (fs_startblk <= ((i_size_read(dio->inode) - 1) >>
						i_blkbits))
			create = 0;
	}

But in f2fs_map_blocks(), we only trigger f2fs_balance_fs() if @create
is 1, so in LFS mode, dio overwrite under LFS mode can easily run out
of free segments, result in below panic.

 Call Trace:
  allocate_segment_by_default+0xa8/0x270 [f2fs]
  f2fs_allocate_data_block+0x1ea/0x5c0 [f2fs]
  __allocate_data_block+0x306/0x480 [f2fs]
  f2fs_map_blocks+0x6f6/0x920 [f2fs]
  __get_data_block+0x4f/0xb0 [f2fs]
  get_data_block_dio_write+0x50/0x60 [f2fs]
  do_blockdev_direct_IO+0xcd5/0x21e0
  __blockdev_direct_IO+0x3a/0x3c
  f2fs_direct_IO+0x1ff/0x4a0 [f2fs]
  generic_file_direct_write+0xd9/0x160
  __generic_file_write_iter+0xbb/0x1e0
  f2fs_file_write_iter+0xaf/0x220 [f2fs]
  __vfs_write+0xd0/0x130
  vfs_write+0xb2/0x1b0
  SyS_pwrite64+0x69/0xa0
  ? vtime_user_exit+0x29/0x70
  do_syscall_64+0x6e/0x160
  entry_SYSCALL64_slow_path+0x25/0x25
 RIP: new_curseg+0x36f/0x380 [f2fs] RSP: ffffac570393f7a8

So this patch introduces a parameter map.m_may_create to indicate that
f2fs_map_blocks() is called from write or read path, which can give the
right hint to let f2fs_map_blocks() trigger OPU allocation and call
f2fs_balanc_fs() correctly.

BTW, it disables physical address preallocation for direct IO in
f2fs_preallocate_blocks, which is redundant to OPU allocation of
f2fs_map_blocks.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:56 -08:00
Chao Yu fef4129ec2 f2fs: fix to be aware discard/preflush/dio command in is_idle()
This patch adds missing in-flight discard/preflush/dio command count
check in is_idle().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:56 -08:00
Chao Yu 02b16d0a34 f2fs: add to account direct IO
This patch adds f2fs_dio_submit_bio() to hook submit_io/end_io functions
in direct IO path, in order to account DIO.

Later, we will add this count into is_idle() to let background GC/Discard
thread be aware of DIO.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:56 -08:00
Yunlei He b61ac5b720 f2fs: move dir data flush to write checkpoint process
This patch move dir data flush to write checkpoint process, by
doing this, it may reduce some time for dir fsync.

pre:
	-f2fs_do_sync_file enter
		-file_write_and_wait_range  <- flush & wait
		-write_checkpoint
			-do_checkpoint	    <- wait all
	-f2fs_do_sync_file exit

now:
	-f2fs_do_sync_file enter
		-write_checkpoint
			-block_operations   <- flush dir & no wait
			-do_checkpoint	    <- wait all
	-f2fs_do_sync_file exit

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:56 -08:00
Yangtao Li 155c62fe9c f2fs: Change to use DEFINE_SHOW_ATTRIBUTE macro
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:56 -08:00
Yunlong Song 67b0e42b76 f2fs: change segment to section in f2fs_ioc_gc_range
f2fs_ioc_gc_range skips blocks_per_seg each time, however, f2fs_gc moves
blocks of section each time, so fix it from segment to section.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:56 -08:00
Chao Yu 3465557556 f2fs: export migration_granularity sysfs entry
Add one sysfs entry to control migration granularity of GC in large
section f2fs, it can be tuned to mitigate heavy overhead of migrating
huge number of blocks in large section.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:56 -08:00
Chao Yu e3080b0120 f2fs: support subsectional garbage collection
Section is minimal garbage collection unit of f2fs, in zoned block
device, or ancient block mapping flash device, in order to improve
GC efficiency, we can align GC unit to lower device erase unit,
normally, it consists of multiple of segments.

Once background or foreground GC triggers, it brings a large number
of IOs, which will impact user IO, and also occupy cpu/memory resource
intensively.

So, to reduce impact of GC on large size section, this patch supports
subsectional GC, in one cycle of GC, it only migrate partial segment{s}
in victim section. Currently, by default, we use sbi->segs_per_sec as
migration granularity.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:55 -08:00
Chao Yu 2c70c5e387 f2fs: introduce __is_large_section() for cleanup
Introduce a wrapper __is_large_section() to clean up codes.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:55 -08:00
Chao Yu 7beb01f744 f2fs: clean up f2fs_sb_has_##feature_name
In F2FS_HAS_FEATURE(), we will use F2FS_SB(sb) to get sbi pointer to
access .raw_super field, to avoid unneeded pointer conversion, this
patch changes to F2FS_HAS_FEATURE() accept sbi parameter directly.

Just do cleanup, no logic change.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:55 -08:00
Yunlong Song 089842de57 f2fs: remove codes of unused wio_mutex
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:55 -08:00
Yunlong Song d6c66cd19e f2fs: fix count of seg_freed to make sec_freed correct
When sbi->segs_per_sec > 1, and if some segno has 0 valid blocks before
gc starts, do_garbage_collect will skip counting seg_freed++, and this
will cause seg_freed < sbi->segs_per_sec and finally skip sec_freed++.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:55 -08:00
Chao Yu a8075dc484 f2fs: fix to account preflush command for noflush_merge mode
Previously, we only account preflush command for flush_merge mode,
so for noflush_merge mode, we can not know in-flight preflush
command count, fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:55 -08:00
Yunlong Song 9bf1a3f739 f2fs: avoid GC causing encrypted file corrupted
The encrypted file may be corrupted by GC in following case:

Time 1: | segment 1 blkaddr = A |  GC -> | segment 2 blkaddr = B |
Encrypted block 1 is moved from blkaddr A of segment 1 to blkaddr B of
segment 2,

Time 2: | segment 1 blkaddr = B |  GC -> | segment 3 blkaddr = C |

Before page 1 is written back and if segment 2 become a victim, then
page 1 is moved from blkaddr B of segment 2 to blkaddr Cof segment 3,
during the GC process of Time 2, f2fs should wait for page 1 written back
before reading it, or move_data_block will read a garbage block from
blkaddr B since page is not written back to blkaddr B yet.

Commit 6aa58d8a ("f2fs: readahead encrypted block during GC") introduce
ra_data_block to read encrypted block, but it forgets to add
f2fs_wait_on_page_writeback to avoid racing between GC and flush.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26 15:53:54 -08:00
Linus Torvalds dad4f140ed Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax
Pull XArray conversion from Matthew Wilcox:
 "The XArray provides an improved interface to the radix tree data
  structure, providing locking as part of the API, specifying GFP flags
  at allocation time, eliminating preloading, less re-walking the tree,
  more efficient iterations and not exposing RCU-protected pointers to
  its users.

  This patch set

   1. Introduces the XArray implementation

   2. Converts the pagecache to use it

   3. Converts memremap to use it

  The page cache is the most complex and important user of the radix
  tree, so converting it was most important. Converting the memremap
  code removes the only other user of the multiorder code, which allows
  us to remove the radix tree code that supported it.

  I have 40+ followup patches to convert many other users of the radix
  tree over to the XArray, but I'd like to get this part in first. The
  other conversions haven't been in linux-next and aren't suitable for
  applying yet, but you can see them in the xarray-conv branch if you're
  interested"

* 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits)
  radix tree: Remove multiorder support
  radix tree test: Convert multiorder tests to XArray
  radix tree tests: Convert item_delete_rcu to XArray
  radix tree tests: Convert item_kill_tree to XArray
  radix tree tests: Move item_insert_order
  radix tree test suite: Remove multiorder benchmarking
  radix tree test suite: Remove __item_insert
  memremap: Convert to XArray
  xarray: Add range store functionality
  xarray: Move multiorder_check to in-kernel tests
  xarray: Move multiorder_shrink to kernel tests
  xarray: Move multiorder account test in-kernel
  radix tree test suite: Convert iteration test to XArray
  radix tree test suite: Convert tag_tagged_items to XArray
  radix tree: Remove radix_tree_clear_tags
  radix tree: Remove radix_tree_maybe_preload_order
  radix tree: Remove split/join code
  radix tree: Remove radix_tree_update_node_t
  page cache: Finish XArray conversion
  dax: Convert page fault handlers to XArray
  ...
2018-10-28 11:35:40 -07:00
Chao Yu 7813081969 f2fs: fix to keep project quota consistent
This patch does below changes to keep consistence of project quota data
in sudden power-cut case:
- update inode.i_projid and project quota atomically under lock_op() in
f2fs_ioc_setproject()
- recover inode.i_projid and project quota in recover_inode()

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:48 -07:00
Chao Yu af033b2aa8 f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.

The implementation is as below:

1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
 a) flush dquot metadata into quota file.
 b) flush quota file to storage to keep file usage be consistent.

2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
 a) checkpoint will skip syncing dquot metadata.
 b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
    hint for fsck repairing.

3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().

Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:47 -07:00
Sheng Yong 26b5a07919 f2fs: cleanup dirty pages if recover failed
During recover, we will try to create new dentries for inodes with
dentry_mark. But if the parent is missing (e.g. killed by fsck),
recover will break. But those recovered dirty pages are not cleanup.
This will hit f2fs_bug_on:

[   53.519566] F2FS-fs (loop0): Found nat_bits in checkpoint
[   53.539354] F2FS-fs (loop0): recover_inode: ino = 5, name = file, inline = 3
[   53.539402] F2FS-fs (loop0): recover_dentry: ino = 5, name = file, dir = 0, err = -2
[   53.545760] F2FS-fs (loop0): Cannot recover all fsync data errno=-2
[   53.546105] F2FS-fs (loop0): access invalid blkaddr:4294967295
[   53.546171] WARNING: CPU: 1 PID: 1798 at fs/f2fs/checkpoint.c:163 f2fs_is_valid_blkaddr+0x26c/0x320
[   53.546174] Modules linked in:
[   53.546183] CPU: 1 PID: 1798 Comm: mount Not tainted 4.19.0-rc2+ #1
[   53.546186] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   53.546191] RIP: 0010:f2fs_is_valid_blkaddr+0x26c/0x320
[   53.546195] Code: 85 bb 00 00 00 48 89 df 88 44 24 07 e8 ad a8 db ff 48 8b 3b 44 89 e1 48 c7 c2 40 03 72 a9 48 c7 c6 e0 01 72 a9 e8 84 3c ff ff <0f> 0b 0f b6 44 24 07 e9 8a 00 00 00 48 8d bf 38 01 00 00 e8 7c a8
[   53.546201] RSP: 0018:ffff88006c067768 EFLAGS: 00010282
[   53.546208] RAX: 0000000000000000 RBX: ffff880068844200 RCX: ffffffffa83e1a33
[   53.546211] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88006d51e590
[   53.546215] RBP: 0000000000000005 R08: ffffed000daa3cb3 R09: ffffed000daa3cb3
[   53.546218] R10: 0000000000000001 R11: ffffed000daa3cb2 R12: 00000000ffffffff
[   53.546221] R13: ffff88006a1f8000 R14: 0000000000000200 R15: 0000000000000009
[   53.546226] FS:  00007fb2f3646840(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
[   53.546229] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   53.546234] CR2: 00007f0fd77f0008 CR3: 00000000687e6002 CR4: 00000000000206e0
[   53.546237] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   53.546240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   53.546242] Call Trace:
[   53.546248]  f2fs_submit_page_bio+0x95/0x740
[   53.546253]  read_node_page+0x161/0x1e0
[   53.546271]  ? truncate_node+0x650/0x650
[   53.546283]  ? add_to_page_cache_lru+0x12c/0x170
[   53.546288]  ? pagecache_get_page+0x262/0x2d0
[   53.546292]  __get_node_page+0x200/0x660
[   53.546302]  f2fs_update_inode_page+0x4a/0x160
[   53.546306]  f2fs_write_inode+0x86/0xb0
[   53.546317]  __writeback_single_inode+0x49c/0x620
[   53.546322]  writeback_single_inode+0xe4/0x1e0
[   53.546326]  sync_inode_metadata+0x93/0xd0
[   53.546330]  ? sync_inode+0x10/0x10
[   53.546342]  ? do_raw_spin_unlock+0xed/0x100
[   53.546347]  f2fs_sync_inode_meta+0xe0/0x130
[   53.546351]  f2fs_fill_super+0x287d/0x2d10
[   53.546367]  ? vsnprintf+0x742/0x7a0
[   53.546372]  ? f2fs_commit_super+0x180/0x180
[   53.546379]  ? up_write+0x20/0x40
[   53.546385]  ? set_blocksize+0x5f/0x140
[   53.546391]  ? f2fs_commit_super+0x180/0x180
[   53.546402]  mount_bdev+0x181/0x200
[   53.546406]  mount_fs+0x94/0x180
[   53.546411]  vfs_kern_mount+0x6c/0x1e0
[   53.546415]  do_mount+0xe5e/0x1510
[   53.546420]  ? fs_reclaim_release+0x9/0x30
[   53.546424]  ? copy_mount_string+0x20/0x20
[   53.546428]  ? fs_reclaim_acquire+0xd/0x30
[   53.546435]  ? __might_sleep+0x2c/0xc0
[   53.546440]  ? ___might_sleep+0x53/0x170
[   53.546453]  ? __might_fault+0x4c/0x60
[   53.546468]  ? _copy_from_user+0x95/0xa0
[   53.546474]  ? memdup_user+0x39/0x60
[   53.546478]  ksys_mount+0x88/0xb0
[   53.546482]  __x64_sys_mount+0x5d/0x70
[   53.546495]  do_syscall_64+0x65/0x130
[   53.546503]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   53.547639] ---[ end trace b804d1ea2fec893e ]---

So if recover fails, we need to drop all recovered data.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:47 -07:00
Sahitya Tummala 1e78e8bd9d f2fs: fix data corruption issue with hardware encryption
Direct IO can be used in case of hardware encryption. The following
scenario results into data corruption issue in this path -

Thread A -                          Thread B-
-> write file#1 in direct IO
                                    -> GC gets kicked in
                                    -> GC submitted bio on meta mapping
				       for file#1, but pending completion
-> write file#1 again with new data
   in direct IO
                                    -> GC bio gets completed now
                                    -> GC writes old data to the new
                                       location and thus file#1 is
				       corrupted.

Fix this by submitting and waiting for pending io on meta mapping
for direct IO case in f2fs_map_blocks().

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:47 -07:00
Chao Yu 0c093b590e f2fs: fix to recover inode->i_flags of inode block during POR
Testcase to reproduce this bug:
1. mkfs.f2fs /dev/sdd
2. mount -t f2fs /dev/sdd /mnt/f2fs
3. touch /mnt/f2fs/file
4. sync
5. chattr +a /mnt/f2fs/file
6. xfs_io -a /mnt/f2fs/file -c "fsync"
7. godown /mnt/f2fs
8. umount /mnt/f2fs
9. mount -t f2fs /dev/sdd /mnt/f2fs
10. xfs_io /mnt/f2fs/file

There is no error when opening this file w/o O_APPEND, but actually,
we expect the correct result should be:

/mnt/f2fs/file: Operation not permitted

The root cause is, in recover_inode(), we recover inode->i_flags more
than F2FS_I(inode)->i_flags, so fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:47 -07:00
Chao Yu 9149a5eb60 f2fs: spread f2fs_set_inode_flags()
This patch changes codes as below:
- use f2fs_set_inode_flags() to update i_flags atomically to avoid
potential race.
- synchronize F2FS_I(inode)->i_flags to inode->i_flags in
f2fs_new_inode().
- use f2fs_set_inode_flags() to simply codes in f2fs_quota_{on,off}.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:47 -07:00
Chao Yu 2baf078185 f2fs: fix to spread clear_cold_data()
We need to drop PG_checked flag on page as well when we clear PG_uptodate
flag, in order to avoid treating the page as GCing one later.

Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:46 -07:00
Jaegeuk Kim 164a63fa6b Revert "f2fs: fix to clear PG_checked flag in set_page_dirty()"
This reverts commit 66110abc4c.

If we clear the cold data flag out of the writeback flow, we can miscount
-1 by end_io, which incurs a deadlock caused by all I/Os being blocked during
heavy GC.

Balancing F2FS Async:
 - IO (CP:    1, Data:   -1, Flush: (   0    0    1), Discard: (   ...

GC thread:                              IRQ
- move_data_page()
 - set_page_dirty()
  - clear_cold_data()
                                        - f2fs_write_end_io()
                                         - type = WB_DATA_TYPE(page);
                                           here, we get wrong type
                                         - dec_page_count(sbi, type);
 - f2fs_wait_on_page_writeback()

Cc: <stable@vger.kernel.org>
Reported-and-Tested-by: Park Ju Hyung <qkrwngud825@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:46 -07:00
Jaegeuk Kim 5f9abab42b f2fs: account read IOs and use IO counts for is_idle
This patch adds issued read IO counts which is under block layer.

Chao modified a bit, since:

Below race can cause reversed reference on F2FS_RD_DATA, there is
the same issue in f2fs_submit_page_bio(), fix them by relocate
__submit_bio() and inc_page_count.

Thread A			Thread B
- f2fs_write_begin
 - f2fs_submit_page_read
 - __submit_bio
				- f2fs_read_end_io
				 - __read_end_io
				 - dec_page_count(, F2FS_RD_DATA)
 - inc_page_count(, F2FS_RD_DATA)

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:46 -07:00
Chao Yu 78efac537d f2fs: fix to account IO correctly for cgroup writeback
Now, we have supported cgroup writeback, it depends on correctly IO
account of specified filesystem.

But in commit d1b3e72d54 ("f2fs: submit bio of in-place-update pages"),
we split write paths from f2fs_submit_page_mbio() to two:
- f2fs_submit_page_bio() for IPU path
- f2fs_submit_page_bio() for OPU path

But still we account write IO only in f2fs_submit_page_mbio(), result in
incorrect IO account, fix it by adding missing IO account in IPU path.

Fixes: d1b3e72d54 ("f2fs: submit bio of in-place-update pages")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:39 -07:00
Chao Yu 4c58ed0768 f2fs: fix to account IO correctly
Below race can cause reversed reference on dirty count, fix it by
relocating __submit_bio() and inc_page_count().

Thread A				Thread B
- f2fs_inplace_write_data
 - f2fs_submit_page_bio
  - __submit_bio
					- f2fs_write_end_io
					 - dec_page_count
  - inc_page_count

Cc: <stable@vger.kernel.org>
Fixes: d1b3e72d54 ("f2fs: submit bio of in-place-update pages")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:53:48 -07:00
Matthew Wilcox 5ec2d99de7 f2fs: Convert to XArray
This is a straightforward conversion.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21 10:46:42 -04:00
Matthew Wilcox 10bbd23585 pagevec: Use xa_mark_t
Removes sparse warnings.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21 10:46:39 -04:00
Jens Axboe b93f654d73 f2fs: remove request_list check in is_idle()
This doesn't work on stacked devices, and it doesn't work on
blk-mq devices. The request_list is only used on legacy, which
we don't have much of anymore, and soon won't have any of.

Kill the check.

Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 19:09:39 -07:00
Jaegeuk Kim 730746ce88 f2fs: allow to mount, if quota is failed
Since we can use the filesystem without quotas till next boot.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:37:00 -07:00
Sahitya Tummala 6390398ec7 f2fs: update REQ_TIME in f2fs_cross_rename()
Update REQ_TIME in the missing path - f2fs_cross_rename().

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
[Jaegeuk Kim: add it in f2fs_rename()]
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:37:00 -07:00
Sahitya Tummala c75f2feb80 f2fs: do not update REQ_TIME in case of error conditions
The REQ_TIME should be updated only in case of success cases
as followed at all other places in the file system.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:37:00 -07:00
Chao Yu 3b30eb19dc f2fs: remove unneeded disable_nat_bits()
Commit 7735730d39 ("f2fs: fix to propagate error from __get_meta_page()")
added disable_nat_bits() in error path of __get_nat_bitmaps(), but it's
unneeded, beause we will fail mount, we won't have chance to change nid
usage status w/o nat full/empty bitmaps.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:36:59 -07:00
Chao Yu 850971b23f f2fs: remove unused sbi->trigger_ssr_threshold
Commit a2a12b679f ("f2fs: export SSR allocation threshold") introduced
two threshold .min_ssr_sections and .trigger_ssr_threshold, but only
.min_ssr_sections is used, so just remove redundant one for cleanup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:36:59 -07:00
Chao Yu ed15ba1415 f2fs: shrink sbi->sb_lock coverage in set_file_temperature()
file_set_{cold,hot} doesn't need holding sbi->sb_lock, so moving them
out of the lock.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:36:59 -07:00
Chao Yu 4dada3fd70 f2fs: use rb_*_cached friends
As rbtree supports caching leftmost node natively, update f2fs codes
to use rb_*_cached helpers to speed up leftmost node visiting.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:36:59 -07:00
Chao Yu ef2a007134 f2fs: fix to recover cold bit of inode block during POR
Testcase to reproduce this bug:
1. mkfs.f2fs /dev/sdd
2. mount -t f2fs /dev/sdd /mnt/f2fs
3. touch /mnt/f2fs/file
4. sync
5. chattr +A /mnt/f2fs/file
6. xfs_io -f /mnt/f2fs/file -c "fsync"
7. godown /mnt/f2fs
8. umount /mnt/f2fs
9. mount -t f2fs /dev/sdd /mnt/f2fs
10. chattr -A /mnt/f2fs/file
11. xfs_io -f /mnt/f2fs/file -c "fsync"
12. umount /mnt/f2fs
13. mount -t f2fs /dev/sdd /mnt/f2fs
14. lsattr /mnt/f2fs/file

-----------------N- /mnt/f2fs/file

But actually, we expect the corrct result is:

-------A---------N- /mnt/f2fs/file

The reason is in step 9) we missed to recover cold bit flag in inode
block, so later, in fsync, we will skip write inode block due to below
condition check, result in lossing data in another SPOR.

f2fs_fsync_node_pages()
	if (!IS_DNODE(page) || !is_cold_node(page))
		continue;

Note that, I guess that some non-dir inode has already lost cold bit
during POR, so in order to reenable recovery for those inode, let's
try to recover cold bit in f2fs_iget() to save more fsynced data.

Fixes: c56675750d ("f2fs: remove unneeded set_cold_node()")
Cc: <stable@vger.kernel.org> 4.17+
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:36:59 -07:00
Chao Yu 48018b4cfd f2fs: submit cached bio to avoid endless PageWriteback
When migrating encrypted block from background GC thread, we only add
them into f2fs inner bio cache, but forget to submit the cached bio, it
may cause potential deadlock when we are waiting page writebacked, fix
it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:36:59 -07:00
Daniel Rosenberg 4354994f09 f2fs: checkpoint disabling
Note that, it requires "f2fs: return correct errno in f2fs_gc".

This adds a lightweight non-persistent snapshotting scheme to f2fs.

To use, mount with the option checkpoint=disable, and to return to
normal operation, remount with checkpoint=enable. If the filesystem
is shut down before remounting with checkpoint=enable, it will revert
back to its apparent state when it was first mounted with
checkpoint=disable. This is useful for situations where you wish to be
able to roll back the state of the disk in case of some critical
failure.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
[Jaegeuk Kim: use SB_RDONLY instead of MS_RDONLY]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:36:39 -07:00
Jaegeuk Kim fb7d70db30 f2fs: clear PageError on the read path
When running fault injection test, I hit somewhat wrong behavior in f2fs_gc ->
gc_data_segment():

0. fault injection generated some PageError'ed pages

1. gc_data_segment
 -> f2fs_get_read_data_page(REQ_RAHEAD)

2. move_data_page
 -> f2fs_get_lock_data_page()
  -> f2f_get_read_data_page()
   -> f2fs_submit_page_read()
    -> submit_bio(READ)
  -> return EIO due to PageError
  -> fail to move data

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-02 17:33:10 -07:00
Chao Yu f847c699cf f2fs: allow out-place-update for direct IO in LFS mode
Normally, DIO uses in-pllace-update, but in LFS mode, f2fs doesn't
allow triggering any in-place-update writes, so we fallback direct
write to buffered write, result in bad performance of large size
write.

This patch adds to support triggering out-place-update for direct IO
to enhance its performance.

Note that it needs to exclude direct read IO during direct write,
since new data writing to new block address will no be valid until
write finished.

storage: zram

time xfs_io -f -d /mnt/f2fs/file -c "pwrite 0 1073741824" -c "fsync"

Before:
real	0m13.061s
user	0m0.327s
sys	0m12.486s

After:
real	0m6.448s
user	0m0.228s
sys	0m6.212s

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-30 18:42:50 -07:00
Chao Yu 39a8695824 f2fs: refactor ->page_mkwrite() flow
Thread A				Thread B
- f2fs_vm_page_mkwrite
					- f2fs_setattr
					 - down_write(i_mmap_sem)
					 - truncate_setsize
					 - f2fs_truncate
					 - up_write(i_mmap_sem)
 - f2fs_reserve_block
 reserve NEW_ADDR
 - skip dirty page due to truncation

1. we don't need to rserve new block address for a truncated page.
2. dn.data_blkaddr is used out of node page lock coverage.

Refactor ->page_mkwrite() flow to fix above issues:
- use __do_map_lock() to avoid racing checkpoint()
- lock data page in prior to dnode page
- cover f2fs_reserve_block with i_mmap_sem lock
- wait page writeback before zeroing page

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-30 18:41:22 -07:00
Chao Yu bab475c541 Revert: "f2fs: check last page index in cached bio to decide submission"
There is one case that we can leave bio in f2fs, result in hanging
page writeback waiter.

Thread A				Thread B
- f2fs_write_cache_pages
 - f2fs_submit_page_write
 page #0 cached in bio #0 of cold log
 - f2fs_submit_page_write
 page #1 cached in bio #1 of warm log
					- f2fs_write_cache_pages
					 - f2fs_submit_page_write
					 bio is full, submit bio #1 contain page #1
 - f2fs_submit_merged_write_cond(, page #1)
 fail to submit bio #0 due to page #1 is not in any cached bios.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-30 18:39:54 -07:00
Junling Zheng d440c52d31 f2fs: support superblock checksum
Now we support crc32 checksum for superblock.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-30 18:34:18 -07:00
Chao Yu 274bd9ba39 f2fs: add to account skip count of background GC
This patch adds to account skip count of background GC, and show stat
info via 'status' debugfs entry.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-30 18:33:34 -07:00
Chao Yu b63e7be590 f2fs: add to account meta IO
This patch supports to account meta IO, it enables to show write IO
from f2fs more comprehensively via 'status' debugfs entry.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-30 18:33:21 -07:00
Jaegeuk Kim 095680f24f f2fs: keep lazytime on remount
This patch fixes losing lazytime when remounting f2fs.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-30 16:50:46 -07:00
Jaegeuk Kim 89d13c3850 f2fs: fix missing up_read
This patch fixes missing up_read call.

Fixes: c9b60788fc ("f2fs: fix to do sanity check with block address in main area")
Cc: <stable@vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-28 19:11:38 -07:00
Jaegeuk Kim 61f7725aa1 f2fs: return correct errno in f2fs_gc
This fixes overriding error number in f2fs_gc.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-28 10:40:46 -07:00
Jaegeuk Kim edc55aaf0d f2fs: avoid f2fs_bug_on if f2fs_get_meta_page_nofail got EIO
This patch avoids BUG_ON when f2fs_get_meta_page_nofail got EIO during
xfstests/generic/475.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-28 10:39:58 -07:00
Chao Yu 4a1728cad6 f2fs: mark inode dirty explicitly in recover_inode()
Mark inode dirty explicitly in the end of recover_inode() to make sure
that all recoverable fields can be persisted later.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-26 12:45:33 -07:00
Chao Yu 5cd1f387a1 f2fs: fix to recover inode's crtime during POR
Testcase to reproduce this bug:
1. mkfs.f2fs -O extra_attr -O inode_crtime /dev/sdd
2. mount -t f2fs /dev/sdd /mnt/f2fs
3. touch /mnt/f2fs/file
4. xfs_io -f /mnt/f2fs/file -c "fsync"
5. godown /mnt/f2fs
6. umount /mnt/f2fs
7. mount -t f2fs /dev/sdd /mnt/f2fs
8. xfs_io -f /mnt/f2fs/file -c "statx -r"

stat.btime.tv_sec = 0
stat.btime.tv_nsec = 0

This patch fixes to recover inode creation time fields during
mount.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-26 12:45:33 -07:00
Chao Yu 7de36cf3e4 f2fs: fix to recover inode's i_gc_failures during POR
inode.i_gc_failures is used to indicate that skip count of migrating
on blocks of inode, we should guarantee it can be recovered in sudden
power-off case.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-26 12:45:33 -07:00
Chao Yu 19c73a691c f2fs: fix to recover inode's i_flags during POR
Testcase to reproduce this bug:
1. mkfs.f2fs /dev/sdd
2. mount -t f2fs /dev/sdd /mnt/f2fs
3. touch /mnt/f2fs/file
4. sync
5. chattr +A /mnt/f2fs/file
6. xfs_io -f /mnt/f2fs/file -c "fsync"
7. godown /mnt/f2fs
8. umount /mnt/f2fs
9. mount -t f2fs /dev/sdd /mnt/f2fs
10. lsattr /mnt/f2fs/file

-----------------N- /mnt/f2fs/file

But actually, we expect the corrct result is:

-------A---------N- /mnt/f2fs/file

The reason is we didn't recover inode.i_flags field during mount,
fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-26 12:45:33 -07:00
Chao Yu f4474aa6e5 f2fs: fix to recover inode's project id during POR
Testcase to reproduce this bug:
1. mkfs.f2fs -O extra_attr -O project_quota /dev/sdd
2. mount -t f2fs /dev/sdd /mnt/f2fs
3. touch /mnt/f2fs/file
4. sync
5. chattr -p 1 /mnt/f2fs/file
6. xfs_io -f /mnt/f2fs/file -c "fsync"
7. godown /mnt/f2fs
8. umount /mnt/f2fs
9. mount -t f2fs /dev/sdd /mnt/f2fs
10. lsattr -p /mnt/f2fs/file

    0 -----------------N- /mnt/f2fs/file

But actually, we expect the correct result is:

    1 -----------------N- /mnt/f2fs/file

The reason is we didn't recover inode.i_projid field during mount,
fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-26 12:45:33 -07:00
Jaegeuk Kim 0a4daae5ff f2fs: update i_size after DIO completion
This is related to
ee70daaba8 ("xfs: update i_size after unwritten conversion in dio completion")

If we update i_size during dio_write, dio_read can read out stale data, which
breaks xfstests/465.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-26 12:45:34 -07:00
Jaegeuk Kim d83d0f5ba8 f2fs: report ENOENT correctly in f2fs_rename
This fixes wrong error report in f2fs_rename.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-26 12:38:11 -07:00
Chengguang Xu c6b1867b1d f2fs: fix remount problem of option io_bits
Currently we show mount option "io_bits=%u" as "io_size=%uKB",
it will cause option parsing problem(unrecognized mount option)
in remount.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-25 18:44:56 -07:00
Chao Yu dc4cd1257c f2fs: fix to recover inode's uid/gid during POR
Step to reproduce this bug:
1. logon as root
2. mount -t f2fs /dev/sdd /mnt;
3. touch /mnt/file;
4. chown system /mnt/file; chgrp system /mnt/file;
5. xfs_io -f /mnt/file -c "fsync";
6. godown /mnt;
7. umount /mnt;
8. mount -t f2fs /dev/sdd /mnt;

After step 8) we will expect file's uid/gid are all system, but during
recovery, these two fields were not been recovered, fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 14:50:25 -07:00
Jaegeuk Kim f84262b086 f2fs: avoid infinite loop in f2fs_alloc_nid
If we have an error in f2fs_build_free_nids, we're able to fall into a loop
to find free nids.

Suggested-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-20 14:49:19 -07:00
Sahitya Tummala a7d10cf3e4 f2fs: add new idle interval timing for discard and gc paths
This helps to control the frequency of submission of discard and
GC requests independently, based on the need.

Suggested-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-19 15:55:14 -07:00
Chao Yu 6f5c2ed0a2 f2fs: split IO error injection according to RW
This patch adds to support injecting error for write IO, this can simulate
IO error like fail_make_request or dm_flakey does.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-12 13:07:32 -07:00
Chao Yu 7c1a000d46 f2fs: add SPDX license identifiers
Remove the verbose license text from f2fs files and replace them with
SPDX tags.  This does not change the license of any of the code.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-12 13:07:10 -07:00
Chengguang Xu 4cb037ec3f f2fs: surround fault_injection related option parsing using CONFIG_F2FS_FAULT_INJECTION
It's a little bit strange when fault_injection related
options fail with -EINVAL which were already disabled
from config, so surround all fault_injection related option
parsing code using CONFIG_F2FS_FAULT_INJECTION. Meanwhile,
slightly change warning message to keep consistency with
option POSIX_ACL and FS_XATTR.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-12 13:05:33 -07:00
Wang Shilong c8e927579e f2fs: fix setattr project check upon fssetxattr ioctl
Currently, project quota could be changed by fssetxattr
ioctl, and existed permission check inode_owner_or_capable()
is obviously not enough, just think that common users could
change project id of file, that could make users to
break project quota easily.

This patch try to follow same regular of xfs project
quota:

"Project Quota ID state is only allowed to change from
within the init namespace. Enforce that restriction only
if we are trying to change the quota ID state.
Everything else is allowed in user namespaces."

Besides that, check and set project id'state should
be an atomic operation, protect whole operation with
inode lock.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-11 17:09:32 -07:00
Zhikang Zhang b430f72636 f2fs: avoid sleeping under spin_lock
In the call trace below, we might sleep in function dput().

So in order to avoid sleeping under spin_lock, we remove f2fs_mark_inode_dirty_sync
from __try_update_largest_extent && __drop_largest_extent.

BUG: sleeping function called from invalid context at fs/dcache.c:796
Call trace:
	dump_backtrace+0x0/0x3f4
	show_stack+0x24/0x30
	dump_stack+0xe0/0x138
	___might_sleep+0x2a8/0x2c8
	__might_sleep+0x78/0x10c
	dput+0x7c/0x750
	block_dump___mark_inode_dirty+0x120/0x17c
	__mark_inode_dirty+0x344/0x11f0
	f2fs_mark_inode_dirty_sync+0x40/0x50
	__insert_extent_tree+0x2e0/0x2f4
	f2fs_update_extent_tree_range+0xcf4/0xde8
	f2fs_update_extent_cache+0x114/0x12c
	f2fs_update_data_blkaddr+0x40/0x50
	write_data_page+0x150/0x314
	do_write_data_page+0x648/0x2318
	__write_data_page+0xdb4/0x1640
	f2fs_write_cache_pages+0x768/0xafc
	__f2fs_write_data_pages+0x590/0x1218
	f2fs_write_data_pages+0x64/0x74
	do_writepages+0x74/0xe4
	__writeback_single_inode+0xdc/0x15f0
	writeback_sb_inodes+0x574/0xc98
	__writeback_inodes_wb+0x190/0x204
	wb_writeback+0x730/0xf14
	wb_check_old_data_flush+0x1bc/0x1c8
	wb_workfn+0x554/0xf74
	process_one_work+0x440/0x118c
	worker_thread+0xac/0x974
	kthread+0x1a0/0x1c8
	ret_from_fork+0x10/0x1c

Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-11 17:09:32 -07:00
Chao Yu e1293bdfa0 f2fs: plug readahead IO in readdir()
Add a plug to merge readahead IO in readdir(), expecting it can
reduce bio count before submitting to block layer.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-11 17:09:32 -07:00
Chao Yu 042be0f849 f2fs: fix to do sanity check with current segment number
https://bugzilla.kernel.org/show_bug.cgi?id=200219

Reproduction way:
- mount image
- run poc code
- umount image

F2FS-fs (loop1): Bitmap was wrongly set, blk:15364
------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/segment.c:2061!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 17686 Comm: umount Tainted: G        W  O      4.18.0-rc2+ #39
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
EIP: update_sit_entry+0x459/0x4e0 [f2fs]
Code: e8 1c b5 fd ff 0f 0b 0f 0b 8b 45 e4 c7 44 24 08 9c 7a 6c f8 c7 44 24 04 bc 4a 6c f8 89 44 24 0c 8b 06 89 04 24 e8 f7 b4 fd ff <0f> 0b 8b 45 e4 0f b6 d2 89 54 24 10 c7 44 24 08 60 7a 6c f8 c7 44
EAX: 00000032 EBX: 000000f8 ECX: 00000002 EDX: 00000001
ESI: d7177000 EDI: f520fe68 EBP: d6477c6c ESP: d6477c34
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010282
CR0: 80050033 CR2: b7fbe000 CR3: 2a99b3c0 CR4: 000406f0
Call Trace:
 f2fs_allocate_data_block+0x124/0x580 [f2fs]
 do_write_page+0x78/0x150 [f2fs]
 f2fs_do_write_node_page+0x25/0xa0 [f2fs]
 __write_node_page+0x2bf/0x550 [f2fs]
 f2fs_sync_node_pages+0x60e/0x6d0 [f2fs]
 ? sync_inode_metadata+0x2f/0x40
 ? f2fs_write_checkpoint+0x28f/0x7d0 [f2fs]
 ? up_write+0x1e/0x80
 f2fs_write_checkpoint+0x2a9/0x7d0 [f2fs]
 ? mark_held_locks+0x5d/0x80
 ? _raw_spin_unlock_irq+0x27/0x50
 kill_f2fs_super+0x68/0x90 [f2fs]
 deactivate_locked_super+0x3d/0x70
 deactivate_super+0x40/0x60
 cleanup_mnt+0x39/0x70
 __cleanup_mnt+0x10/0x20
 task_work_run+0x81/0xa0
 exit_to_usermode_loop+0x59/0xa7
 do_fast_syscall_32+0x1f5/0x22c
 entry_SYSENTER_32+0x53/0x86
EIP: 0xb7f95c51
Code: c1 1e f7 ff ff 89 e5 8b 55 08 85 d2 8b 81 64 cd ff ff 74 02 89 02 5d c3 8b 0c 24 c3 8b 1c 24 c3 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
EAX: 00000000 EBX: 0871ab90 ECX: bfb2cd00 EDX: 00000000
ESI: 00000000 EDI: 0871ab90 EBP: 0871ab90 ESP: bfb2cd7c
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000246
Modules linked in: f2fs(O) crc32_generic bnep rfcomm bluetooth ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq pcbc joydev aesni_intel snd_seq_device aes_i586 snd_timer crypto_simd snd cryptd soundcore mac_hid serio_raw video i2c_piix4 parport_pc ppdev lp parport hid_generic psmouse usbhid hid e1000 [last unloaded: f2fs]
---[ end trace d423f83982cfcdc5 ]---

The reason is, different log headers using the same segment, once
one log's next block address is used by another log, it will cause
panic as above.

Main area: 24 segs, 24 secs 24 zones
  - COLD  data: 0, 0, 0
  - WARM  data: 1, 1, 1
  - HOT   data: 20, 20, 20
  - Dir   dnode: 22, 22, 22
  - File   dnode: 22, 22, 22
  - Indir nodes: 21, 21, 21

So this patch adds sanity check to detect such condition to avoid
this issue.

Signed-off-by: Chao Yu <yuchao0@huawei.com>

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-11 17:09:32 -07:00
Chao Yu 4a70e25544 f2fs: fix memory leak of percpu counter in fill_super()
In fill_super -> init_percpu_info, we should destroy percpu counter
in error path, otherwise memory allcoated for percpu counter will
leak.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-11 17:09:32 -07:00
Chao Yu 0b2103e886 f2fs: fix memory leak of write_io in fill_super()
It needs to release memory allocated for sbi->write_io in error path,
otherwise, it will cause memory leak.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-11 17:09:32 -07:00
Chengguang Xu 313ed62a3d f2fs: cache NULL when both default_acl and acl are NULL
default_acl and acl of newly created inode will be initiated
as ACL_NOT_CACHED in vfs function inode_init_always() and later
will be updated by calling xxx_init_acl() in specific filesystems.
Howerver, when default_acl and acl are NULL then they keep the value
of ACL_NOT_CACHED, this patch tries to cache NULL for acl/default_acl
in this case.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-11 13:16:03 -07:00
Chao Yu 1378752b99 f2fs: fix to flush all dirty inodes recovered in readonly fs
generic/417 reported as blow:

------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/inode.c:695!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 21697 Comm: umount Tainted: G        W  O      4.18.0-rc2+ #39
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
Call Trace:
 ? _raw_spin_unlock+0x2c/0x50
 evict+0xa8/0x170
 dispose_list+0x34/0x40
 evict_inodes+0x118/0x120
 generic_shutdown_super+0x41/0x100
 ? rcu_read_lock_sched_held+0x97/0xa0
 kill_block_super+0x22/0x50
 kill_f2fs_super+0x6f/0x80 [f2fs]
 deactivate_locked_super+0x3d/0x70
 deactivate_super+0x40/0x60
 cleanup_mnt+0x39/0x70
 __cleanup_mnt+0x10/0x20
 task_work_run+0x81/0xa0
 exit_to_usermode_loop+0x59/0xa7
 do_fast_syscall_32+0x1f5/0x22c
 entry_SYSENTER_32+0x53/0x86
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]

It can simply reproduced with scripts:

Enable quota feature during mkfs.

Testcase1:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4k" -c "fsync"
4. godown /mnt/f2fs
5. umount /mnt/f2fs
6. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
7. umount /mnt/f2fs

Testcase2:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. touch /mnt/f2fs/file
4. create process[pid = x] do:
	a) open /mnt/f2fs/file;
	b) unlink /mnt/f2fs/file
5. godown -f /mnt/f2fs
6. kill process[pid = x]
7. umount /mnt/f2fs
8. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
9. umount /mnt/f2fs

The reason is: during recovery, i_{c,m}time of inode will be updated, then
the inode can be set dirty w/o being tracked in sbi->inode_list[DIRTY_META]
global list, so later write_checkpoint will not flush such dirty inode into
node page.

Once umount is called, sync_filesystem() in generic_shutdown_super() will
skip syncng dirty inodes due to sb_rdonly check, leaving dirty inodes
there.

To solve this issue, during umount, add remove SB_RDONLY flag in
sb->s_flags, to make sure sync_filesystem() will not be skipped.

Signed-off-by: Chao Yu <yuchao0@huawei.com>

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-11 13:16:03 -07:00
Yunlei He cda9cc595f f2fs: report error if quota off error during umount
Now, we depend on fsck to ensure quota file data is ok,
so we scan whole partition if checkpoint without umount
flag. It's same for quota off error case, which may make
quota file data inconsistent.

generic/019 reports below error:

 __quota_error: 1160 callbacks suppressed
 Quota error (device zram1): write_blk: dquota write failed
 Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
 Quota error (device zram1): write_blk: dquota write failed
 Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
 Quota error (device zram1): write_blk: dquota write failed
 Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
 Quota error (device zram1): write_blk: dquota write failed
 Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
 Quota error (device zram1): write_blk: dquota write failed
 Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
 VFS: Busy inodes after unmount of zram1. Self-destruct in 5 seconds.  Have a nice day...

If we failed in below path due to fail to write dquot block, we will miss
to release quota inode, fix it.

- f2fs_put_super
 - f2fs_quota_off_umount
  - f2fs_quota_off
   - f2fs_quota_sync   <-- failed
   - dquot_quota_off   <-- missed to call

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-11 13:16:03 -07:00
Jaegeuk Kim 5ce805869c f2fs: submit bio after shutdown
Sometimes, some merged IOs could get a chance to be submitted, resulting in
system hang in shutdown test. This issues IOs all the time after shutdown.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-07 14:26:43 -07:00
Jaegeuk Kim 0ded69f632 f2fs: avoid wrong decrypted data from disk
1. Create a file in an encrypted directory
2. Do GC & drop caches
3. Read stale data before its bio for metapage was not issued yet

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-05 13:41:07 -07:00
Chao Yu 22d7ea1364 Revert "f2fs: use printk_ratelimited for f2fs_msg"
Don't limit printing log, so that we will not miss any key messages.

This reverts commit a36c106dff.

In addition, we use printk_ratelimited to avoid too many log prints.
- error injection
- discard submission failure

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-05 13:41:06 -07:00
Sahitya Tummala abde73c718 f2fs: fix unnecessary periodic wakeup of discard thread when dev is busy
When dev is busy, discard thread wake up timeout can be aligned with the
exact time that it needs to wait for dev to come out of busy. This helps
to avoid unnecessary periodic wakeups and thus save some power.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-05 13:40:32 -07:00
Chao Yu 7d20c8abb2 f2fs: fix to avoid NULL pointer dereference on se->discard_map
https://bugzilla.kernel.org/show_bug.cgi?id=200951

These is a NULL pointer dereference issue reported in bugzilla:

Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.

The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
 sda1: FAT  /boot
 sda2: F2FS /
 sda3: F2FS /home
 sda4: F2FS

The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
 mounting with "discard" option, but the device does not support discard

The USB host is USB3.0 and UASP capable. It is the one on RK3399.

Given this everything works fine, except there is no TRIM support.

In order to enable TRIM a new UDEV rule is added [1]:
 /etc/udev/rules.d/10-sata-bridge-trim.rules:
 ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
 Unable to handle kernel NULL pointer dereference

Also tested on a x86_64 system: works fine even with TRIM enabled.
 same disc
 same bridge
 different usb host controller
 different cpu architecture
 not root filesystem

Regards,
  Vicenç.

[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280

 Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
 Mem abort info:
   ESR = 0x96000004
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
 [000000000000003e] pgd=0000000000000000
 Internal error: Oops: 96000004 [#1] SMP
 Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
 CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
 Hardware name: Sapphire-RK3399 Board (DT)
 pstate: 00000005 (nzcv daif -PAN -UAO)
 pc : update_sit_entry+0x304/0x4b0
 lr : update_sit_entry+0x108/0x4b0
 sp : ffff00000ca13bd0
 x29: ffff00000ca13bd0 x28: 000000000000003e
 x27: 0000000000000020 x26: 0000000000080000
 x25: 0000000000000048 x24: ffff8000ebb85cf8
 x23: 0000000000000253 x22: 00000000ffffffff
 x21: 00000000000535f2 x20: 00000000ffffffdf
 x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
 x17: 0000000007ce6926 x16: 000000001c83ffa8
 x15: 0000000000000000 x14: ffff8000f602df90
 x13: 0000000000000006 x12: 0000000000000040
 x11: 0000000000000228 x10: 0000000000000000
 x9 : 0000000000000000 x8 : 0000000000000000
 x7 : 00000000000535f2 x6 : ffff8000ebff3440
 x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
 x3 : 00000000ffffffff x2 : 0000000000000020
 x1 : 0000000000000000 x0 : ffff8000eb9e5800
 Process nvim (pid: 957, stack limit = 0x0000000063a78320)
 Call trace:
  update_sit_entry+0x304/0x4b0
  f2fs_invalidate_blocks+0x98/0x140
  truncate_node+0x90/0x400
  f2fs_remove_inode_page+0xe8/0x340
  f2fs_evict_inode+0x2b0/0x408
  evict+0xe0/0x1e0
  iput+0x160/0x260
  do_unlinkat+0x214/0x298
  __arm64_sys_unlinkat+0x3c/0x68
  el0_svc_handler+0x94/0x118
  el0_svc+0x8/0xc
 Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
 ---[ end trace a0f21a307118c477 ]---

The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.

So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-05 13:40:31 -07:00
Chengguang Xu 1618e6e297 f2fs: add additional sanity check in f2fs_acl_from_disk()
Add additinal sanity check for irregular case(e.g. corruption).
If size of extended attribution is smaller than size of acl header,
then return -EINVAL.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-05 13:40:31 -07:00
Linus Torvalds fe6f0ed0da f2fs-for-4.19-rc1
In this round, we've tuned f2fs to improve general performance by serializing
 block allocation and enhancing discard flows like fstrim which avoids user IO
 contention. And we've added fsync_mode=nobarrier which gives an option to user
 where it skips issuing cache_flush commands to underlying flash storage. And
 there are many bug fixes related to fuzzed images, revoked atomic writes, quota
 ops, and minor direct IO.
 
 Enhancement:
  - add fsync_mode=nobarrier which bypasses cache_flush command
  - enhance the discarding flow which avoids user IOs and issues in LBA order
  - readahead some encrypted blocks during GC
  - enable in-memory inode checksum to verify the blocks if F2FS_CHECK_FS is set
  - enhance nat_bits behavior
  - set -o discard by default
  - set REQ_RAHEAD to bio in ->readpages
 
 Bug fixes:
  - fix a corner case to corrupt atomic_writes revoking flow
  - revisit i_gc_rwsem to fix race conditions
  - fix some dio behaviors captured by xfstests
  - correct handling errors given by quota-related failures
  - add many sanity check flows to avoid fuzz test failures
  - add more error number propagation to their callers
  - fix several corner cases to continue fault injection w/ shutdown loop
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAlt82U4ACgkQQBSofoJI
 UNJTLQ/+PhewnNa5tDfUgWdFnUFz3h9/NcC677l0OplOOUNxA8iSa1xamlKf/nf9
 sB5ey0I7oBF8zQGxfndhHQfi6fpfUcNMr14hm+TS/3+d54xLJmiVShD5fjNSV2vB
 Ur0xoozuQDwYF1e3QKdBQjFqaCf78VheTr3aWxyv22/Sg+PYylZJ2K8rHTB7mGPU
 UG0aRnKrP3FPRjL7Q3m0Vm6b6eZ5uNdNrFfjgn/8yuQQ9V197K8vwSbPAsR5/pOh
 miCQXyM708NgEYJRWkWmC/rDSQdU0/h/mGnJWrBrbceW62QefGOgd2jcVfmthHJa
 ZXpj+BEG5bYpCCxGxF6N+u0e28OKonCwO/uvL8YAd5icN7yXtsKzoF1CCuXxOYf1
 9K5SMylCTSyrs/+LV8CJoT2ya8w0l0w+R/txUYn8UT+4AgqU+chS2kJeXqw9tcHB
 WLFs/rnAyofWCI/8frVBmJY+zA1ZZvTqs/lmVYrtJUkiOcMTq34WICBUAEFKV452
 BM5dcu21bSIkapYispEt4Rr7o4P4HHMQ+N1i2yUZMFCz5T0RyzdybeS5THk2yVzd
 L0kxfYU+zHigNX51ez8+Z7DyDLDBp6jkD0e66x73bUK9TGPH+ZbnAL6gwLikmD3M
 +VxYl5nyW/3bxx1HdfK1Xwd4/wYMNBmdtn5NC50oZ+jpB0h8YCE=
 =zgbL
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've tuned f2fs to improve general performance by
  serializing block allocation and enhancing discard flows like fstrim
  which avoids user IO contention. And we've added fsync_mode=nobarrier
  which gives an option to user where it skips issuing cache_flush
  commands to underlying flash storage. And there are many bug fixes
  related to fuzzed images, revoked atomic writes, quota ops, and minor
  direct IO.

  Enhancements:
   - add fsync_mode=nobarrier which bypasses cache_flush command
   - enhance the discarding flow which avoids user IOs and issues in
     LBA order
   - readahead some encrypted blocks during GC
   - enable in-memory inode checksum to verify the blocks if
     F2FS_CHECK_FS is set
   - enhance nat_bits behavior
   - set -o discard by default
   - set REQ_RAHEAD to bio in ->readpages

  Bug fixes:
   - fix a corner case to corrupt atomic_writes revoking flow
   - revisit i_gc_rwsem to fix race conditions
   - fix some dio behaviors captured by xfstests
   - correct handling errors given by quota-related failures
   - add many sanity check flows to avoid fuzz test failures
   - add more error number propagation to their callers
   - fix several corner cases to continue fault injection w/ shutdown
     loop"

* tag 'f2fs-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (89 commits)
  f2fs: readahead encrypted block during GC
  f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc
  f2fs: fix performance issue observed with multi-thread sequential read
  f2fs: fix to skip verifying block address for non-regular inode
  f2fs: rework fault injection handling to avoid a warning
  f2fs: support fault_type mount option
  f2fs: fix to return success when trimming meta area
  f2fs: fix use-after-free of dicard command entry
  f2fs: support discard submission error injection
  f2fs: split discard command in prior to block layer
  f2fs: wake up gc thread immediately when gc_urgent is set
  f2fs: fix incorrect range->len in f2fs_trim_fs()
  f2fs: refresh recent accessed nat entry in lru list
  f2fs: fix avoid race between truncate and background GC
  f2fs: avoid race between zero_range and background GC
  f2fs: fix to do sanity check with block address in main area v2
  f2fs: fix to do sanity check with inline flags
  f2fs: fix to reset i_gc_failures correctly
  f2fs: fix invalid memory access
  f2fs: fix to avoid broken of dnode block list
  ...
2018-08-22 13:29:39 -07:00
Chao Yu 6aa58d8ad2 f2fs: readahead encrypted block during GC
During GC, for each encrypted block, we will read block synchronously
into meta page, and then submit it into current cold data log area.

So this block read model with 4k granularity can make poor performance,
like migrating non-encrypted block, let's readahead encrypted block
as well to improve migration performance.

To implement this, we choose meta page that its index is old block
address of the encrypted block, and readahead ciphertext into this
page, later, if readaheaded page is still updated, we will load its
data into target meta page, and submit the write IO.

Note that for OPU, truncation, deletion, we need to invalid meta
page after we invalid old block address, to make sure we won't load
invalid data from target meta page during encrypted block migration.

for ((i = 0; i < 1000; i++))
do {
        xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done

for ((i = 0; i < 1000; i+=2))
do {
        rm /mnt/f2fs/dir/$i;
} done

ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);

Before:
              gc-6549  [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
              gc-6549  [001] d..1 214682.212802: block_unplug: [gc] 1
              gc-6549  [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
              gc-6549  [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
              gc-6549  [001] .... 214682.213902: block_plug: [gc]
              gc-6549  [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
              gc-6549  [001] d..1 214682.213908: block_unplug: [gc] 1
              gc-6549  [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
              gc-6549  [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
              gc-6549  [001] .... 214682.226414: block_plug: [gc]
              gc-6549  [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
              gc-6549  [001] d..1 214682.226420: block_unplug: [gc] 1
              gc-6549  [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
              gc-6549  [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
              gc-6549  [001] .... 214682.226911: block_plug: [gc]
              gc-6549  [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
              gc-6549  [001] d..1 214682.226916: block_unplug: [gc] 1

After:
              gc-5678  [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
              gc-5678  [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
              gc-5678  [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
              gc-5678  [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
              gc-5678  [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
              gc-5678  [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
              gc-5678  [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
              gc-5678  [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
              gc-5678  [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
              gc-5678  [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
              gc-5678  [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
              gc-5678  [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
              gc-5678  [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
              gc-5678  [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
              gc-5678  [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
              gc-5678  [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
              gc-5678  [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
              gc-5678  [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
              gc-5678  [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
              gc-5678  [003] d..1 214327.026023: block_unplug: [gc] 1
              gc-5678  [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
              gc-5678  [003] .... 214327.026046: block_plug: [gc]

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-20 23:13:42 -07:00
Jaegeuk Kim 6f8d445506 f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc
The f2fs_gc() called by f2fs_balance_fs() requires to be called outside of
fi->i_gc_rwsem[WRITE], since f2fs_gc() can try to grab it in a loop.

If it hits the miximum retrials in GC, let's give a chance to release
gc_mutex for a short time in order not to go into live lock in the worst
case.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-20 23:13:42 -07:00
Jaegeuk Kim 853137cef4 f2fs: fix performance issue observed with multi-thread sequential read
This reverts the commit - "b93f771 - f2fs: remove writepages lock"
to fix the drop in sequential read throughput.

Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
device: UFS

Before -
read throughput: 185 MB/s
total read requests: 85177 (of these ~80000 are 4KB size requests).
total write requests: 2546 (of these ~2208 requests are written in 512KB).

After -
read throughput: 758 MB/s
total read requests: 2417 (of these ~2042 are 512KB reads).
total write requests: 2701 (of these ~2034 requests are written in 512KB).

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-20 23:13:42 -07:00
Jens Axboe 74c8164e1c mpage: mpage_readpages() should submit IO as read-ahead
a_ops->readpages() is only ever used for read-ahead, yet we don't flag
the IO being submitted as such.  Fix that up.  Any file system that uses
mpage_readpages() as its ->readpages() implementation will now get this
right.

Since we're passing in whether the IO is read-ahead or not, we don't
need to pass in the 'gfp' separately, as it is dependent on the IO being
read-ahead.  Kill off that member.

Add some documentation notes on ->readpages() being purely for
read-ahead.

Link: http://lkml.kernel.org/r/20180621010725.17813-3-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <clm@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 16:20:29 -07:00
Chao Yu dda9f4b9ca f2fs: fix to skip verifying block address for non-regular inode
generic/184 1s ... [failed, exit status 1]- output mismatch
    --- tests/generic/184.out	2015-01-11 16:52:27.643681072 +0800
     QA output created by 184 - silence is golden
    +rm: cannot remove '/mnt/f2fs/null': Bad address
    +mknod: '/mnt/f2fs/null': Bad address
    +chmod: cannot access '/mnt/f2fs/null': Bad address
    +./tests/generic/184: line 36: /mnt/f2fs/null: Bad address
    ...

F2FS-fs (zram0): access invalid blkaddr:259
EIP: f2fs_is_valid_blkaddr+0x14b/0x1b0 [f2fs]
 f2fs_iget+0x927/0x1010 [f2fs]
 f2fs_lookup+0x26e/0x630 [f2fs]
 __lookup_slow+0xb3/0x140
 lookup_slow+0x31/0x50
 walk_component+0x185/0x1f0
 path_lookupat+0x51/0x190
 filename_lookup+0x7f/0x140
 user_path_at_empty+0x36/0x40
 vfs_statx+0x61/0xc0
 __do_sys_stat64+0x29/0x40
 sys_stat64+0x13/0x20
 do_fast_syscall_32+0xaa/0x22c
 entry_SYSENTER_32+0x53/0x86

In f2fs_iget(), we will check inode's first block address, if it is valid,
we will set FI_FIRST_BLOCK_WRITTEN flag in inode.

But we should only do this for regular inode, otherwise, like special
inode, i_addr[0] is used for storing device info instead of block address,
it will fail checking flow obviously.

So for non-regular inode, let's skip verifying address and setting flag.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 10:29:11 -07:00
Arnd Bergmann 7fa750a163 f2fs: rework fault injection handling to avoid a warning
When CONFIG_F2FS_FAULT_INJECTION is disabled, we get a warning about an
unused label:

fs/f2fs/segment.c: In function '__submit_discard_cmd':
fs/f2fs/segment.c:1059:1: error: label 'submit' defined but not used [-Werror=unused-label]

This could be fixed by adding another #ifdef around it, but the more
reliable way of doing this seems to be to remove the other #ifdefs
where that is easily possible.

By defining time_to_inject() as a trivial stub, most of the checks for
CONFIG_F2FS_FAULT_INJECTION can go away. This also leads to nicer
formatting of the code.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-14 09:49:15 -07:00
Chao Yu d494500a70 f2fs: support fault_type mount option
Previously, once fault injection is on, by default, all kind of faults
will be injected to f2fs, if we want to trigger single or specified
combined type during the test, we need to configure sysfs entry, it will
be a little inconvenient to integrate sysfs configuring into testsuit,
such as xfstest.

So this patch introduces a new mount option 'fault_type' to assist old
option 'fault_injection', with these two mount options, we can specify
any fault rate/type at mount-time.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-13 10:48:17 -07:00
Chao Yu 3f16ecd950 f2fs: fix to return success when trimming meta area
generic/251
    --- tests/generic/251.out	2016-05-03 20:20:11.381899000 +0800
     QA output created by 251
     Running the test: done.
    +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument
    +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument
    +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument
    +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument
    +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument
    ...
Ran: generic/251
Failures: generic/251

The reason is coverage of fstrim locates in meta area, previously we
just return -EINVAL for such case, making generic/251 failed, to fix
this problem, let's relieve restriction to return success with no
block discarded.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-13 10:48:17 -07:00
Chao Yu 6b9cb1242c f2fs: fix use-after-free of dicard command entry
As Dan Carpenter reported:

The patch 20ee4382322c: "f2fs: issue small discard by LBA order" from
Jul 8, 2018, leads to the following Smatch warning:

	fs/f2fs/segment.c:1277 __issue_discard_cmd_orderly()
	warn: 'dc' was already freed.

See also:
fs/f2fs/segment.c:2550 __issue_discard_cmd_range() warn: 'dc' was already freed.

In order to fix this issue, let's get error from __submit_discard_cmd(),
and release current discard command after we referenced next one.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-13 10:48:17 -07:00
Chao Yu b83dcfe671 f2fs: support discard submission error injection
This patch adds to support discard submission error injection for testing
error handling of __submit_discard_cmd().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-13 10:48:17 -07:00
Chao Yu 35ec7d5748 f2fs: split discard command in prior to block layer
Some devices has small max_{hw,}discard_sectors, so that in
__blkdev_issue_discard(), one big size discard bio can be split
into multiple small size discard bios, result in heavy load in IO
scheduler and device, which can hang other sync IO for long time.

Now, f2fs is trying to control discard commands more elaboratively,
in order to make less conflict in between discard IO and user IO
to enhance application's performance, so in this patch, we will
split discard bio in f2fs in prior to in block layer to reduce
issuing multiple discard bios in a short time.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-13 10:48:17 -07:00
Sheng Yong a690efffd1 f2fs: wake up gc thread immediately when gc_urgent is set
Fixes: 5b0e95398e ("f2fs: introduce sbi->gc_mode to determine the policy")
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-13 10:48:17 -07:00
Chao Yu 6eae269461 f2fs: fix incorrect range->len in f2fs_trim_fs()
generic/260 reported below error:

     [+] Default length with start set (should succeed)
     [+] Length beyond the end of fs (should succeed)
     [+] Length beyond the end of fs with start set (should succeed)
    +./tests/generic/260: line 94: [: 18446744073709551615: integer expression expected
    +./tests/generic/260: line 104: [: 18446744073709551615: integer expression expected
     Test done
    ...

In f2fs_trim_fs(), if there is no discard being trimmed, we need to correct
range->len before return.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-13 10:48:17 -07:00
Chao Yu 2296915808 f2fs: refresh recent accessed nat entry in lru list
Introduce nat_list_lock to protect nm_i->nat_entries list, and manage
it as a LRU list, refresh location for therein recent accessed entries
in the list.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-08-13 10:48:17 -07:00