linux

Commit Graph

Author	SHA1	Message	Date
Chao Yu	e584bbe821	f2fs: fix shift-out-of-bounds in sanity_check_raw_super() syzbot reported a bug which could cause shift-out-of-bounds issue, fix it. Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395 sanity_check_raw_super fs/f2fs/super.c:2812 [inline] read_raw_super_block fs/f2fs/super.c:3267 [inline] f2fs_fill_super.cold+0x16c9/0x16f6 fs/f2fs/super.c:3519 mount_bdev+0x34d/0x410 fs/super.c:1366 legacy_get_tree+0x105/0x220 fs/fs_context.c:592 vfs_get_tree+0x89/0x2f0 fs/super.c:1496 do_new_mount fs/namespace.c:2896 [inline] path_mount+0x12ae/0x1e70 fs/namespace.c:3227 do_mount fs/namespace.c:3240 [inline] __do_sys_mount fs/namespace.c:3448 [inline] __se_sys_mount fs/namespace.c:3425 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3425 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported-by: syzbot+ca9a785f8ac472085994@syzkaller.appspotmail.com Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-10 09:13:19 -08:00
Daeho Jeong	6422a71ef4	f2fs: fix race of pending_pages in decompression I found out f2fs_free_dic() is invoked in a wrong timing, but f2fs_verify_bio() still needed the dic info and it triggered the below kernel panic. It has been caused by the race condition of pending_pages value between decompression and verity logic, when the same compression cluster had been split in different bios. By split bios, f2fs_verify_bio() ended up with decreasing pending_pages value before it is reset to nr_cpages by f2fs_decompress_pages() and caused the kernel panic. [ 4416.564763] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 ... [ 4416.896016] Workqueue: fsverity_read_queue f2fs_verity_work [ 4416.908515] pc : fsverity_verify_page+0x20/0x78 [ 4416.913721] lr : f2fs_verify_bio+0x11c/0x29c [ 4416.913722] sp : ffffffc019533cd0 [ 4416.913723] x29: ffffffc019533cd0 x28: 0000000000000402 [ 4416.913724] x27: 0000000000000001 x26: 0000000000000100 [ 4416.913726] x25: 0000000000000001 x24: 0000000000000004 [ 4416.913727] x23: 0000000000001000 x22: 0000000000000000 [ 4416.913728] x21: 0000000000000000 x20: ffffffff2076f9c0 [ 4416.913729] x19: ffffffff2076f9c0 x18: ffffff8a32380c30 [ 4416.913731] x17: ffffffc01f966d97 x16: 0000000000000298 [ 4416.913732] x15: 0000000000000000 x14: 0000000000000000 [ 4416.913733] x13: f074faec89ffffff x12: 0000000000000000 [ 4416.913734] x11: 0000000000001000 x10: 0000000000001000 [ 4416.929176] x9 : ffffffff20d1f5c7 x8 : 0000000000000000 [ 4416.929178] x7 : 626d7464ff286b6b x6 : ffffffc019533ade [ 4416.929179] x5 : 000000008049000e x4 : ffffffff2793e9e0 [ 4416.929180] x3 : 000000008049000e x2 : ffffff89ecfa74d0 [ 4416.929181] x1 : 0000000000000c40 x0 : ffffffff2076f9c0 [ 4416.929184] Call trace: [ 4416.929187] fsverity_verify_page+0x20/0x78 [ 4416.929189] f2fs_verify_bio+0x11c/0x29c [ 4416.929192] f2fs_verity_work+0x58/0x84 [ 4417.050667] process_one_work+0x270/0x47c [ 4417.055354] worker_thread+0x27c/0x4d8 [ 4417.059784] kthread+0x13c/0x320 [ 4417.063693] ret_from_fork+0x10/0x18 Chao pointed this can happen by the below race condition. Thread A f2fs_post_read_wq fsverity_wq - f2fs_read_multi_pages() - f2fs_alloc_dic - dic->pending_pages = 2 - submit_bio() - submit_bio() - f2fs_post_read_work() handle first bio - f2fs_decompress_work() - __read_end_io() - f2fs_decompress_pages() - dic->pending_pages-- - enqueue f2fs_verity_work() - f2fs_verity_work() handle first bio - f2fs_verify_bio() - dic->pending_pages-- - f2fs_post_read_work() handle second bio - f2fs_decompress_work() - enqueue f2fs_verity_work() - f2fs_verify_pages() - f2fs_free_dic() - f2fs_verity_work() handle second bio - f2fs_verfy_bio() - use-after-free on dic Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-08 15:39:14 -08:00
Chao Yu	96dd025195	f2fs: fix to account inline xattr correctly during recovery During recovery, we may missed to update inline xattr count correctly, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-08 14:25:41 -08:00
Jack Qiu	8492156153	f2fs: inline: fix wrong inline inode stat Miss to stat inline inode in f2fs_recover_inline_data. Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-08 14:25:41 -08:00
Jack Qiu	6e5ca4fce7	f2fs: inline: correct comment in f2fs_recover_inline_data In 3rd scene, it should remove data blocks instead of inline_data. Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-08 14:25:41 -08:00
Yangtao Li	d540e35d4e	f2fs: don't check PAGE_SIZE again in sanity_check_raw_super() Many flash devices read and write a single IO based on a multiple of 4KB, and we support only 4KB page cache size now. Since we already check page size in init_f2fs_fs(), so remove page size check in sanity_check_raw_super(). Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Shaohua Liu <liush@allwinnertech.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-08 14:25:40 -08:00
Yangtao Li	b9ec10948f	f2fs: convert to F2FS_*_INO macro Use F2FS_ROOT_INO, F2FS_NODE_INO and F2FS_META_INO macro for better code readability. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Shaohua Liu <liush@allwinnertech.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-08 14:25:40 -08:00
Jaegeuk Kim	10208567f1	f2fs: introduce max_io_bytes, a sysfs entry, to limit bio size This patch adds max_io_bytes to limit bio size when f2fs tries to merge consecutive IOs. This can give a testing point to split out bios and check end_io handles those bios correctly. This is used to capture a recent bug on the decompression and fsverity flow. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-07 08:41:25 -08:00
Jaegeuk Kim	ec2ddf4994	f2fs: don't allow any writes on readonly mount generic_make_request: Trying to write to read-only block-device dm-5 (partno 0) WARNING: CPU: 7 PID: 546 at block/blk-core.c:2190 generic_make_request_checks+0x664/0x690 pc : generic_make_request_checks+0x664/0x690 lr : generic_make_request_checks+0x664/0x690 Call trace: generic_make_request_checks+0x664/0x690 generic_make_request+0xf0/0x3a4 submit_bio+0x80/0x250 __submit_merged_bio+0x368/0x4e0 __submit_merged_write_cond.llvm.12294350193007536502+0xe0/0x3e8 f2fs_wait_on_page_writeback+0x84/0x128 f2fs_convert_inline_page+0x35c/0x6f8 f2fs_convert_inline_inode+0xe0/0x2e0 f2fs_file_mmap+0x48/0x9c mmap_region+0x41c/0x74c do_mmap+0x40c/0x4fc vm_mmap_pgoff+0xb8/0x114 vm_mmap+0x34/0x48 elf_map+0x68/0x108 load_elf_binary+0x538/0xb70 search_binary_handler+0xac/0x1dc exec_binprm+0x50/0x15c __do_execve_file+0x620/0x740 __arm64_sys_execve+0x54/0x68 el0_svc_common+0x9c/0x168 el0_svc_handler+0x60/0x6c el0_svc+0x8/0xc Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-07 08:41:16 -08:00
Jaegeuk Kim	a95ba66ac1	f2fs: avoid race condition for shrinker count Light reported sometimes shinker gets nat_cnt < dirty_nat_cnt resulting in wrong do_shinker work. Let's avoid to return insane overflowed value by adding single tracking value. Reported-by: Light Hsieh <Light.Hsieh@mediatek.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-03 00:59:26 -08:00
Daeho Jeong	5fdb322ff2	f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE Added two ioctl to decompress/compress explicitly the compression enabled file in "compress_mode=user" mount option. Using these two ioctls, the users can make a control of compression and decompression of their files. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-03 00:12:08 -08:00
Daeho Jeong	602a16d58e	f2fs: add compress_mode mount option We will add a new "compress_mode" mount option to control file compression mode. This supports "fs" and "user". In "fs" mode (default), f2fs does automatic compression on the compression enabled files. In "user" mode, f2fs disables the automaic compression and gives the user discretion of choosing the target file and the timing. It means the user can do manual compression/decompression on the compression enabled files using ioctls. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-03 00:11:57 -08:00
Yangtao Li	db48965264	f2fs: Remove unnecessary unlikely() WARN_ON() already contains an unlikely(), so it's not necessary to use unlikely. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Shuosheng Huang <huangshuosheng@allwinnertech.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-02 22:00:23 -08:00
Jack Qiu	5335bfc6eb	f2fs: init dirty_secmap incorrectly section is dirty, but dirty_secmap may not set Reported-by: Jia Yang <jiayang5@huawei.com> Fixes: `da52f8ade4` ("f2fs: get the right gc victim section when section has several segments") Cc: <stable@vger.kernel.org> Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-02 22:00:23 -08:00
Jaegeuk Kim	b876f4c94c	f2fs: remove buffer_head which has 32bits limit This patch removes buffer_head dependency when getting block addresses. Light reported there's a 32bit issue in f2fs_fiemap where map_bh.b_size is 32bits while len is 64bits given by user. This will give wrong length to f2fs_map_block. Reported-by: Light Hsieh <Light.Hsieh@mediatek.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-02 22:00:23 -08:00
Jaegeuk Kim	963ba7f983	f2fs: fix wrong block count instead of bytes We should convert cur_lblock, a block count, to bytes for len. Fixes: `af4b6b8edf` ("f2fs: introduce check_swap_activate_fast()") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-02 22:00:22 -08:00
Jaegeuk Kim	43b9d4b4d9	f2fs: use new conversion functions between blks and bytes This patch cleans up blks and bytes conversions. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-02 22:00:22 -08:00
Jaegeuk Kim	6cbfcab5ff	f2fs: rename logical_to_blk and blk_to_logical This patch renames two functions like below having u64. - logical_to_blk to bytes_to_blks - blk_to_logical to blks_to_bytes Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-02 22:00:22 -08:00
Chao Yu	3a0a9cbc44	f2fs: fix kbytes written stat for multi-device case For multi-device case, one f2fs image includes multi devices, so it needs to account bytes written of all block devices belong to the image rather than one main block device, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-02 22:00:22 -08:00
Chao Yu	b28f047b28	f2fs: compress: support chksum This patch supports to store chksum value with compressed data, and verify the integrality of compressed data while reading the data. The feature can be enabled through specifying mount option 'compress_chksum'. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-02 22:00:22 -08:00
Chao Yu	493720a485	f2fs: fix to avoid REQ_TIME and CP_TIME collision Lei Li reported a issue: if foreground operations are frequent, background checkpoint may be always skipped due to below check, result in losing more data after sudden power-cut. f2fs_balance_fs_bg() ... if (!is_idle(sbi, REQ_TIME) && (!excess_dirty_nats(sbi) && !excess_dirty_nodes(sbi))) return; E.g: cp_interval = 5 second idle_interval = 2 second foreground operation interval = 1 second (append 1 byte per second into file) In such case, no matter when it calls f2fs_balance_fs_bg(), is_idle(, REQ_TIME) returns false, result in skipping background checkpoint. This patch changes as below to make trigger condition being more reasonable: - trigger sync_fs() if dirty_{nats,nodes} and prefree segs exceeds threshold; - skip triggering sync_fs() if there is any background inflight IO or there is foreground operation recently and meanwhile cp_rwsem is being held by someone; Reported-by: Lei Li <noctis.akm@gmail.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-02 22:00:21 -08:00
Sahitya Tummala	8769918bf0	f2fs: change to use rwsem for cp_mutex Use rwsem to ensure serialization of the callers and to avoid starvation of high priority tasks, when the system is under heavy IO workload. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-02 22:00:21 -08:00
Daniel Rosenberg	7ad08a58bf	f2fs: Handle casefolding with Encryption Expand f2fs's casefolding support to include encrypted directories. To index casefolded+encrypted directories, we use the SipHash of the casefolded name, keyed by a key derived from the directory's fscrypt master key. This ensures that the dirhash doesn't leak information about the plaintext filenames. Encryption keys are unavailable during roll-forward recovery, so we can't compute the dirhash when recovering a new dentry in an encrypted + casefolded directory. To avoid having to force a checkpoint when a new file is fsync'ed, store the dirhash on-disk appended to i_name. This patch incorporates work by Eric Biggers <ebiggers@google.com> and Jaegeuk Kim <jaegeuk@kernel.org>. Co-developed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-02 22:00:21 -08:00
Daniel Rosenberg	bb9cd9106b	fscrypt: Have filesystems handle their d_ops This shifts the responsibility of setting up dentry operations from fscrypt to the individual filesystems, allowing them to have their own operations while still setting fscrypt's d_revalidate as appropriate. Most filesystems can just use generic_set_encrypted_ci_d_ops, unless they have their own specific dentry operations as well. That operation will set the minimal d_ops required under the circumstances. Since the fscrypt d_ops are set later on, we must set all d_ops there, since we cannot adjust those later on. This should not result in any change in behavior. Signed-off-by: Daniel Rosenberg <drosen@google.com> Acked-by: Theodore Ts'o <tytso@mit.edu> Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-02 22:00:21 -08:00
Zhang Qilong	beb78181f1	f2fs: Remove the redundancy initialization There are two assignments are meaningless, and remove them. Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-02 22:00:21 -08:00
Liu Song	9f7e334aec	f2fs: remove writeback_inodes_sb in f2fs_remount Since sync_inodes_sb has been used, there is no need to use writeback_inodes_sb, so remove it. Signed-off-by: Liu Song <liu.song11@zte.com.cn> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-02 22:00:20 -08:00
Hyeongseok Kim	89ff600503	f2fs: fix double free of unicode map In case of retrying fill_super with skip_recovery, s_encoding for casefold would not be loaded again even though it's already been freed because it's not NULL. Set NULL after free to prevent double freeing when unmount. Fixes: `eca4873ee1` ("f2fs: Use generic casefolding support") Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-02 22:00:20 -08:00
Chao Yu	34178b1bc4	f2fs: fix compat F2FS_IOC_{MOVE,GARBAGE_COLLECT}_RANGE Eric reported a ioctl bug in below link: https://lore.kernel.org/linux-f2fs-devel/20201103032234.GB2875@sol.localdomain/ That said, on some 32-bit architectures, u64 has only 32-bit alignment, notably i386 and x86_32, so that size of struct f2fs_gc_range compiled in x86_32 is 20 bytes, however the size in x86_64 is 24 bytes, binary compiled in x86_32 can not call F2FS_IOC_GARBAGE_COLLECT_RANGE successfully due to mismatched value of ioctl command in between binary and f2fs module, similarly, F2FS_IOC_MOVE_RANGE will fail too. In this patch we introduce two ioctls for compatibility of above special 32-bit binary: - F2FS_IOC32_GARBAGE_COLLECT_RANGE - F2FS_IOC32_MOVE_RANGE Reported-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-02 22:00:20 -08:00
Chao Yu	3a1b9eaf72	f2fs: avoid unneeded data copy in f2fs_ioc_move_range() Fields in struct f2fs_move_range won't change in f2fs_ioc_move_range(), let's avoid copying this structure's data to userspace. Signed-off-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-02 22:00:20 -08:00
Daeho Jeong	e1e8debec6	f2fs: add F2FS_IOC_SET_COMPRESS_OPTION ioctl Added a new F2FS_IOC_SET_COMPRESS_OPTION ioctl to change file compression option of a file. struct f2fs_comp_option { u8 algorithm; => compression algorithm => 0:lzo, 1:lz4, 2:zstd, 3:lzorle u8 log_cluster_size; => log scale cluster size => 2 ~ 8 }; struct f2fs_comp_option option; option.algorithm = 1; option.log_cluster_size = 7; ioctl(fd, F2FS_IOC_SET_COMPRESS_OPTION, &option); Signed-off-by: Daeho Jeong <daehojeong@google.com> [Chao Yu: remove f2fs_is_compress_algorithm_valid()] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-12-02 21:59:38 -08:00
Eric Biggers	ec0caa974c	fscrypt: introduce fscrypt_prepare_readdir() The last remaining use of fscrypt_get_encryption_info() from filesystems is for readdir (->iterate_shared()). Every other call is now in fs/crypto/ as part of some other higher-level operation. We need to add a new argument to fscrypt_get_encryption_info() to indicate whether the encryption policy is allowed to be unrecognized or not. Doing this is easier if we can work with high-level operations rather than direct filesystem use of fscrypt_get_encryption_info(). So add a function fscrypt_prepare_readdir() which wraps the call to fscrypt_get_encryption_info() for the readdir use case. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201203022041.230976-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-12-02 18:25:01 -08:00
Eric Biggers	73114b6d28	f2fs: remove f2fs_dir_open() Since encrypted directories can be opened and searched without their key being available, and each readdir and ->lookup() tries to set up the key, trying to set up the key in ->open() too isn't really useful. Just remove it so that directories don't need an ->open() method anymore, and so that we eliminate a use of fscrypt_get_encryption_info() (which I'd like to stop exporting to filesystems). Reviewed-by: Chao Yu <yuchao0@huawei.com> Link: https://lore.kernel.org/r/20201203022041.230976-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-12-02 18:25:01 -08:00
Christoph Hellwig	9499ffc752	f2fs: remove a few bd_part checks bd_part is never NULL for a block device in use by a file system, so remove the checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-01 14:53:40 -07:00
Christoph Hellwig	8446fe9255	block: switch partition lookup to use struct block_device Use struct block_device to lookup partitions on a disk. This removes all usage of struct hd_struct from the I/O path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-01 14:53:40 -07:00
Christoph Hellwig	a782483cc1	block: remove the nr_sects field in struct hd_struct Now that the hd_struct always has a block device attached to it, there is no need for having two size field that just get out of sync. Additionally the field in hd_struct did not use proper serialization, possibly allowing for torn writes. By only using the block_device field this problem also gets fixed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-01 14:53:40 -07:00
Christoph Hellwig	040f04bd2e	fs: simplify freeze_bdev/thaw_bdev Store the frozen superblock in struct block_device to avoid the awkward interface that can return a sb only used a cookie, an ERR_PTR or NULL. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-01 14:53:38 -07:00
Eric Biggers	bfc2b7e851	f2fs: prevent creating duplicate encrypted filenames As described in "fscrypt: add fscrypt_is_nokey_name()", it's possible to create a duplicate filename in an encrypted directory by creating a file concurrently with adding the directory's encryption key. Fix this bug on f2fs by rejecting no-key dentries in f2fs_add_link(). Note that the weird check for the current task in f2fs_do_add_link() seems to make this bug difficult to reproduce on f2fs. Fixes: `9ea97163c6` ("f2fs crypto: add filename encryption for f2fs_add_link") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201118075609.120337-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-11-24 15:10:27 -08:00
Daeho Jeong	9e2a5f8cfb	f2fs: add F2FS_IOC_GET_COMPRESS_OPTION ioctl Added a new F2FS_IOC_GET_COMPRESS_OPTION ioctl to get file compression option of a file. struct f2fs_comp_option { u8 algorithm; => compression algorithm => 0:lzo, 1:lz4, 2:zstd, 3:lzorle u8 log_cluster_size; => log scale cluster size => 2 ~ 8 }; struct f2fs_comp_option option; ioctl(fd, F2FS_IOC_GET_COMPRESS_OPTION, &option); Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-11-02 17:34:00 -08:00
Chao Yu	fa4320cefb	f2fs: move ioctl interface definitions to separated file Like other filesystem does, we introduce a new file f2fs.h in path of include/uapi/linux/, and move f2fs-specified ioctl interface definitions to that file, after then, in order to use those definitions, userspace developer only need to include the new header file rather than copy & paste definitions from fs/f2fs/f2fs.h. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-11-02 08:33:02 -08:00
Chao Yu	7a6e59d719	f2fs: fix to seek incorrect data offset in inline data file As kitestramuort reported: F2FS-fs (nvme0n1p4): access invalid blkaddr:1598541474 [ 25.725898] ------------[ cut here ]------------ [ 25.725903] WARNING: CPU: 6 PID: 2018 at f2fs_is_valid_blkaddr+0x23a/0x250 [ 25.725923] Call Trace: [ 25.725927] ? f2fs_llseek+0x204/0x620 [ 25.725929] ? ovl_copy_up_data+0x14f/0x200 [ 25.725931] ? ovl_copy_up_inode+0x174/0x1e0 [ 25.725933] ? ovl_copy_up_one+0xa22/0xdf0 [ 25.725936] ? ovl_copy_up_flags+0xa6/0xf0 [ 25.725938] ? ovl_aio_cleanup_handler+0xd0/0xd0 [ 25.725939] ? ovl_maybe_copy_up+0x86/0xa0 [ 25.725941] ? ovl_open+0x22/0x80 [ 25.725943] ? do_dentry_open+0x136/0x350 [ 25.725945] ? path_openat+0xb7e/0xf40 [ 25.725947] ? __check_sticky+0x40/0x40 [ 25.725948] ? do_filp_open+0x70/0x100 [ 25.725950] ? __check_sticky+0x40/0x40 [ 25.725951] ? __check_sticky+0x40/0x40 [ 25.725953] ? __x64_sys_openat+0x1db/0x2c0 [ 25.725955] ? do_syscall_64+0x2d/0x40 [ 25.725957] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 llseek() reports invalid block address access, the root cause is if file has inline data, f2fs_seek_block() will access inline data regard as block address index in inode block, which should be wrong, fix it. Reported-by: kitestramuort <kitestramuort@autistici.org> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-11-02 08:33:01 -08:00
Jaegeuk Kim	3acc4522d8	f2fs: call f2fs_get_meta_page_retry for nat page When running fault injection test, if we don't stop checkpoint, some stale NAT entries were flushed which breaks consistency. Fixes: `86f33603f8` ("f2fs: handle errors of f2fs_get_meta_page_nofail") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-11-02 08:33:01 -08:00
Linus Torvalds	0eac1102e9	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff all over the place (the largest group here is Christoph's stat cleanups)" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: remove KSTAT_QUERY_FLAGS fs: remove vfs_stat_set_lookup_flags fs: move vfs_fstatat out of line fs: implement vfs_stat and vfs_lstat in terms of vfs_fstatat fs: remove vfs_statx_fd fs: omfs: use kmemdup() rather than kmalloc+memcpy [PATCH] reduce boilerplate in fsid handling fs: Remove duplicated flag O_NDELAY occurring twice in VALID_OPEN_FLAGS selftests: mount: add nosymfollow tests Add a "nosymfollow" mount option.	2020-10-24 12:26:05 -07:00
Linus Torvalds	7a3dadedc8	f2fs-for-5.10-rc1 In this round, we've added new features such as zone capacity for ZNS and a new GC policy, ATGC, along with in-memory segment management. In addition, we could improve the decompression speed significantly by changing virtual mapping method. Even though we've fixed lots of small bugs in compression support, I feel that it becomes more stable so that I could give it a try in production. Enhancement: - suport zone capacity in NVMe Zoned Namespace devices - introduce in-memory current segment management - add standart casefolding support - support age threshold based garbage collection - improve decompression speed by changing virtual mapping method Bug fix: - fix condition checks in some ioctl() such as compression, move_range, etc - fix 32/64bits support in data structures - fix memory allocation in zstd decompress - add some boundary checks to avoid kernel panic on corrupted image - fix disallowing compression for non-empty file - fix slab leakage of compressed block writes In addition, it includes code refactoring for better readability and minor bug fixes for compression and zoned device support. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAl+I0nQACgkQQBSofoJI UNIJdw//Rj0YapYXSu1nlOQppzSAkCiL6wxrrG2qkisRE7uXSYGZfWBqrMT6Asnf i0f25i6ywZOZ00dgN/klZRBh4YYSgJqYx9BPkTxZsQZ/S/EmZJPpr8m4VUB69LKL VijwUdgcW9vNDJ2/DkDQDVBd/ZqRxXnltffWtP4pS96Gj089/dE2q8KXqQrt3LM1 lLQjDfHj+0AyWRzKpErTO0W9DOgO7wmmelS0h6m2RYttkbb328JEZezg5bjWNNlk eTXxuAFFy8Ap9DngkC/sqvY2NRTv1YgOPfrT8XWwdDIiFTZ+LoYdFI5Ap/UW7QwG iz7B/0wPj3+9ncl536LRbFPiLisbYrArYGmZKF6t8w1cP6mTVlileMT4Q6s9+qhn GJS88tTBVlR9vbzDu2brjI6qRQVTBdsohIGoA1g6lz0ogbphhmTzujPbFQ6GTSBi 3sKKp59urkBpVH3TVJU1oshLjIEG2yToMgYwZH9DU7zlzJS6XpetJrzReqItEThc VNixg2DxdIFQ+nrMt+LtWaOHs5qzxIIPksguGEhqSkLL5lI75n2MZxrhKXmUsaZa qItJE0ndJFfi6vggIkJID+a0bpTss7+AxF1AmSZDafMLkZy8j14DNQAnmBUYRX0J 5QExZ+LyyaKjWQ/k9SsSzV1Y3dbguDyB+gkeMhr/6XEr9DSwZc8= =exgv -----END PGP SIGNATURE----- Merge tag 'f2fs-for-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've added new features such as zone capacity for ZNS and a new GC policy, ATGC, along with in-memory segment management. In addition, we could improve the decompression speed significantly by changing virtual mapping method. Even though we've fixed lots of small bugs in compression support, I feel that it becomes more stable so that I could give it a try in production. Enhancements: - suport zone capacity in NVMe Zoned Namespace devices - introduce in-memory current segment management - add standart casefolding support - support age threshold based garbage collection - improve decompression speed by changing virtual mapping method Bug fixes: - fix condition checks in some ioctl() such as compression, move_range, etc - fix 32/64bits support in data structures - fix memory allocation in zstd decompress - add some boundary checks to avoid kernel panic on corrupted image - fix disallowing compression for non-empty file - fix slab leakage of compressed block writes In addition, it includes code refactoring for better readability and minor bug fixes for compression and zoned device support" * tag 'f2fs-for-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (51 commits) f2fs: code cleanup by removing unnecessary check f2fs: wait for sysfs kobject removal before freeing f2fs_sb_info f2fs: fix writecount false positive in releasing compress blocks f2fs: introduce check_swap_activate_fast() f2fs: don't issue flush in f2fs_flush_device_cache() for nobarrier case f2fs: handle errors of f2fs_get_meta_page_nofail f2fs: fix to set SBI_NEED_FSCK flag for inconsistent inode f2fs: reject CASEFOLD inode flag without casefold feature f2fs: fix memory alignment to support 32bit f2fs: fix slab leak of rpages pointer f2fs: compress: fix to disallow enabling compress on non-empty file f2fs: compress: introduce cic/dic slab cache f2fs: compress: introduce page array slab cache f2fs: fix to do sanity check on segment/section count f2fs: fix to check segment boundary during SIT page readahead f2fs: fix uninit-value in f2fs_lookup f2fs: remove unneeded parameter in find_in_block() f2fs: fix wrong total_sections check and fsmeta check f2fs: remove duplicated code in sanity_check_area_boundary f2fs: remove unused check on version_bitmap ...	2020-10-16 15:14:43 -07:00
Matthew Wilcox (Oracle)	73bb49da50	mm/readahead: make page_cache_ra_unbounded take a readahead_control Define it in the callers instead of in page_cache_ra_unbounded(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: David Howells <dhowells@redhat.com> Cc: Eric Biggers <ebiggers@google.com> Link: https://lkml.kernel.org/r/20200903140844.14194-4-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-10-16 11:11:16 -07:00
Chengguang Xu	788e96d1d3	f2fs: code cleanup by removing unnecessary check f2fs_seek_block() is only used for regular file, so don't have to check inline dentry in it. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-10-14 13:23:41 -07:00
Jamie Iles	ae284d87ab	f2fs: wait for sysfs kobject removal before freeing f2fs_sb_info syzkaller found that with CONFIG_DEBUG_KOBJECT_RELEASE=y, unmounting an f2fs filesystem could result in the following splat: kobject: 'loop5' ((____ptrval____)): kobject_release, parent 0000000000000000 (delayed 250) kobject: 'f2fs_xattr_entry-7:5' ((____ptrval____)): kobject_release, parent 0000000000000000 (delayed 750) ------------[ cut here ]------------ ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x98 WARNING: CPU: 0 PID: 699 at lib/debugobjects.c:485 debug_print_object+0x180/0x240 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 699 Comm: syz-executor.5 Tainted: G S 5.9.0-rc8+ #101 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x4d8 show_stack+0x34/0x48 dump_stack+0x174/0x1f8 panic+0x360/0x7a0 __warn+0x244/0x2ec report_bug+0x240/0x398 bug_handler+0x50/0xc0 call_break_hook+0x160/0x1d8 brk_handler+0x30/0xc0 do_debug_exception+0x184/0x340 el1_dbg+0x48/0xb0 el1_sync_handler+0x170/0x1c8 el1_sync+0x80/0x100 debug_print_object+0x180/0x240 debug_check_no_obj_freed+0x200/0x430 slab_free_freelist_hook+0x190/0x210 kfree+0x13c/0x460 f2fs_put_super+0x624/0xa58 generic_shutdown_super+0x120/0x300 kill_block_super+0x94/0xf8 kill_f2fs_super+0x244/0x308 deactivate_locked_super+0x104/0x150 deactivate_super+0x118/0x148 cleanup_mnt+0x27c/0x3c0 __cleanup_mnt+0x28/0x38 task_work_run+0x10c/0x248 do_notify_resume+0x9d4/0x1188 work_pending+0x8/0x34c Like the error handling for f2fs_register_sysfs(), we need to wait for the kobject to be destroyed before returning to prevent a potential use-after-free. Fixes: `bf9e697ecd` ("f2fs: expose features to sysfs entry") Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Chao Yu <chao@kernel.org> Signed-off-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-10-14 13:23:30 -07:00
Daeho Jeong	8c8cf26ae3	f2fs: fix writecount false positive in releasing compress blocks In current condition check, if it detects writecount, it return -EBUSY regardless of f_mode of the file. Fixed it. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-10-13 23:23:34 -07:00
Chao Yu	af4b6b8edf	f2fs: introduce check_swap_activate_fast() check_swap_activate() will lookup block mapping via bmap() one by one, so its performance is very bad, this patch introduces check_swap_activate_fast() to use f2fs_fiemap() to boost this process, since f2fs_fiemap() will lookup block mappings in batch, therefore, it can improve swapon()'s performance significantly. Note that this enhancement only works when page size is equal to f2fs' block size. Testcase: (backend device: zram) - touch file - pin & fallocate file to 8GB - mkswap file - swapon file Before: real 0m2.999s user 0m0.000s sys 0m2.980s After: real 0m0.081s user 0m0.000s sys 0m0.064s Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-10-13 23:23:34 -07:00
Chao Yu	6ed29fe1ca	f2fs: don't issue flush in f2fs_flush_device_cache() for nobarrier case This patch changes f2fs_flush_device_cache() to skip issuing flush for nobarrier case. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-10-13 23:23:34 -07:00
Jaegeuk Kim	86f33603f8	f2fs: handle errors of f2fs_get_meta_page_nofail First problem is we hit BUG_ON() in f2fs_get_sum_page given EIO on f2fs_get_meta_page_nofail(). Quick fix was not to give any error with infinite loop, but syzbot caught a case where it goes to that loop from fuzzed image. In turned out we abused f2fs_get_meta_page_nofail() like in the below call stack. - f2fs_fill_super - f2fs_build_segment_manager - build_sit_entries - get_current_sit_page INFO: task syz-executor178:6870 can't die for more than 143 seconds. task:syz-executor178 state:R stack:26960 pid: 6870 ppid: 6869 flags:0x00004006 Call Trace: Showing all locks held in the system: 1 lock held by khungtaskd/1179: #0: ffffffff8a554da0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6242 1 lock held by systemd-journal/3920: 1 lock held by in:imklog/6769: #0: ffff88809eebc130 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:930 1 lock held by syz-executor178/6870: #0: ffff8880925120e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0x201/0xaf0 fs/super.c:229 Actually, we didn't have to use _nofail in this case, since we could return error to mount(2) already with the error handler. As a result, this patch tries to 1) remove _nofail callers as much as possible, 2) deal with error case in last remaining caller, f2fs_get_sum_page(). Reported-by: syzbot+ee250ac8137be41d7b13@syzkaller.appspotmail.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-10-13 23:23:29 -07:00
Linus Torvalds	6f5032a852	fscrypt updates for 5.10 This release, we rework the implementation of creating new encrypted files in order to fix some deadlocks and prepare for adding fscrypt support to CephFS, which Jeff Layton is working on. We also export a symbol in preparation for the above-mentioned CephFS support and also for ext4/f2fs encrypt+casefold support. Finally, there are a few other small cleanups. As usual, all these patches have been in linux-next with no reported issues, and I've tested them with xfstests. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCX4SD7xQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOKy/AAP92oOybTcuahmvAtHqZP9jAFPJrbI3r 6QLpMFtWznJoOQEAogaWsavtOIBx9afdOfRNj0zdoBIjpXgyMuzR10Ou2gE= =B/Mj -----END PGP SIGNATURE----- Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt Pull fscrypt updates from Eric Biggers: "This release, we rework the implementation of creating new encrypted files in order to fix some deadlocks and prepare for adding fscrypt support to CephFS, which Jeff Layton is working on. We also export a symbol in preparation for the above-mentioned CephFS support and also for ext4/f2fs encrypt+casefold support. Finally, there are a few other small cleanups. As usual, all these patches have been in linux-next with no reported issues, and I've tested them with xfstests" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fscrypt: export fscrypt_d_revalidate() fscrypt: rename DCACHE_ENCRYPTED_NAME to DCACHE_NOKEY_NAME fscrypt: don't call no-key names "ciphertext names" fscrypt: use sha256() instead of open coding fscrypt: make fscrypt_set_test_dummy_encryption() take a 'const char *' fscrypt: handle test_dummy_encryption in more logical way fscrypt: move fscrypt_prepare_symlink() out-of-line fscrypt: make "#define fscrypt_policy" user-only fscrypt: stop pretending that key setup is nofs-safe fscrypt: require that fscrypt_encrypt_symlink() already has key fscrypt: remove fscrypt_inherit_context() fscrypt: adjust logging for in-creation inodes ubifs: use fscrypt_prepare_new_inode() and fscrypt_set_context() f2fs: use fscrypt_prepare_new_inode() and fscrypt_set_context() ext4: use fscrypt_prepare_new_inode() and fscrypt_set_context() ext4: factor out ext4_xattr_credits_for_new_inode() fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context() fscrypt: restrict IV_INO_LBLK_32 to ino_bits <= 32 fscrypt: drop unused inode argument from fscrypt_fname_alloc_buffer	2020-10-13 08:54:00 -07:00
Chao Yu	d662fad143	f2fs: fix to set SBI_NEED_FSCK flag for inconsistent inode If compressed inode has inconsistent fields on i_compress_algorithm, i_compr_blocks and i_log_cluster_size, we missed to set SBI_NEED_FSCK to notice fsck to repair the inode, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-10-09 10:29:31 -07:00
Eric Biggers	f6322f3f12	f2fs: reject CASEFOLD inode flag without casefold feature syzbot reported: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 PID: 6860 Comm: syz-executor835 Not tainted 5.9.0-rc8-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:utf8_casefold+0x43/0x1b0 fs/unicode/utf8-core.c:107 [...] Call Trace: f2fs_init_casefolded_name fs/f2fs/dir.c:85 [inline] __f2fs_setup_filename fs/f2fs/dir.c:118 [inline] f2fs_prepare_lookup+0x3bf/0x640 fs/f2fs/dir.c:163 f2fs_lookup+0x10d/0x920 fs/f2fs/namei.c:494 __lookup_hash+0x115/0x240 fs/namei.c:1445 filename_create+0x14b/0x630 fs/namei.c:3467 user_path_create fs/namei.c:3524 [inline] do_mkdirat+0x56/0x310 fs/namei.c:3664 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 [...] The problem is that an inode has F2FS_CASEFOLD_FL set, but the filesystem doesn't have the casefold feature flag set, and therefore super_block::s_encoding is NULL. Fix this by making sanity_check_inode() reject inodes that have F2FS_CASEFOLD_FL when the filesystem doesn't have the casefold feature. Reported-by: syzbot+05139c4039d0679e19ff@syzkaller.appspotmail.com Fixes: `2c2eb7a300` ("f2fs: Support case-insensitive file name lookups") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-10-08 21:24:40 -07:00
Jaegeuk Kim	48046cb55d	f2fs: fix memory alignment to support 32bit In 32bit system, 64-bits key breaks memory alignment. This fixes the commit "f2fs: support 64-bits key in f2fs rb-tree node entry". Reported-by: Nicolas Chauvet <kwizart@gmail.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-10-08 21:24:40 -07:00
Jaegeuk Kim	adfc694330	f2fs: fix slab leak of rpages pointer This fixes the below mem leak. [ 130.157600] ============================================================================= [ 130.159662] BUG f2fs_page_array_entry-252:16 (Tainted: G W O ): Objects remaining in f2fs_page_array_entry-252:16 on __kmem_cache_shutdown() [ 130.162742] ----------------------------------------------------------------------------- [ 130.162742] [ 130.164979] Disabling lock debugging due to kernel taint [ 130.166188] INFO: Slab 0x000000009f5a52d2 objects=22 used=4 fp=0x00000000ba72c3e9 flags=0xfffffc0010200 [ 130.168269] CPU: 7 PID: 3560 Comm: umount Tainted: G B W O 5.9.0-rc4+ #35 [ 130.170019] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 130.171941] Call Trace: [ 130.172528] dump_stack+0x74/0x9a [ 130.173298] slab_err+0xb7/0xdc [ 130.174044] ? kernel_poison_pages+0xc0/0xc0 [ 130.175065] ? on_each_cpu_cond_mask+0x48/0x90 [ 130.176096] __kmem_cache_shutdown.cold+0x34/0x141 [ 130.177190] kmem_cache_destroy+0x59/0x100 [ 130.178223] f2fs_destroy_page_array_cache+0x15/0x20 [f2fs] [ 130.179527] f2fs_put_super+0x1bc/0x380 [f2fs] [ 130.180538] generic_shutdown_super+0x72/0x110 [ 130.181547] kill_block_super+0x27/0x50 [ 130.182438] kill_f2fs_super+0x76/0xe0 [f2fs] [ 130.183448] deactivate_locked_super+0x3b/0x80 [ 130.184456] deactivate_super+0x3e/0x50 [ 130.185363] cleanup_mnt+0x109/0x160 [ 130.186179] __cleanup_mnt+0x12/0x20 [ 130.187003] task_work_run+0x70/0xb0 [ 130.187841] exit_to_user_mode_prepare+0x18f/0x1b0 [ 130.188917] syscall_exit_to_user_mode+0x31/0x170 [ 130.189989] do_syscall_64+0x45/0x90 [ 130.190828] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 130.191986] RIP: 0033:0x7faf868ea2eb [ 130.192815] Code: 7b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 7b 0c 00 f7 d8 64 89 01 [ 130.196872] RSP: 002b:00007fffb7edb478 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 130.198494] RAX: 0000000000000000 RBX: 00007faf86a18204 RCX: 00007faf868ea2eb [ 130.201021] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055971df71c50 [ 130.203415] RBP: 000055971df71a40 R08: 0000000000000000 R09: 00007fffb7eda1f0 [ 130.205772] R10: 00007faf86a04339 R11: 0000000000000246 R12: 000055971df71c50 [ 130.208150] R13: 0000000000000000 R14: 000055971df71b38 R15: 0000000000000000 [ 130.210515] INFO: Object 0x00000000a980843a @offset=744 [ 130.212476] INFO: Allocated in page_array_alloc+0x3d/0xe0 [f2fs] age=1572 cpu=0 pid=3297 [ 130.215030] __slab_alloc+0x20/0x40 [ 130.216566] kmem_cache_alloc+0x2a0/0x2e0 [ 130.218217] page_array_alloc+0x3d/0xe0 [f2fs] [ 130.219940] f2fs_init_compress_ctx+0x1f/0x40 [f2fs] [ 130.221736] f2fs_write_cache_pages+0x3db/0x860 [f2fs] [ 130.223591] f2fs_write_data_pages+0x2c9/0x300 [f2fs] [ 130.225414] do_writepages+0x43/0xd0 [ 130.226907] __filemap_fdatawrite_range+0xd5/0x110 [ 130.228632] filemap_write_and_wait_range+0x48/0xb0 [ 130.230336] __generic_file_write_iter+0x18a/0x1d0 [ 130.232035] f2fs_file_write_iter+0x226/0x550 [f2fs] [ 130.233737] new_sync_write+0x113/0x1a0 [ 130.235204] vfs_write+0x1a6/0x200 [ 130.236579] ksys_write+0x67/0xe0 [ 130.237898] __x64_sys_write+0x1a/0x20 [ 130.239309] do_syscall_64+0x38/0x90 Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-29 09:16:36 -07:00
Chao Yu	519a5a2f37	f2fs: compress: fix to disallow enabling compress on non-empty file Compressed inode and normal inode has different layout, so we should disallow enabling compress on non-empty file to avoid race condition during inode .i_addr array parsing and updating. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: Fix missing condition] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-29 09:16:36 -07:00
Chao Yu	c68d6c8830	f2fs: compress: introduce cic/dic slab cache Add two slab caches: "f2fs_cic_entry" and "f2fs_dic_entry" for memory allocation of compress_io_ctx and decompress_io_ctx structure. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-29 09:16:36 -07:00
Chao Yu	3108303170	f2fs: compress: introduce page array slab cache Add a per-sbi slab cache "f2fs_page_array_entry-%u:%u" for memory allocation of page pointer array in compress context. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: Fix wrong memory allocation] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-29 09:16:32 -07:00
Chao Yu	3a22e9ac71	f2fs: fix to do sanity check on segment/section count As syzbot reported: BUG: KASAN: slab-out-of-bounds in init_min_max_mtime fs/f2fs/segment.c:4710 [inline] BUG: KASAN: slab-out-of-bounds in f2fs_build_segment_manager+0x9302/0xa6d0 fs/f2fs/segment.c:4792 Read of size 8 at addr ffff8880a1b934a8 by task syz-executor682/6878 CPU: 1 PID: 6878 Comm: syz-executor682 Not tainted 5.9.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 init_min_max_mtime fs/f2fs/segment.c:4710 [inline] f2fs_build_segment_manager+0x9302/0xa6d0 fs/f2fs/segment.c:4792 f2fs_fill_super+0x381a/0x6e80 fs/f2fs/super.c:3633 mount_bdev+0x32e/0x3f0 fs/super.c:1417 legacy_get_tree+0x105/0x220 fs/fs_context.c:592 vfs_get_tree+0x89/0x2f0 fs/super.c:1547 do_new_mount fs/namespace.c:2875 [inline] path_mount+0x1387/0x20a0 fs/namespace.c:3192 do_mount fs/namespace.c:3205 [inline] __do_sys_mount fs/namespace.c:3413 [inline] __se_sys_mount fs/namespace.c:3390 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The root cause is: if segs_per_sec is larger than one, and segment count in last section is less than segs_per_sec, we will suffer out-of-boundary memory access on sit_i->sentries[] in init_min_max_mtime(). Fix this by adding sanity check among segment count, section count and segs_per_sec value in sanity_check_raw_super(). Reported-by: syzbot+481a3ffab50fed41dcc0@syzkaller.appspotmail.com Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-29 02:12:41 -07:00
Chao Yu	6a257471fa	f2fs: fix to check segment boundary during SIT page readahead As syzbot reported: kernel BUG at fs/f2fs/segment.h:657! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 16220 Comm: syz-executor.0 Not tainted 5.9.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:f2fs_ra_meta_pages+0xa51/0xdc0 fs/f2fs/segment.h:657 Call Trace: build_sit_entries fs/f2fs/segment.c:4195 [inline] f2fs_build_segment_manager+0x4b8a/0xa3c0 fs/f2fs/segment.c:4779 f2fs_fill_super+0x377d/0x6b80 fs/f2fs/super.c:3633 mount_bdev+0x32e/0x3f0 fs/super.c:1417 legacy_get_tree+0x105/0x220 fs/fs_context.c:592 vfs_get_tree+0x89/0x2f0 fs/super.c:1547 do_new_mount fs/namespace.c:2875 [inline] path_mount+0x1387/0x2070 fs/namespace.c:3192 do_mount fs/namespace.c:3205 [inline] __do_sys_mount fs/namespace.c:3413 [inline] __se_sys_mount fs/namespace.c:3390 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 @blkno in f2fs_ra_meta_pages could exceed max segment count, causing panic in following sanity check in current_sit_addr(), add check condition to avoid this issue. Reported-by: syzbot+3698081bcf0bb2d12174@syzkaller.appspotmail.com Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-29 02:12:41 -07:00
Chao Yu	6d7ab88a98	f2fs: fix uninit-value in f2fs_lookup As syzbot reported: Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x21c/0x280 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:122 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:219 f2fs_lookup+0xe05/0x1a80 fs/f2fs/namei.c:503 lookup_open fs/namei.c:3082 [inline] open_last_lookups fs/namei.c:3177 [inline] path_openat+0x2729/0x6a90 fs/namei.c:3365 do_filp_open+0x2b8/0x710 fs/namei.c:3395 do_sys_openat2+0xa88/0x1140 fs/open.c:1168 do_sys_open fs/open.c:1184 [inline] __do_compat_sys_openat fs/open.c:1242 [inline] __se_compat_sys_openat+0x2a4/0x310 fs/open.c:1240 __ia32_compat_sys_openat+0x56/0x70 fs/open.c:1240 do_syscall_32_irqs_on arch/x86/entry/common.c:80 [inline] __do_fast_syscall_32+0x129/0x180 arch/x86/entry/common.c:139 do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:162 do_SYSENTER_32+0x73/0x90 arch/x86/entry/common.c:205 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c In f2fs_lookup(), @res_page could be used before being initialized, because in __f2fs_find_entry(), once F2FS_I(dir)->i_current_depth was been fuzzed to zero, then @res_page will never be initialized, causing this kmsan warning, relocating @res_page initialization place to fix this bug. Reported-by: syzbot+0eac6f0bbd558fd866d7@syzkaller.appspotmail.com Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-29 02:12:41 -07:00
Chao Yu	17f930e0a6	f2fs: remove unneeded parameter in find_in_block() We can relocate @res_page assignment in find_in_block() to its caller, so unneeded parameter could be removed for cleanup. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-29 01:48:34 -07:00
Wang Xiaojun	f99ba9add6	f2fs: fix wrong total_sections check and fsmeta check Meta area is not included in section_count computation. So the minimum number of total_sections is 1 meanwhile it cannot be greater than segment_count_main. The minimum number of meta segments is 8 (SB + 2 (CP + SIT + NAT) + SSA). Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-29 01:48:34 -07:00
Wang Xiaojun	d89f589130	f2fs: remove duplicated code in sanity_check_area_boundary Use seg_end_blkaddr instead of "segment0_blkaddr + (segment_count << log_blocks_per_seg)". Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-29 01:48:34 -07:00
Wang Xiaojun	e6e421870b	f2fs: remove unused check on version_bitmap A NULL will not be return by __bitmap_ptr here. Remove the unused check. Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-29 01:48:33 -07:00
Chao Yu	d0660122dc	f2fs: relocate blkzoned feature check Relocate blkzoned feature check into parse_options() like other feature check. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-29 01:48:33 -07:00
Chao Yu	07eb1d6994	f2fs: do sanity check on zoned block device path sbi->devs would be initialized only if image enables multiple device feature or blkzoned feature, if blkzoned feature flag was set by fuzz in non-blkzoned device, we will suffer below panic: get_zone_idx fs/f2fs/segment.c:4892 [inline] f2fs_usable_zone_blks_in_seg fs/f2fs/segment.c:4943 [inline] f2fs_usable_blks_in_seg+0x39b/0xa00 fs/f2fs/segment.c:4999 Call Trace: check_block_count+0x69/0x4e0 fs/f2fs/segment.h:704 build_sit_entries fs/f2fs/segment.c:4403 [inline] f2fs_build_segment_manager+0x51da/0xa370 fs/f2fs/segment.c:5100 f2fs_fill_super+0x3880/0x6ff0 fs/f2fs/super.c:3684 mount_bdev+0x32e/0x3f0 fs/super.c:1417 legacy_get_tree+0x105/0x220 fs/fs_context.c:592 vfs_get_tree+0x89/0x2f0 fs/super.c:1547 do_new_mount fs/namespace.c:2896 [inline] path_mount+0x12ae/0x1e70 fs/namespace.c:3216 do_mount fs/namespace.c:3229 [inline] __do_sys_mount fs/namespace.c:3437 [inline] __se_sys_mount fs/namespace.c:3414 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3414 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 Add sanity check to inconsistency on factors: blkzoned flag, device path and device character to avoid above panic. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-29 01:48:33 -07:00
Zhang Qilong	9b66482282	f2fs: add trace exit in exception path Missing the trace exit in f2fs_sync_dirty_inodes Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-29 01:48:33 -07:00
Xiaojun Wang	4470eb2873	f2fs: change return value of reserved_segments to unsigned int The type of SM_I(sbi)->reserved_segments is unsigned int, so change the return value to unsigned int. The type cast can be removed in reserved_sections as a result. Signed-off-by: Xiaojun Wang <wangxiaojun11@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-29 01:48:33 -07:00
Eric Biggers	70fb2612aa	fscrypt: don't call no-key names "ciphertext names" Currently we're using the term "ciphertext name" ambiguously because it can mean either the actual ciphertext filename, or the encoded filename that is shown when an encrypted directory is listed without its key. The latter we're now usually calling the "no-key name"; and while it's derived from the ciphertext name, it's not the same thing. To avoid this ambiguity, rename fscrypt_name::is_ciphertext_name to fscrypt_name::is_nokey_name, and update comments that say "ciphertext name" (or "encrypted name") to say "no-key name" instead when warranted. Link: https://lore.kernel.org/r/20200924042624.98439-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-09-23 21:29:49 -07:00
Eric Biggers	c8c868abc9	fscrypt: make fscrypt_set_test_dummy_encryption() take a 'const char ' fscrypt_set_test_dummy_encryption() requires that the optional argument to the test_dummy_encryption mount option be specified as a substring_t. That doesn't work well with filesystems that use the new mount API, since the new way of parsing mount options doesn't use substring_t. Make it take the argument as a 'const char ' instead. Instead of moving the match_strdup() into the callers in ext4 and f2fs, make them just use arg->from directly. Since the pattern is "test_dummy_encryption=%s", the argument will be null-terminated. Acked-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20200917041136.178600-14-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-09-22 06:48:52 -07:00
Eric Biggers	ac4acb1f4b	fscrypt: handle test_dummy_encryption in more logical way The behavior of the test_dummy_encryption mount option is that when a new file (or directory or symlink) is created in an unencrypted directory, it's automatically encrypted using a dummy encryption policy. That's it; in particular, the encryption (or lack thereof) of existing files (or directories or symlinks) doesn't change. Unfortunately the implementation of test_dummy_encryption is a bit weird and confusing. When test_dummy_encryption is enabled and a file is being created in an unencrypted directory, we set up an encryption key (->i_crypt_info) for the directory. This isn't actually used to do any encryption, however, since the directory is still unencrypted! Instead, ->i_crypt_info is only used for inheriting the encryption policy. One consequence of this is that the filesystem ends up providing a "dummy context" (policy + nonce) instead of a "dummy policy". In commit `ed318a6cc0` ("fscrypt: support test_dummy_encryption=v2"), I mistakenly thought this was required. However, actually the nonce only ends up being used to derive a key that is never used. Another consequence of this implementation is that it allows for 'inode->i_crypt_info != NULL && !IS_ENCRYPTED(inode)', which is an edge case that can be forgotten about. For example, currently FS_IOC_GET_ENCRYPTION_POLICY on an unencrypted directory may return the dummy encryption policy when the filesystem is mounted with test_dummy_encryption. That seems like the wrong thing to do, since again, the directory itself is not actually encrypted. Therefore, switch to a more logical and maintainable implementation where the dummy encryption policy inheritance is done without setting up keys for unencrypted directories. This involves: - Adding a function fscrypt_policy_to_inherit() which returns the encryption policy to inherit from a directory. This can be a real policy, a dummy policy, or no policy. - Replacing struct fscrypt_dummy_context, ->get_dummy_context(), etc. with struct fscrypt_dummy_policy, ->get_dummy_policy(), etc. - Making fscrypt_fname_encrypted_size() take an fscrypt_policy instead of an inode. Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Acked-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20200917041136.178600-13-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-09-22 06:48:49 -07:00
Eric Biggers	e075b69010	f2fs: use fscrypt_prepare_new_inode() and fscrypt_set_context() Convert f2fs to use the new functions fscrypt_prepare_new_inode() and fscrypt_set_context(). This avoids calling fscrypt_get_encryption_info() from under f2fs_lock_op(), which can deadlock because fscrypt_get_encryption_info() isn't GFP_NOFS-safe. For more details about this problem, see the earlier patch "fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()". This also fixes a f2fs-specific deadlock when the filesystem is mounted with '-o test_dummy_encryption' and a file is created in an unencrypted directory other than the root directory: INFO: task touch:207 blocked for more than 30 seconds. Not tainted 5.9.0-rc4-00099-g729e3d0919844 #2 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:touch state:D stack: 0 pid: 207 ppid: 167 flags:0x00000000 Call Trace: [...] lock_page include/linux/pagemap.h:548 [inline] pagecache_get_page+0x25e/0x310 mm/filemap.c:1682 find_or_create_page include/linux/pagemap.h:348 [inline] grab_cache_page include/linux/pagemap.h:424 [inline] f2fs_grab_cache_page fs/f2fs/f2fs.h:2395 [inline] f2fs_grab_cache_page fs/f2fs/f2fs.h:2373 [inline] __get_node_page.part.0+0x39/0x2d0 fs/f2fs/node.c:1350 __get_node_page fs/f2fs/node.c:35 [inline] f2fs_get_node_page+0x2e/0x60 fs/f2fs/node.c:1399 read_inline_xattr+0x88/0x140 fs/f2fs/xattr.c:288 lookup_all_xattrs+0x1f9/0x2c0 fs/f2fs/xattr.c:344 f2fs_getxattr+0x9b/0x160 fs/f2fs/xattr.c:532 f2fs_get_context+0x1e/0x20 fs/f2fs/super.c:2460 fscrypt_get_encryption_info+0x9b/0x450 fs/crypto/keysetup.c:472 fscrypt_inherit_context+0x2f/0xb0 fs/crypto/policy.c:640 f2fs_init_inode_metadata+0xab/0x340 fs/f2fs/dir.c:540 f2fs_add_inline_entry+0x145/0x390 fs/f2fs/inline.c:621 f2fs_add_dentry+0x31/0x80 fs/f2fs/dir.c:757 f2fs_do_add_link+0xcd/0x130 fs/f2fs/dir.c:798 f2fs_add_link fs/f2fs/f2fs.h:3234 [inline] f2fs_create+0x104/0x290 fs/f2fs/namei.c:344 lookup_open.isra.0+0x2de/0x500 fs/namei.c:3103 open_last_lookups+0xa9/0x340 fs/namei.c:3177 path_openat+0x8f/0x1b0 fs/namei.c:3365 do_filp_open+0x87/0x130 fs/namei.c:3395 do_sys_openat2+0x96/0x150 fs/open.c:1168 [...] That happened because f2fs_add_inline_entry() locks the directory inode's page in order to add the dentry, then f2fs_get_context() tries to lock it recursively in order to read the encryption xattr. This problem is specific to "test_dummy_encryption" because normally the directory's fscrypt_info would be set up prior to f2fs_add_inline_entry() in order to encrypt the new filename. Regardless, the new design fixes this test_dummy_encryption deadlock as well as potential deadlocks with fs reclaim, by setting up any needed fscrypt_info structs prior to taking so many locks. The test_dummy_encryption deadlock was reported by Daniel Rosenberg. Reported-by: Daniel Rosenberg <drosen@google.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Link: https://lore.kernel.org/r/20200917041136.178600-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-09-22 06:48:35 -07:00
Al Viro	6d1349c769	[PATCH] reduce boilerplate in fsid handling Get rid of boilerplate in most of ->statfs() instances... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-09-18 16:45:50 -04:00
Chao Yu	c8eb702484	f2fs: clean up kvfree After commit `0b6d4ca04a` ("f2fs: don't return vmalloc() memory from f2fs_kmalloc()"), f2fs_k{m,z}alloc() will not return vmalloc()'ed memory, so clean up to use kfree() instead of kvfree() to free vmalloc()'ed memory. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-14 11:15:37 -07:00
Daeho Jeong	6fcaebac66	f2fs: change virtual mapping way for compression pages By profiling f2fs compression works, I've found vmap() callings have unexpected hikes in the execution time in our test environment and those are bottlenecks of f2fs decompression path. Changing these with vm_map_ram(), we can enhance f2fs decompression speed pretty much. [Verification] Android Pixel 3(ARM64, 6GB RAM, 128GB UFS) Turned on only 0-3 little cores(at 1.785GHz) dd if=/dev/zero of=dummy bs=1m count=1000 echo 3 > /proc/sys/vm/drop_caches dd if=dummy of=/dev/zero bs=512k - w/o compression - 1048576000 bytes (0.9 G) copied, 2.082554 s, 480 M/s 1048576000 bytes (0.9 G) copied, 2.081634 s, 480 M/s 1048576000 bytes (0.9 G) copied, 2.090861 s, 478 M/s - before patch - 1048576000 bytes (0.9 G) copied, 7.407527 s, 135 M/s 1048576000 bytes (0.9 G) copied, 7.283734 s, 137 M/s 1048576000 bytes (0.9 G) copied, 7.291508 s, 137 M/s - after patch - 1048576000 bytes (0.9 G) copied, 1.998959 s, 500 M/s 1048576000 bytes (0.9 G) copied, 1.987554 s, 503 M/s 1048576000 bytes (0.9 G) copied, 1.986380 s, 503 M/s Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-11 11:11:26 -07:00
Daeho Jeong	78134d0351	f2fs: change return value of f2fs_disable_compressed_file to bool The returned integer is not required anywhere. So we need to change the return value to bool type. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-11 11:11:26 -07:00
Daeho Jeong	c2759ebaf7	f2fs: change i_compr_blocks of inode to atomic value writepages() can be concurrently invoked for the same file by different threads such as a thread fsyncing the file and a kworker kernel thread. So, changing i_compr_blocks without protection is racy and we need to protect it by changing it with atomic type value. Plus, we don't need a 64bit value for i_compr_blocks, so just we will use a atomic value, not atomic64. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-11 11:11:26 -07:00
Chao Yu	69c0dd29f7	f2fs: ignore compress mount option on image w/o compression feature to keep consistent with behavior when passing compress mount option to kernel w/o compression feature, so that mount may not fail on such condition. Reported-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-11 11:11:25 -07:00
Chao Yu	0e2b7385cb	f2fs: allocate proper size memory for zstd decompress As 5kft <5kft@5kft.org> reported: kworker/u9:3: page allocation failure: order:9, mode:0x40c40(GFP_NOFS\|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0 CPU: 3 PID: 8168 Comm: kworker/u9:3 Tainted: G C 5.8.3-sunxi #trunk Hardware name: Allwinner sun8i Family Workqueue: f2fs_post_read_wq f2fs_post_read_work [<c010d6d5>] (unwind_backtrace) from [<c0109a55>] (show_stack+0x11/0x14) [<c0109a55>] (show_stack) from [<c056d489>] (dump_stack+0x75/0x84) [<c056d489>] (dump_stack) from [<c0243b53>] (warn_alloc+0xa3/0x104) [<c0243b53>] (warn_alloc) from [<c024473b>] (__alloc_pages_nodemask+0xb87/0xc40) [<c024473b>] (__alloc_pages_nodemask) from [<c02267c5>] (kmalloc_order+0x19/0x38) [<c02267c5>] (kmalloc_order) from [<c02267fd>] (kmalloc_order_trace+0x19/0x90) [<c02267fd>] (kmalloc_order_trace) from [<c047c665>] (zstd_init_decompress_ctx+0x21/0x88) [<c047c665>] (zstd_init_decompress_ctx) from [<c047e9cf>] (f2fs_decompress_pages+0x97/0x228) [<c047e9cf>] (f2fs_decompress_pages) from [<c045d0ab>] (__read_end_io+0xfb/0x130) [<c045d0ab>] (__read_end_io) from [<c045d141>] (f2fs_post_read_work+0x61/0x84) [<c045d141>] (f2fs_post_read_work) from [<c0130b2f>] (process_one_work+0x15f/0x3b0) [<c0130b2f>] (process_one_work) from [<c0130e7b>] (worker_thread+0xfb/0x3e0) [<c0130e7b>] (worker_thread) from [<c0135c3b>] (kthread+0xeb/0x10c) [<c0135c3b>] (kthread) from [<c0100159>] zstd may allocate large size memory for {,de}compression, it may cause file copy failure on low-end device which has very few memory. For decompression, let's just allocate proper size memory based on current file's cluster size instead of max cluster size. Reported-by: 5kft <5kft@5kft.org> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-11 11:11:25 -07:00
Daeho Jeong	ae999bb9a3	f2fs: change compr_blocks of superblock info to 64bit Current compr_blocks of superblock info is not 64bit value. We are accumulating each i_compr_blocks count of inodes to this value and those are 64bit values. So, need to change this to 64bit value. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-11 11:11:25 -07:00
Daeho Jeong	4eda1682cd	f2fs: add block address limit check to compressed file Need to add block address range check to compressed file case and avoid calling get_data_block_bmap() for compressed file. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-11 11:11:25 -07:00
Dan Robertson	aad1383cbf	f2fs: check position in move range ioctl When the move range ioctl is used, check the input and output position and ensure that it is a non-negative value. Without this check f2fs_get_dnode_of_data may hit a memmory bug. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-11 11:11:25 -07:00
Jack Qiu	335cac8b25	f2fs: correct statistic of APP_DIRECT_IO/APP_DIRECT_READ_IO Miss to update APP_DIRECT_IO/APP_DIRECT_READ_IO when receiving async DIO. For example: fio -filename=/data/test.0 -bs=1m -ioengine=libaio -direct=1 -name=fill -size=10m -numjobs=1 -iodepth=32 -rw=write Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-11 11:11:24 -07:00
Matthew Wilcox (Oracle)	4cb03fecd3	f2fs: Simplify SEEK_DATA implementation Instead of finding the first dirty page and then seeing if it matches the index of a block that is NEW_ADDR, delay the lookup of the dirty bit until we've actually found a block that's NEW_ADDR. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-11 11:11:24 -07:00
Chao Yu	093749e296	f2fs: support age threshold based garbage collection There are several issues in current background GC algorithm: - valid blocks is one of key factors during cost overhead calculation, so if segment has less valid block, however even its age is young or it locates hot segment, CB algorithm will still choose the segment as victim, it's not appropriate. - GCed data/node will go to existing logs, no matter in-there datas' update frequency is the same or not, it may mix hot and cold data again. - GC alloctor mainly use LFS type segment, it will cost free segment more quickly. This patch introduces a new algorithm named age threshold based garbage collection to solve above issues, there are three steps mainly: 1. select a source victim: - set an age threshold, and select candidates beased threshold: e.g. 0 means youngest, 100 means oldest, if we set age threshold to 80 then select dirty segments which has age in range of [80, 100] as candiddates; - set candidate_ratio threshold, and select candidates based the ratio, so that we can shrink candidates to those oldest segments; - select target segment with fewest valid blocks in order to migrate blocks with minimum cost; 2. select a target victim: - select candidates beased age threshold; - set candidate_radius threshold, search candidates whose age is around source victims, searching radius should less than the radius threshold. - select target segment with most valid blocks in order to avoid migrating current target segment. 3. merge valid blocks from source victim into target victim with SSR alloctor. Test steps: - create 160 dirty segments: * half of them have 128 valid blocks per segment * left of them have 384 valid blocks per segment - run background GC Benefit: GC count and block movement count both decrease obviously: - Before: - Valid: 86 - Dirty: 1 - Prefree: 11 - Free: 6001 (6001) GC calls: 162 (BG: 220) - data segments : 160 (160) - node segments : 2 (2) Try to move 41454 blocks (BG: 41454) - data blocks : 40960 (40960) - node blocks : 494 (494) IPU: 0 blocks SSR: 0 blocks in 0 segments LFS: 41364 blocks in 81 segments - After: - Valid: 87 - Dirty: 0 - Prefree: 4 - Free: 6008 (6008) GC calls: 75 (BG: 76) - data segments : 74 (74) - node segments : 1 (1) Try to move 12813 blocks (BG: 12813) - data blocks : 12544 (12544) - node blocks : 269 (269) IPU: 0 blocks SSR: 12032 blocks in 77 segments LFS: 855 blocks in 2 segments Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-11 11:11:15 -07:00
Daniel Rosenberg	eca4873ee1	f2fs: Use generic casefolding support This switches f2fs over to the generic support provided in the previous patch. Since casefolded dentries behave the same in ext4 and f2fs, we decrease the maintenance burden by unifying them, and any optimizations will immediately apply to both. Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-10 14:03:31 -07:00
Chao Yu	e6c3948de2	f2fs: compress: use more readable atomic_t type for {cic,dic}.ref refcount_t type variable should never be less than one, so it's a little bit hard to understand when we use it to indicate pending compressed page count, let's change to use atomic_t for better readability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-10 14:03:31 -07:00
Chao Yu	17d7648d9c	f2fs: fix compile warning This patch fixes below compile warning reported by LKP (kernel test robot) cppcheck warnings: (new ones prefixed by >>) >> fs/f2fs/file.c:761:9: warning: Identical condition 'err', second condition is always false [identicalConditionAfterEarlyExit] return err; ^ fs/f2fs/file.c:753:6: note: first condition if (err) ^ fs/f2fs/file.c:761:9: note: second condition return err; Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-10 14:03:30 -07:00
Chao Yu	2e9b2bb250	f2fs: support 64-bits key in f2fs rb-tree node entry then, we can add specified entry into rb-tree with 64-bits segment time as key. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-10 14:03:30 -07:00
Chao Yu	c5d02785c5	f2fs: inherit mtime of original block during GC Don't let f2fs inner GC ruins original aging degree of segment. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-10 14:03:30 -07:00
Chao Yu	6f3a01ae9b	f2fs: record average update time of segment Previously, once we update one block in segment, we will update mtime of segment to last time, making aged segment becoming freshest, result in that GC with cost benefit algorithm missing such segment, So this patch changes to record mtime as average block updating time instead of last updating time. It's not needed to reset mtime for prefree segment, as se->valid_blocks is zero, then old se->mtime won't take any weight with below calculation: se->mtime = div_u64(se->mtime * se->valid_blocks + mtime, se->valid_blocks + 1); Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-10 14:03:30 -07:00
Chao Yu	d0b9e42ab6	f2fs: introduce inmem curseg Previous implementation of aligned pinfile allocation will: - allocate new segment on cold data log no matter whether last used segment is partially used or not, it makes IOs more random; - force concurrent cold data/GCed IO going into warm data area, it can make a bad effect on hot/cold data separation; In this patch, we introduce a new type of log named 'inmem curseg', the differents from normal curseg is: - it reuses existed segment type (CURSEG_XXX_NODE/DATA); - it only exists in memory, its segno, blkofs, summary will not b persisted into checkpoint area; With this new feature, we can enhance scalability of log, special allocators can be created for purposes: - pure lfs allocator for aligned pinfile allocation or file defragmentation - pure ssr allocator for later feature So that, let's update aligned pinfile allocation to use this new inmem curseg fwk. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-10 14:03:30 -07:00
Chao Yu	376207af4b	f2fs: compress: remove unneeded code - f2fs_write_multi_pages - f2fs_compress_pages - init_compress_ctx - compress_pages - destroy_compress_ctx --- 1 - f2fs_write_compressed_pages - destroy_compress_ctx --- 2 destroy_compress_ctx() in f2fs_write_multi_pages() is redundant, remove it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-10 14:03:30 -07:00
Xiaojun Wang	e90027d23a	f2fs: remove duplicated type casting Since DUMMY_WRITTEN_PAGE and ATOMIC_WRITTEN_PAGE have already been converted as unsigned long type, we don't need do type casting again. Signed-off-by: Xiaojun Wang <wangxiaojun11@huawei.com> Reported-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-10 14:03:29 -07:00
Aravind Ramesh	de881df977	f2fs: support zone capacity less than zone size NVMe Zoned Namespace devices can have zone-capacity less than zone-size. Zone-capacity indicates the maximum number of sectors that are usable in a zone beginning from the first sector of the zone. This makes the sectors sectors after the zone-capacity till zone-size to be unusable. This patch set tracks zone-size and zone-capacity in zoned devices and calculate the usable blocks per segment and usable segments per section. If zone-capacity is less than zone-size mark only those segments which start before zone-capacity as free segments. All segments at and beyond zone-capacity are treated as permanently used segments. In cases where zone-capacity does not align with segment size the last segment will start before zone-capacity and end beyond the zone-capacity of the zone. For such spanning segments only sectors within the zone-capacity are used. During writes and GC manage the usable segments in a section and usable blocks per segment. Segments which are beyond zone-capacity are never allocated, and do not need to be garbage collected, only the segments which are before zone-capacity needs to garbage collected. For spanning segments based on the number of usable blocks in that segment, write to blocks only up to zone-capacity. Zone-capacity is device specific and cannot be configured by the user. Since NVMe ZNS device zones are sequentially write only, a block device with conventional zones or any normal block device is needed along with the ZNS device for the metadata operations of F2fs. A typical nvme-cli output of a zoned device shows zone start and capacity and write pointer as below: SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones are in EMPTY state. For each zone, only zone start + 49MB is usable area, any lba/sector after 49MB cannot be read or written to, the drive will fail any attempts to read/write. So, the second zone starts at 64MB and is usable till 113MB (64 + 49) and the range between 113 and 128MB is again unusable. The next zone starts at 128MB, and so on. Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-10 14:03:29 -07:00
Gabriel Krisman Bertazi	20d0a107fb	f2fs: Return EOF on unaligned end of file DIO read Reading past end of file returns EOF for aligned reads but -EINVAL for unaligned reads on f2fs. While documentation is not strict about this corner case, most filesystem returns EOF on this case, like iomap filesystems. This patch consolidates the behavior for f2fs, by making it return EOF(0). it can be verified by a read loop on a file that does a partial read before EOF (A file that doesn't end at an aligned address). The following code fails on an unaligned file on f2fs, but not on btrfs, ext4, and xfs. while (done < total) { ssize_t delta = pread(fd, buf + done, total - done, off + done); if (!delta) break; ... } It is arguable whether filesystems should actually return EOF or -EINVAL, but since iomap filesystems support it, and so does the original DIO code, it seems reasonable to consolidate on that. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-08 20:31:33 -07:00
Sahitya Tummala	e2cab031ba	f2fs: fix indefinite loop scanning for free nid If the sbi->ckpt->next_free_nid is not NAT block aligned and if there are free nids in that NAT block between the start of the block and next_free_nid, then those free nids will not be scanned in scan_nat_page(). This results into mismatch between nm_i->available_nids and the sum of nm_i->free_nid_count of all NAT blocks scanned. And nm_i->available_nids will always be greater than the sum of free nids in all the blocks. Under this condition, if we use all the currently scanned free nids, then it will loop forever in f2fs_alloc_nid() as nm_i->available_nids is still not zero but nm_i->free_nid_count of that partially scanned NAT block is zero. Fix this to align the nm_i->next_scan_nid to the first nid of the corresponding NAT block. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-08 20:31:33 -07:00
Shin'ichiro Kawasaki	123aaf774f	f2fs: Fix type of section block count variables Commit `da52f8ade4` ("f2fs: get the right gc victim section when section has several segments") added code to count blocks of each section using variables with type 'unsigned short', which has 2 bytes size in many systems. However, the counts can be larger than the 2 bytes range and type conversion results in wrong values. Especially when the f2fs sections have blocks as many as USHRT_MAX + 1, the count is handled as 0. This triggers eternal loop in init_dirty_segmap() at mount system call. Fix this by changing the type of the variables to block_t. Fixes: `da52f8ade4` ("f2fs: get the right gc victim section when section has several segments") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-08 20:31:33 -07:00
Jeff Layton	8b10fe6898	fscrypt: drop unused inode argument from fscrypt_fname_alloc_buffer Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20200810142139.487631-1-jlayton@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-09-07 15:27:42 -07:00

1 2 3 4 5 ...

3138 Commits