linux

Commit Graph

Author	SHA1	Message	Date
Shuoran Liu	8f1dbbbbdf	f2fs: introduce lifetime write IO statistics This patch introduces lifetime IO write statistics exposed to the sysfs interface. The write IO amount is obtained from block layer, accumulated in the file system and stored in the hot node summary of checkpoint. Signed-off-by: Shuoran Liu <liushuoran@huawei.com> Signed-off-by: Pengyang Hou <houpengyang@huawei.com> [Jaegeuk Kim: add sysfs documentation] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	fec1d6576c	f2fs: use wait_for_stable_page to avoid contention In write_begin, if storage supports stable_page, we don't need to wait for writeback to update its contents. This patch introduces to use wait_for_stable_page instead of wait_on_page_writeback. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	6beceb5427	f2fs: introduce time and interval facility This patch adds time and interval arrays to store some timing variables. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-11 15:36:27 -08:00
Jaegeuk Kim	8d4ea29b64	f2fs: write pending bios when cp_error is set When testing ioc_shutdown, put_super is able to be hanged by waiting for writebacking pages as follows. INFO: task umount:2723 blocked for more than 120 seconds. Tainted: G O 4.4.0-rc3+ #8 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. umount D ffff88000859f9d8 0 2723 2110 0x00000000 ffff88000859f9d8 0000000000000000 0000000000000000 ffffffff81e11540 ffff880078c225c0 ffff8800085a0000 ffff88007fc17440 7fffffffffffffff ffffffff818239f0 ffff88000859fb48 ffff88000859f9f0 ffffffff8182310c Call Trace: [<ffffffff818239f0>] ? bit_wait+0x50/0x50 [<ffffffff8182310c>] schedule+0x3c/0x90 [<ffffffff81827fb9>] schedule_timeout+0x2d9/0x430 [<ffffffff810e0f8f>] ? mark_held_locks+0x6f/0xa0 [<ffffffff8111614d>] ? ktime_get+0x7d/0x140 [<ffffffff818239f0>] ? bit_wait+0x50/0x50 [<ffffffff8106a655>] ? kvm_clock_get_cycles+0x25/0x30 [<ffffffff8111617c>] ? ktime_get+0xac/0x140 [<ffffffff818239f0>] ? bit_wait+0x50/0x50 [<ffffffff81822564>] io_schedule_timeout+0xa4/0x110 [<ffffffff81823a25>] bit_wait_io+0x35/0x50 [<ffffffff818235bd>] __wait_on_bit+0x5d/0x90 [<ffffffff811b9e8b>] wait_on_page_bit+0xcb/0xf0 [<ffffffff810d5f90>] ? autoremove_wake_function+0x40/0x40 [<ffffffff811cf84c>] truncate_inode_pages_range+0x4bc/0x840 [<ffffffff811cfc3d>] truncate_inode_pages_final+0x4d/0x60 [<ffffffffc023ced5>] f2fs_evict_inode+0x75/0x400 [f2fs] [<ffffffff812639bc>] evict+0xbc/0x190 [<ffffffff81263d19>] iput+0x229/0x2c0 [<ffffffffc0241885>] f2fs_put_super+0x105/0x1a0 [f2fs] [<ffffffff8124756a>] generic_shutdown_super+0x6a/0xf0 [<ffffffff812478f7>] kill_block_super+0x27/0x70 [<ffffffffc0241290>] kill_f2fs_super+0x20/0x30 [f2fs] [<ffffffff81247b03>] deactivate_locked_super+0x43/0x70 [<ffffffff81247f4c>] deactivate_super+0x5c/0x60 [<ffffffff81268d2f>] cleanup_mnt+0x3f/0x90 [<ffffffff81268dc2>] __cleanup_mnt+0x12/0x20 [<ffffffff810ac463>] task_work_run+0x73/0xa0 [<ffffffff810032ac>] exit_to_usermode_loop+0xcc/0xd0 [<ffffffff81003e7c>] syscall_return_slowpath+0xcc/0xe0 [<ffffffff81829ea2>] int_ret_from_sys_call+0x25/0x9f Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-31 13:08:02 -08:00
Chao Yu	6d5a1495ee	f2fs: let user being aware of IO error Sometimes we keep dumb when IO error occur in lower layer device, so user will not receive any error return value for some operation, but actually, the operation did not succeed. This sould be avoided, so this patch reports such kind of error to user. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:15 -08:00
Chao Yu	c34f42e2cb	f2fs: report error of do_checkpoint do_checkpoint and write_checkpoint can fail due to reasons like triggering in a readonly fs or encountering IO error of storage device. So it's better to report such error info to user, let user be aware of failure of doing checkpoint. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:09 -08:00
Chao Yu	4cf185379b	f2fs: add a tracepoint for sync_dirty_inodes This patch adds a tracepoint for sync_dirty_inodes. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-17 09:55:27 -08:00
Chao Yu	33fbd5100d	f2fs: stat dirty regular/symlink inodes Add to stat dirty regular and symlink inode for showing in debugfs. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-17 09:53:19 -08:00
Chao Yu	c227f91273	f2fs: record dirty status of regular/symlink inode Maintain regular/symlink inode which has dirty pages in global dirty list and record their total dirty pages count like the way of handling directory inode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-16 08:58:12 -08:00
Jaegeuk Kim	55d1cdb25a	f2fs: relocate tracepoint of write_checkpoint It needs to relocate its location to see exact trace logs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-16 08:58:06 -08:00
Chao Yu	6ad7609a18	f2fs: introduce __remove_dirty_inode Introduce __remove_dirty_inode to clean up codes in remove_dirty_dir_inode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-15 13:31:28 -08:00
Chao Yu	2710fd7e00	f2fs: introduce dirty list node in inode info Add a new dirt list node member in inode info for linking the inode to global dirty list in superblock, instead of old implementation which allocate slab cache memory as an entry to inode. It avoids memory pressure due to slab cache allocation, and also makes codes more clean. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-15 13:24:19 -08:00
Chao Yu	a49324f127	f2fs: rename {add,remove,release}_dirty_inode to {add,remove,release}_ino_entry remove_dirty_dir_inode will be renamed to remove_dirty_inode as a generic function in following patch for removing directory/regular/symlink inode in global dirty list. Here rename ino management related functions for readability, also in order to avoid name conflict. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-15 13:23:43 -08:00
Chao Yu	26879fb101	f2fs: support lower priority asynchronous readahead in ra_meta_pages Now, we use ra_meta_pages to reads continuous physical blocks as much as possible to improve performance of following reads. However, ra_meta_pages uses a synchronous readahead approach by submitting bio with READ, as READ is with high priority, it can not be used in the case of preloading blocks, and it's not sure when these RAed pages will be used. This patch supports asynchronous readahead in ra_meta_pages by tagging bio with READA flag in order to allow preloading. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-12 14:03:15 -07:00
Chao Yu	2b947003fa	f2fs: don't tag REQ_META for temporary non-meta pages In recovery or checkpoint flow, we grab pages temperarily in meta inode's mapping for caching temperary data, actually, datas in these pages were not meta data of f2fs, but still we tag them with REQ_META flag. However, lower device like eMMC may do some optimization for data of such type. So in order to avoid wrong optimization, we'd better remove such flag for temperary non-meta pages. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-12 14:01:46 -07:00
Jaegeuk Kim	6066d8cdb6	f2fs: merge meta writes as many possible This patch tries to merge IOs as many as possible when background flusher conducts flushing the dirty meta pages. [Before] ... 2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124320, size = 4096 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124560, size = 32768 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 95720, size = 987136 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123928, size = 4096 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123944, size = 8192 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123968, size = 45056 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124064, size = 4096 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 97648, size = 1007616 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123776, size = 8192 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123800, size = 32768 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124624, size = 4096 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 99616, size = 921600 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123608, size = 4096 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123624, size = 77824 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123792, size = 4096 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123864, size = 32768 ... [After] ... f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 92168, size = 892928 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 93912, size = 753664 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 95384, size = 716800 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 96784, size = 712704 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 104160, size = 364544 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 104872, size = 356352 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 105568, size = 278528 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 106112, size = 319488 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 106736, size = 258048 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 107240, size = 270336 f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 107768, size = 180224 ... Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:57 -07:00
Jaegeuk Kim	60b99b486b	f2fs: introduce a periodic checkpoint flow This patch introduces a periodic checkpoint feature. Note that, this is not enforcing to conduct checkpoints very strictly in terms of trigger timing, instead just hope to help user experiences. The default value is 60 seconds. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:57 -07:00
Jaegeuk Kim	a7230d16d5	f2fs: check end_io for metapages before making next checkpoint blocks This patch avoids to produce new checkpoint blocks before the previous meta pages were written completely. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:51 -07:00
Jaegeuk Kim	80c545055d	f2fs: use __GFP_NOFAIL to avoid infinite loop __GFP_NOFAIL can avoid retrying the whole path of kmem_cache_alloc and bio_alloc. And, it also fixes the use cases of GFP_ATOMIC correctly. Suggested-by: Chao Yu <chao2.yu@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-24 09:37:21 -07:00
Jaegeuk Kim	315df8398e	f2fs: do not write any node pages related to orphan inodes We should not write node pages when deleting orphan inodes. In order to do that, we can eaisly set POR_DOING flag earlier before entering orphan inode routine. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-20 08:59:42 -07:00
Chao Yu	8c14bfadea	f2fs: handle error of f2fs_iget correctly In recover_orphan_inode, whenever f2fs_iget fail, we will make kernel panic, but it's not reasonable, because f2fs_iget can fail due to a lot of reasons including out of memory. So we change error handling method as below: a) when finding no entry for the orphan inode, bug_on for catching bugs; b) for other reasons, report it to caller. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-14 16:02:14 -07:00
Chao Yu	e90c2d2850	f2fs: invalidate temporary meta page To avoid meeting garbage data in next free node block at the end of warm node chain when doing recovery, we will try to zero out that invalid block. If the device is not support discard, our way for zeroing out block is: grabbing a temporary zeroed page in meta inode, then, issue write request with this page. But, we forget to release that temporary page, so our memory usage will increase without gaining any hit ratio benefit, so it's better to free it for saving memory. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 08:19:21 -07:00
Chao Yu	f3f338caad	f2fs: freeze filesystem when fail to update meta page due to IO error In get_meta_page, we guarantee no failure for the returned page, but sometimes, IO error from device will incur returning an non-updated page. Then, we still use this page as updated one, exception could happen when using this kind of page. So in this condition, we'd better freeze fs by making fs readonly and and stop doing checkpoint. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 08:08:17 -07:00
Jaegeuk Kim	86531d6b84	f2fs: callers take care of the page from bio error This patch changes for a caller to handle the page after its bio gets an error. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 08:08:07 -07:00
Chao Yu	bd936f8407	f2fs: cleanup write_orphan_inodes Previously, since 'commit `4531929e39` ("f2fs: move grabing orphan pages out of protection region")' was committed, in write_orphan_inodes(), we will grab all meta page in a batch before we use them under spinlock, so that we can avoid large time delay of grabbing meta pages under spinlock. Now, 'commit `d6c67a4fee` ("f2fs: revmove spin_lock for write_orphan_inodes")' remove the spinlock in write_orphan_inodes, so there is no issue we describe above, we'd better recover to move the grab operation to original place for readability. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:59 -07:00
Chao Yu	5ac9f36fca	f2fs: fix to record dirty page count for symlink Dirty page can be exist in mapping of newly created symlink, but previously we did not maintain the counting of dirty page for symlink like we maintained for regular/directory, so the counting we lookuped should be wrong. This patch adds missed dirty page counting for symlink to fix this issue. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:52 -07:00
Chao Yu	381722d2ac	f2fs: introduce update_meta_page Add a help function update_meta_page() to update meta page with specified buffer. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-06-01 16:21:00 -07:00
Jaegeuk Kim	4375a33664	f2fs crypto: add encryption support in read/write paths This patch adds encryption support in read and write paths. Note that, in f2fs, we need to consider cleaning operation. In cleaning procedure, we must avoid encrypting and decrypting written blocks. So, this patch implements move_encrypted_block(). Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:52 -07:00
Jaegeuk Kim	836b5a6356	f2fs: issue discard with finally produced len and minlen This patch determines to issue discard commands by comparing given minlen and the length of produced final candidates. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:39 -07:00
Jaegeuk Kim	a66cdd9855	f2fs: introduce discard_map for f2fs_trim_fs This patch adds a bitmap for discard issues from f2fs_trim_fs. There-in rule is to issue discard commands only for invalidated blocks after mount. Once mount is done, f2fs_trim_fs trims out whole invalid area. After ehn, it will not issue and discrads redundantly. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:39 -07:00
Jaegeuk Kim	d6c67a4fee	f2fs: revmove spin_lock for write_orphan_inodes This patch removes spin_lock, since this is covered by f2fs_lock_op already. And, we should avoid to use page operations inside spin_lock. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:38 -07:00
Jaegeuk Kim	05ca3632e5	f2fs: add sbi and page pointer in f2fs_io_info This patch adds f2fs_sb_info and page pointers in f2fs_io_info structure. With this change, we can reduce a lot of parameters for IO functions. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:32 -07:00
Chao Yu	f0c9cadae6	f2fs: use is_valid_blkaddr to verify blkaddr for readability Export is_valid_blkaddr() and use it to replace some codes for readability. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-07 11:38:32 -07:00
Jaegeuk Kim	10027551cc	f2fs: pass checkpoint reason on roll-forward recovery This patch adds CP_RECOVERY to remain recovery information for checkpoint. And, it makes sure writing checkpoint in this case. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-16 09:45:40 -07:00
Changman Lee	e0150392dd	f2fs: cleanup statement about max orphan inodes calc Through each macro, we can read the meaning easily. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:38 -07:00
Sebastian Andrzej Siewior	7ecebe5e07	f2fs: add cond_resched() to sync_dirty_dir_inodes() In a preempt-off enviroment a alot of FS activity (write/delete) I run into a CPU stall: \| NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u2:2:59] \| Modules linked in: \| CPU: 0 PID: 59 Comm: kworker/u2:2 Tainted: G W 3.19.0-00010-g10c11c51ffed #153 \| Workqueue: writeback bdi_writeback_workfn (flush-179:0) \| task: df230000 ti: df23e000 task.ti: df23e000 \| PC is at __submit_merged_bio+0x6c/0x110 \| LR is at f2fs_submit_merged_bio+0x74/0x80 … \| [<c00085c4>] (gic_handle_irq) from [<c0012e84>] (__irq_svc+0x44/0x5c) \| Exception stack(0xdf23fb48 to 0xdf23fb90) \| fb40: deef3484 ffff0001 ffff0001 00000027 deef3484 00000000 \| fb60: deef3440 00000000 de426000 deef34ec deefc440 df23fbb4 df23fbb8 df23fb90 \| fb80: c02191f0 c0218fa0 60000013 ffffffff \| [<c0012e84>] (__irq_svc) from [<c0218fa0>] (__submit_merged_bio+0x6c/0x110) \| [<c0218fa0>] (__submit_merged_bio) from [<c02191f0>] (f2fs_submit_merged_bio+0x74/0x80) \| [<c02191f0>] (f2fs_submit_merged_bio) from [<c021624c>] (sync_dirty_dir_inodes+0x70/0x78) \| [<c021624c>] (sync_dirty_dir_inodes) from [<c0216358>] (write_checkpoint+0x104/0xc10) \| [<c0216358>] (write_checkpoint) from [<c021231c>] (f2fs_sync_fs+0x80/0xbc) \| [<c021231c>] (f2fs_sync_fs) from [<c0221eb8>] (f2fs_balance_fs_bg+0x4c/0x68) \| [<c0221eb8>] (f2fs_balance_fs_bg) from [<c021e9b8>] (f2fs_write_node_pages+0x40/0x110) \| [<c021e9b8>] (f2fs_write_node_pages) from [<c00de620>] (do_writepages+0x34/0x48) \| [<c00de620>] (do_writepages) from [<c0145714>] (__writeback_single_inode+0x50/0x228) \| [<c0145714>] (__writeback_single_inode) from [<c0146184>] (writeback_sb_inodes+0x1a8/0x378) \| [<c0146184>] (writeback_sb_inodes) from [<c01463e4>] (__writeback_inodes_wb+0x90/0xc8) \| [<c01463e4>] (__writeback_inodes_wb) from [<c01465f8>] (wb_writeback+0x1dc/0x28c) \| [<c01465f8>] (wb_writeback) from [<c0146dd8>] (bdi_writeback_workfn+0x2ac/0x460) \| [<c0146dd8>] (bdi_writeback_workfn) from [<c003c3fc>] (process_one_work+0x11c/0x3a4) \| [<c003c3fc>] (process_one_work) from [<c003c844>] (worker_thread+0x17c/0x490) \| [<c003c844>] (worker_thread) from [<c0041398>] (kthread+0xec/0x100) \| [<c0041398>] (kthread) from [<c000ed10>] (ret_from_fork+0x14/0x24) As it turns out, the code loops in sync_dirty_dir_inodes() and waits for others to make progress but since it never leaves the CPU there is no progress made. At the time of this stall, there is also a rm process blocked: \| rm R running 0 1989 1774 0x00000000 \| [<c047c55c>] (__schedule) from [<c00486dc>] (__cond_resched+0x30/0x4c) \| [<c00486dc>] (__cond_resched) from [<c047c8c8>] (_cond_resched+0x4c/0x54) \| [<c047c8c8>] (_cond_resched) from [<c00e1aec>] (truncate_inode_pages_range+0x1f0/0x5e8) \| [<c00e1aec>] (truncate_inode_pages_range) from [<c00e1fd8>] (truncate_inode_pages+0x28/0x30) \| [<c00e1fd8>] (truncate_inode_pages) from [<c00e2148>] (truncate_inode_pages_final+0x60/0x64) \| [<c00e2148>] (truncate_inode_pages_final) from [<c020c92c>] (f2fs_evict_inode+0x4c/0x268) \| [<c020c92c>] (f2fs_evict_inode) from [<c0137214>] (evict+0x94/0x140) \| [<c0137214>] (evict) from [<c01377e8>] (iput+0xc8/0x134) \| [<c01377e8>] (iput) from [<c01333e4>] (d_delete+0x154/0x180) \| [<c01333e4>] (d_delete) from [<c0129870>] (vfs_rmdir+0x114/0x12c) \| [<c0129870>] (vfs_rmdir) from [<c012d644>] (do_rmdir+0x158/0x168) \| [<c012d644>] (do_rmdir) from [<c012dd90>] (SyS_unlinkat+0x30/0x3c) \| [<c012dd90>] (SyS_unlinkat) from [<c000ec40>] (ret_fast_syscall+0x0/0x4c) As explained by Jaegeuk Kim: \|This inode is the directory (c.f., do_rmdir) causing a infinite loop on \|sync_dirty_dir_inodes. \|The sync_dirty_dir_inodes tries to flush dirty dentry pages, but if the \|inode is under eviction, it submits bios and do it again until eviction \|is finished. This patch adds a cond_resched() (as suggested by Jaegeuk) after a BIO is submitted so other thread can make progress. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> [Jaegeuk Kim: change fs/f2fs to f2fs in subject as naming convention] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:30 -07:00
Wanpeng Li	14b4281776	f2fs: fix max orphan inodes calculation cp_payload is introduced for sit bitmap to support large volume, and it is just after the block of f2fs_checkpoint + nat bitmap, so the first segment should include F2FS_CP_PACKS + NR_CURSEG_TYPE + cp_payload + orphan blocks. However, current max orphan inodes calculation don't consider cp_payload, this patch fix it by reducing the number of cp_payload from total blocks of the first segment when calculate max orphan inodes. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:29 -07:00
Wanpeng Li	2bda542d59	f2fs: fix block_ops trace point block operations is used to flush all dirty node and dentry blocks in the page cache and suspend ordinary writing activities, however, there are some facts such like cp error or mount read-only etc which lead to block operations can't be invoked. Current trace point print block_ops start premature even if block_ops doesn't have opportunity to execute. This patch fix it by move block_ops trace point just before block_ops. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:28 -07:00
Wanpeng Li	3c64298579	f2fs: fix the number of orphan inode blocks cp_pack_start_sum is calculated in do_checkpoint and is equal to cpu_to_le32(1 + cp_payload_blks + orphan_blocks). The number of orphan inode blocks is take advantage of by recover_orphan_inodes to readahead meta pages and recovery inodes. However, current codes forget to reduce the number of cp payload blocks when calculate the number of orphan inode blocks. This patch fix it. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:26 -07:00
Wanpeng Li	551414861f	f2fs: introduce macro __cp_payload This patch introduce macro __cp_payload. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:25 -07:00
Chao Yu	97dc3fd2cb	f2fs: use ->writepage in sync_meta_pages This patch uses ->writepage of meta mapping in sync_meta_pages instead of f2fs_write_meta_page, by this way, in its caller we can ignore any changes (e.g. changing name) of this registered function. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-03-03 09:58:44 -08:00
Jaegeuk Kim	29e7043f40	f2fs: fix sparse warnings This patch resolves the following warnings. include/trace/events/f2fs.h:150:1: warning: expression using sizeof bool include/trace/events/f2fs.h:180:1: warning: expression using sizeof bool include/trace/events/f2fs.h:990:1: warning: expression using sizeof bool include/trace/events/f2fs.h:990:1: warning: expression using sizeof bool include/trace/events/f2fs.h:150:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) include/trace/events/f2fs.h:180:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) include/trace/events/f2fs.h:990:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) include/trace/events/f2fs.h:990:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) fs/f2fs/checkpoint.c:27:19: warning: symbol 'inode_entry_slab' was not declared. Should it be static? fs/f2fs/checkpoint.c:577:15: warning: cast to restricted __le32 fs/f2fs/checkpoint.c:592:15: warning: cast to restricted __le32 fs/f2fs/trace.c:19:1: warning: symbol 'pids' was not declared. Should it be static? fs/f2fs/trace.c:21:21: warning: symbol 'last_io' was not declared. Should it be static? Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:49 -08:00
Jaegeuk Kim	f7ef9b83b5	f2fs: introduce macros to convert bytes and blocks in f2fs This patch adds two macros for transition between byte and block offsets. Currently, f2fs only supports 4KB blocks, so use the default size for now. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:48 -08:00
Chao Yu	487261f39b	f2fs: merge {invalidate,release}page for meta/node/data pages This patch merges ->{invalidate,release}page function for meta/node/data pages. After this, duplication of codes could be removed. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:44 -08:00
Jaegeuk Kim	f68daeebba	f2fs: keep PagePrivate during releasepage If PagePrivate is removed by releasepage, f2fs loses counting dirty pages. e.g., try_to_release_page will not release page when the page is dirty, but our releasepage removes PagePrivate. [<ffffffff81188d75>] try_to_release_page+0x35/0x50 [<ffffffff811996f9>] invalidate_inode_pages2_range+0x2f9/0x3b0 [<ffffffffa02a7f54>] ? truncate_blocks+0x384/0x4d0 [f2fs] [<ffffffffa02b7583>] ? f2fs_direct_IO+0x283/0x290 [f2fs] [<ffffffffa02b7fb0>] ? get_data_block_fiemap+0x20/0x20 [f2fs] [<ffffffff8118aa53>] generic_file_direct_write+0x163/0x170 [<ffffffff8118ad06>] __generic_file_write_iter+0x2a6/0x350 [<ffffffff8118adef>] generic_file_write_iter+0x3f/0xb0 [<ffffffff81203081>] new_sync_write+0x81/0xb0 [<ffffffff81203837>] vfs_write+0xb7/0x1f0 [<ffffffff81204459>] SyS_write+0x49/0xb0 [<ffffffff817c286d>] system_call_fastpath+0x16/0x1b Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:42 -08:00
Jaegeuk Kim	119ee91445	f2fs: split UMOUNT and FASTBOOT flags This patch adds FASTBOOT flag into checkpoint as follows. - CP_UMOUNT_FLAG is set when system is umounted. - CP_FASTBOOT_FLAG is set when intermediate checkpoint having node summaries was done. So, if you get CP_UMOUNT_FLAG from checkpoint, the system was umounted cleanly. Instead, if there was sudden-power-off, you can get CP_FASTBOOT_FLAG or nothing. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:41 -08:00
Jaegeuk Kim	11504a8e7e	f2fs: avoid write_checkpoint if f2fs is mounted readonly Do not change any partition when f2fs is changed to readonly mode. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:40 -08:00
Chao Yu	caf0047e7e	f2fs: merge flags in struct f2fs_sb_info Currently, there are several variables with Boolean type as below: struct f2fs_sb_info { ... int s_dirty; bool need_fsck; bool s_closing; ... bool por_doing; ... } For this there are some issues: 1. there are some space of f2fs_sb_info is wasted due to aligning after Boolean type variables by compiler. 2. if we continuously add new flag into f2fs_sb_info, structure will be messed up. So in this patch, we try to: 1. switch s_dirty to Boolean type variable since it has two status 0/1. 2. merge s_dirty/need_fsck/s_closing/por_doing variables into s_flag. 3. introduce an enum type which can indicate different states of sbi. 4. use new introduced universal interfaces is_sbi_flag_set/{set,clear}_sbi_flag to operate flags for sbi. After that, above issues will be fixed. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:38 -08:00
Chao Yu	1601839e9e	f2fs: fix to release count of meta page in ->invalidatepage We will encounter deadloop in below scenario: 1. increase page count for F2FS_DIRTY_META type in following path: ->recover_fsync_data ->recover_data ->do_recover_data ->recover_data_page ->change_curseg ->write_sum_page ->set_page_dirty 2. fail in recover_data() 3. invalidate meta pages in truncate_inode_pages_final without decreasing page count. 4. deadloop when sync_meta_pages as page count will always be non-zero. message: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [<c1129a37>] pagevec_lookup_tag+0x27/0x30 [<f0e774c7>] sync_meta_pages+0x87/0x160 [f2fs] [<f0e86dd9>] recover_fsync_data+0xeb9/0xf10 [f2fs] [<f0e75398>] f2fs_fill_super+0x888/0x980 [f2fs] [<c11733ca>] mount_bdev+0x16a/0x1a0 [<f0e7180f>] f2fs_mount+0x1f/0x30 [f2fs] [<c1173da6>] mount_fs+0x36/0x170 [<c118b6f5>] vfs_kern_mount+0x55/0xe0 [<c118d63f>] do_mount+0x1df/0x9f0 [<c118e110>] SyS_mount+0x70/0xb0 [<c15a0c48>] sysenter_do_call+0x12/0x12 To avoid page count leak, let's add ->invalidatepage and ->releasepage in f2fs_meta_aops as f2fs_node_aops to release meta page count correctly. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:33 -08:00
Jaegeuk Kim	85dc2f2c6c	f2fs: do checkpoint when umount flag is not set If the previous checkpoint was done without CP_UMOUNT flag, it needs to do checkpoint with CP_UMOUNT for the next fast boot. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:33 -08:00
Chao Yu	062920734c	f2fs: reuse inode_entry_slab in gc procedure for using slab more effectively There are two slab cache inode_entry_slab and winode_slab using the same structure as below: struct dir_inode_entry { struct list_head list; /* list head / struct inode inode; /* vfs inode pointer / }; struct inode_entry { struct list_head list; struct inode inode; }; It's a little waste that the two cache can not share their memory space for each other. So in this patch we remove one redundant winode_slab slab cache, then use more universal name struct inode_entry as remaining data structure name of slab, finally we reuse the inode_entry_slab to store dirty dir item and gc item for more effective. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:26 -08:00
Jaegeuk Kim	9e4ded3f30	f2fs: activate f2fs_trace_pid This patch activates f2fs_trace_pid. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:24 -08:00
Jaegeuk Kim	cf04e8eb55	f2fs: use f2fs_io_info to clean up messy parameters during IO path This patch cleans up parameters on IO paths. The key idea is to use f2fs_io_info adding a parameter, block address, and then use this structure as parameters. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:23 -08:00
Chao Yu	3fa06d7bc9	f2fs: readahead contiguous current summary blocks in checkpoint Let's add readahead code for reading contiguous compact/normal summary blocks in checkpoint, then we will gain better performance in mount procedure. Changes from v1 o remove inappropriate 'unlikely' in npages_for_summary_flush. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:23 -08:00
Chao Yu	635aee1fef	f2fs: avoid to ra unneeded blocks in recover flow To improve recovery speed, f2fs try to readahead many contiguous blocks in warm node segment, but for most time, abnormal power-off do not occur frequently, so when mount a normal power-off f2fs image, by contrary ra so many blocks and then invalid them will hurt the performance of mount. It's better to just ra the first next-block for normal condition. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-12-08 14:19:09 -08:00
Chao Yu	66b00c1867	f2fs: introduce is_valid_blkaddr to cleanup codes in ra_meta_pages This patch does cleanup work, it introduces is_valid_blkaddr() to include verification code for blkaddr with upper and down boundary value which were in ra_meta_pages previous. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-12-08 14:19:08 -08:00
Chao Yu	13da549460	f2fs: fix to enable readahead for SSA/CP blocks 1.We use zero as upper boundary value for ra SSA/CP blocks, we will skip readahead as verification failure with max number, it causes low performance. 2.Low boundary value is not accurate for SSA/CP/POR region verification, so these values need to be redefined. This patch fixes above issues. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-12-08 14:19:07 -08:00
Jaegeuk Kim	769ec6e5b7	f2fs: call radix_tree_preload before radix_tree_insert This patch tries to fix: BUG: using smp_processor_id() in preemptible [00000000] code: f2fs_gc-254:0/384 (radix_tree_node_alloc+0x14/0x74) from [<c033d8a0>] (radix_tree_insert+0x110/0x200) (radix_tree_insert+0x110/0x200) from [<c02e8264>] (gc_data_segment+0x340/0x52c) (gc_data_segment+0x340/0x52c) from [<c02e8658>] (f2fs_gc+0x208/0x400) (f2fs_gc+0x208/0x400) from [<c02e8a98>] (gc_thread_func+0x248/0x28c) (gc_thread_func+0x248/0x28c) from [<c0139944>] (kthread+0xa0/0xac) (kthread+0xa0/0xac) from [<c0105ef8>] (ret_from_fork+0x14/0x3c) The reason is that f2fs calls radix_tree_insert under enabled preemption. So, before calling it, we need to call radix_tree_preload. Otherwise, we should use _GFP_WAIT for the radix tree, and use mutex or semaphore to cover the radix tree operations. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-12-05 09:51:04 -08:00
Jaegeuk Kim	857dc4e059	f2fs: write SSA pages under memory pressure Under memory pressure, we don't need to skip SSA page writes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-19 22:49:33 -08:00
Chao Yu	67298804f3	f2fs: introduce struct inode_management to wrap inner fields Now in f2fs, we have three inode cache: ORPHAN_INO, APPEND_INO, UPDATE_INO, and we manage fields related to inode cache separately in struct f2fs_sb_info for each inode cache type. This makes codes a bit messy, so that this patch intorduce a new struct inode_management to wrap inner fields as following which make codes more neat. /* for inner inode cache management / struct inode_management { struct radix_tree_root ino_root; / ino entry array / spinlock_t ino_lock; / for ino entry lock / struct list_head ino_list; / inode list head / unsigned long ino_num; / number of entries / }; struct f2fs_sb_info { ... struct inode_management im[MAX_INO_ENTRY]; / manage inode cache */ ... } Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-19 22:49:32 -08:00
Jaegeuk Kim	8c402946f0	f2fs: introduce the number of inode entries This patch adds to monitor the number of ino entries. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-06 15:17:43 -08:00
Jaegeuk Kim	6a8f8ca582	f2fs: avoid race condition in handling wait_io __submit_merged_bio f2fs_write_end_io f2fs_write_end_io wait_io = X wait_io = x complete(X) complete(X) wait_io = NULL wait_for_completion() free(X) spin_lock(X) kernel panic In order to avoid this, this patch removes the wait_io facility. Instead, we can use wait_on_all_pages_writeback(sbi) to wait for end_ios. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-04 17:34:14 -08:00
Jaegeuk Kim	af41d3ee00	f2fs: avoid infinite loop at cp_error This patch avoids an infinite loop in sync_dirty_inode_page when -EIO was detected. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:31 -08:00
Jaegeuk Kim	7cd8558baa	f2fs: check the use of macros on block counts and addresses This patch cleans up the existing and new macros for readability. Rule is like this. ,-----------------------------------------> MAX_BLKADDR -, \| ,------------- TOTAL_BLKS ----------------------------, \| \| \| \| ,- seg0_blkaddr ,----- sit/nat/ssa/main blkaddress \| block \| \| (SEG0_BLKADDR) \| \| \| \| (e.g., MAIN_BLKADDR) \| address 0..x................ a b c d ............................. \| \| global seg# 0...................... m ............................. \| \| \| \| `------- MAIN_SEGS -----------' `-------------- TOTAL_SEGS ---------------------------' \| \| seg# 0..........xx.................. = Note = o GET_SEGNO_FROM_SEG0 : blk address -> global segno o GET_SEGNO : blk address -> segno o START_BLOCK : segno -> starting block address Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-30 15:34:47 -07:00
Jaegeuk Kim	4b2fecc846	f2fs: introduce FITRIM in f2fs_ioctl This patch introduces FITRIM in f2fs_ioctl. In this case, f2fs will issue small discards and prefree discards as many as possible for the given area. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-30 15:06:09 -07:00
Jaegeuk Kim	75ab4cb830	f2fs: introduce cp_control structure This patch add a new data structure to control checkpoint parameters. Currently, it presents the reason of checkpoint such as is_umount and normal sync. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-30 15:01:28 -07:00
Jaegeuk Kim	90a893c749	f2fs: use MAX_BIO_BLOCKS(sbi) This patch cleans up a simple macro. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:18 -07:00
Jaegeuk Kim	4c521f493b	f2fs: use meta_inode cache to improve roll-forward speed Previously, all the dnode pages should be read during the roll-forward recovery. Even worsely, whole the chain was traversed twice. This patch removes that redundant and costly read operations by using page cache of meta_inode and readahead function as well. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:12 -07:00
Huang Ying	7704182387	f2fs: use nm_i->next_scan_nid as default for next_free_nid Now, if there is no free nid in nm_i->free_nid_list, 0 may be saved into next_free_nid of checkpoint, this may cause useless scanning for next mount. nm_i->next_scan_nid should be a better default value than 0. Signed-off-by: Huang, Ying <ying.huang@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-16 04:10:45 -07:00
Jaegeuk Kim	a7ffdbe22c	f2fs: expand counting dirty pages in the inode page cache Previously f2fs only counts dirty dentry pages, but there is no reason not to expand the scope. This patch changes the names on the management of dirty pages and to count dirty pages in each inode info as well. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-16 04:10:39 -07:00
Jaegeuk Kim	9850cf4a89	f2fs: need fsck.f2fs when f2fs_bug_on is triggered If any f2fs_bug_on is triggered, fsck.f2fs is needed. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-09 13:15:02 -07:00
Jaegeuk Kim	2ae4c673e3	f2fs: retain inconsistency information to initiate fsck.f2fs This patch adds sbi->need_fsck to conduct fsck.f2fs later. This flag can only be removed by fsck.f2fs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-09 13:14:25 -07:00
Jaegeuk Kim	4081363fbe	f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB This patch adds three inline functions to clean up dirty casting codes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-03 17:37:13 -07:00
Chao Yu	b5b822050c	f2fs: use macro for code readability This patch introduces DEF_NIDS_PER_INODE/GET_ORPHAN_BLOCKS/F2FS_CP_PACKS macro instead of numbers in code for readability. change log from v1: o fix typo pointed out by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-22 13:56:47 -07:00
Jaegeuk Kim	cf779cab14	f2fs: handle EIO not to break fs consistency There are two rules when EIO is occurred. 1. don't write any checkpoint data to preserve the previous checkpoint 2. don't lose the cached dentry/node/meta pages So, at first, this patch adds set_page_dirty in f2fs_write_end_io's failure. Then, writing checkpoint/dentry/node blocks is not allowed. Note that, for the data pages, we can't just throw away by redirtying them. Otherwise, kworker can fall into infinite loop to flush them. (Ref. xfstests/019) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-21 13:55:05 -07:00
Jaegeuk Kim	8501017e50	f2fs: check s_dirty under cp_mutex It needs to check s_dirty under cp_mutex, since s_dirty is reset under that mutex. And previous condition was not correct, since we can omit doing checkpoint when checkpoint was done followed by all the node pages were written back. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-21 09:21:02 -07:00
Jaegeuk Kim	1e968fdfe6	f2fs: introduce f2fs_cp_error for readability This patch adds f2fs_cp_error for readability. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-21 09:21:00 -07:00
Jaegeuk Kim	6f12ac25f0	f2fs: trigger release_dirty_inode in f2fs_put_super The generic_shutdown_super calls sync_filesystem, evict_inode, and then f2fs_put_super. In f2fs_evict_inode, we remain some dirty inode information so we should release them at f2fs_put_super. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-21 09:20:29 -07:00
arter97	e1c4204520	f2fs: fix typo Fix typo and some grammatical errors. The words "filesystem" and "readahead" are being used without the space treewide. Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-19 10:01:33 -07:00
Chao Yu	b3582c6892	f2fs: reduce competition among node page writes We do not need to block on ->node_write among different node page writers e.g. fsync/flush, unless we have a node page writer from write_checkpoint. So it's better use rw_semaphore instead of mutex type for ->node_write to promote performance. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 23:28:37 -07:00
Jaegeuk Kim	cf2271e781	f2fs: avoid retrying wrong recovery routine when error was occurred This patch eliminates the propagation of recovery errors to the next mount. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:35 -07:00
Jaegeuk Kim	01229f5e1b	f2fs: fix wrong condition for unlikely This patch fixes the wrongly used unlikely condition. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:31 -07:00
Jaegeuk Kim	fff04f90c1	f2fs: add info of appended or updated data writes This patch introduces a inode number list in which represents inodes having appended data writes or updated data writes after last checkpoint. This will be used at fsync to determine whether the recovery information should be written or not. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:46:11 -07:00
Jaegeuk Kim	39efac41fb	f2fs: use radix_tree for ino management For better ino management, this patch replaces the data structure from list to radix tree. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:46:11 -07:00
Jaegeuk Kim	6451e041c8	f2fs: add infra for ino management This patch changes the naming of orphan-related data structures to use as inode numbers managed globally. Later, we can use this facility for managing any inode number lists. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:45:54 -07:00
Jaegeuk Kim	953e6cc6bc	f2fs: punch the core function for inode management This patch punches out the core functions to manage the inode numbers. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:40:25 -07:00
Linus Torvalds	64b2d1fbbf	f2fs updates for v3.16 This patch-set includes the following major enhancement patches. o enhance wait_on_page_writeback o support SEEK_DATA and SEEK_HOLE o enhance readahead flows o enhance IO flushes o support fiemap o add some tracepoints The other bug fixes are as follows. o fix to support a large volume > 2TB correctly o recovery bug fix wrt fallocated space o fix recursive lock on xattr operations o fix some cases on the remount flow And, there are a bunch of cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTleLYAAoJEEAUqH6CSFDSdhEP/iY5hTZ02jH4ZiFPP/Nee4hS l0BHeZvrMjoccaWUDu0ZvIPC8BiJ7kLOgK+/VWS7LAfY1PY11ALEtYQOrW+RM47+ jMfULegTod/F8WS2B6dk31QMhAZldtnsYvA5PS1VV3S0rht+qbOz+PDejZFgSMc3 VVQ7OZzq30gMmtsw7oh3FHfeTu4xe/bxygdMRXgljQQD2MvorJiOb4MA+ovEDd9z CZMMTQvRKjc0d8LPf0gOkZEvG63GR6klCgFMuiappUsua0O8IPIEhCyEGFrE66vS fObVKpqNAsR2ABhS2grgn6Q7UUvn4xrF6jOwDH5tuw2yzmkQiMAWINwBdgnbEy1c D5S89PQ8TkQK9KBSoU0v8WKWC4HzWZF4ZEd6eq9VxVTS8iT0w8DtNHXK518FVC34 82iqrLc0EhrcGNiW/7Nrc6WzHrWqorylCFN7atB3ruhVqeMh3MZsDU4Gq0WgB2oh pF9XVBEpJQpV35DYSAPzLkm+2+mwHVNqwdY3HcvUs7IP90+wZlrWSRG2FEfFt/e8 6nwvbORrHMTI5VfdYlEPgpjuesFmYPZqEvRGdaDa1YrHqhvvgStEPT9qiq2qVn9+ tr0HjpNRDD/etkaE6ujYOYqdxuk3mm6RY68h+KSbNcY1/VTvt2bN2telwWuDsxIq 8yOsxV2x3JB/euDAJsSU =xqsO -----END PGP SIGNATURE----- Merge tag 'for-f2fs-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, there is no special interesting feature, but we've investigated a couple of tuning points with respect to the I/O flow. Several major bug fixes and a bunch of clean-ups also have been made. This patch-set includes the following major enhancement patches: - enhance wait_on_page_writeback - support SEEK_DATA and SEEK_HOLE - enhance readahead flows - enhance IO flushes - support fiemap - add some tracepoints The other bug fixes are as follows: - fix to support a large volume > 2TB correctly - recovery bug fix wrt fallocated space - fix recursive lock on xattr operations - fix some cases on the remount flow And, there are a bunch of cleanups" * tag 'for-f2fs-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (52 commits) f2fs: support f2fs_fiemap f2fs: avoid not to call remove_dirty_inode f2fs: recover fallocated space f2fs: fix to recover data written by dio f2fs: large volume support f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages f2fs: avoid overflow when large directory feathure is enabled f2fs: fix recursive lock by f2fs_setxattr MAINTAINERS: add a co-maintainer from samsung for F2FS MAINTAINERS: change the email address for f2fs f2fs: use inode_init_owner() to simplify codes f2fs: avoid to use slab memory in f2fs_issue_flush for efficiency f2fs: add a tracepoint for f2fs_read_data_page f2fs: add a tracepoint for f2fs_write_{meta,node,data}_pages f2fs: add a tracepoint for f2fs_write_{meta,node,data}_page f2fs: add a tracepoint for f2fs_write_end f2fs: add a tracepoint for f2fs_write_begin f2fs: fix checkpatch warning f2fs: deactivate inode page if the inode is evicted f2fs: decrease the lock granularity during write_begin ...	2014-06-09 19:11:44 -07:00
Mel Gorman	2457aec637	mm: non-atomically mark page accessed during page cache allocation where possible aops->write_begin may allocate a new page and make it visible only to have mark_page_accessed called almost immediately after. Once the page is visible the atomic operations are necessary which is noticable overhead when writing to an in-memory filesystem like tmpfs but should also be noticable with fast storage. The objective of the patch is to initialse the accessed information with non-atomic operations before the page is visible. The bulk of filesystems directly or indirectly use grab_cache_page_write_begin or find_or_create_page for the initial allocation of a page cache page. This patch adds an init_page_accessed() helper which behaves like the first call to mark_page_accessed() but may called before the page is visible and can be done non-atomically. The primary APIs of concern in this care are the following and are used by most filesystems. find_get_page find_lock_page find_or_create_page grab_cache_page_nowait grab_cache_page_write_begin All of them are very similar in detail to the patch creates a core helper pagecache_get_page() which takes a flags parameter that affects its behavior such as whether the page should be marked accessed or not. Then old API is preserved but is basically a thin wrapper around this core function. Each of the filesystems are then updated to avoid calling mark_page_accessed when it is known that the VM interfaces have already done the job. There is a slight snag in that the timing of the mark_page_accessed() has now changed so in rare cases it's possible a page gets to the end of the LRU as PageReferenced where as previously it might have been repromoted. This is expected to be rare but it's worth the filesystem people thinking about it in case they see a problem with the timing change. It is also the case that some filesystems may be marking pages accessed that previously did not but it makes sense that filesystems have consistent behaviour in this regard. The test case used to evaulate this is a simple dd of a large file done multiple times with the file deleted on each iterations. The size of the file is 1/10th physical memory to avoid dirty page balancing. In the async case it will be possible that the workload completes without even hitting the disk and will have variable results but highlight the impact of mark_page_accessed for async IO. The sync results are expected to be more stable. The exception is tmpfs where the normal case is for the "IO" to not hit the disk. The test machine was single socket and UMA to avoid any scheduling or NUMA artifacts. Throughput and wall times are presented for sync IO, only wall times are shown for async as the granularity reported by dd and the variability is unsuitable for comparison. As async results were variable do to writback timings, I'm only reporting the maximum figures. The sync results were stable enough to make the mean and stddev uninteresting. The performance results are reported based on a run with no profiling. Profile data is based on a separate run with oprofile running. async dd 3.15.0-rc3 3.15.0-rc3 vanilla accessed-v2 ext3 Max elapsed 13.9900 ( 0.00%) 11.5900 ( 17.16%) tmpfs Max elapsed 0.5100 ( 0.00%) 0.4900 ( 3.92%) btrfs Max elapsed 12.8100 ( 0.00%) 12.7800 ( 0.23%) ext4 Max elapsed 18.6000 ( 0.00%) 13.3400 ( 28.28%) xfs Max elapsed 12.5600 ( 0.00%) 2.0900 ( 83.36%) The XFS figure is a bit strange as it managed to avoid a worst case by sheer luck but the average figures looked reasonable. samples percentage ext3 86107 0.9783 vmlinux-3.15.0-rc4-vanilla mark_page_accessed ext3 23833 0.2710 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed ext3 5036 0.0573 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed ext4 64566 0.8961 vmlinux-3.15.0-rc4-vanilla mark_page_accessed ext4 5322 0.0713 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed ext4 2869 0.0384 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed xfs 62126 1.7675 vmlinux-3.15.0-rc4-vanilla mark_page_accessed xfs 1904 0.0554 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed xfs 103 0.0030 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed btrfs 10655 0.1338 vmlinux-3.15.0-rc4-vanilla mark_page_accessed btrfs 2020 0.0273 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed btrfs 587 0.0079 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed tmpfs 59562 3.2628 vmlinux-3.15.0-rc4-vanilla mark_page_accessed tmpfs 1210 0.0696 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed tmpfs 94 0.0054 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed [akpm@linux-foundation.org: don't run init_page_accessed() against an uninitialised pointer] Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Tested-by: Prabhakar Lad <prabhakar.csengg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-04 16:54:10 -07:00
Changman Lee	1dbe415216	f2fs: large volume support f2fs's cp has one page which consists of struct f2fs_checkpoint and version bitmap of sit and nat. To support lots of segments, we need more blocks for sit bitmap. So let's arrange sit bitmap as following: +-----------------+------------+ \| f2fs_checkpoint \| sit bitmap \| \| + nat bitmap \| \| +-----------------+------------+ 0 4k N blocks Signed-off-by: Changman Lee <cm224.lee@samsung.com> [Jaegeuk Kim: simple code change for readability] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-06-04 13:34:30 +09:00
Chao Yu	e574843438	f2fs: add a tracepoint for f2fs_write_{meta,node,data}_pages This patch adds a tracepoint for f2fs_write_{meta,node,data}_pages to trace when pages are fsyncing/flushing. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:59 +09:00
Chao Yu	ecda0de343	f2fs: add a tracepoint for f2fs_write_{meta,node,data}_page This patch adds a tracepoint for f2fs_write_{meta,node,data}_page to trace when page is writting out. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:59 +09:00
Jaegeuk Kim	bde446866c	f2fs: no need to wait on page writebck to meta pages This patch removes grab_cache_page_write_begin for meta pages. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:58 +09:00
Fabian Frederick	b49ad51e6d	f2fs: add static to get_max_meta_blks inline get_max_meta_blks is only used in checkpoint.c Use standard static inline format. Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:55 +09:00
Jaegeuk Kim	ed57c27f73	f2fs: remove costly dirty_dir_inode operations This patch removes list opeations in handling dirty dir inodes. Previously, F2FS traverses whole the list of dirty dir inodes to check whether there is an existing inode or not, resulting in heavy CPU overheads. So this patch removes such the traverse operations by adding FI_DIRTY_DIR to indicate the inode lies on the list or not. Through this simple flag, we can remove redundant operations gracefully. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:54 +09:00
Jaegeuk Kim	76f60268e7	f2fs: call redirty_page_for_writepage This patch replace some general codes with redirty_page_for_writepage, which can be enabled after consideration on additional procedure like counting dirty pages appropriately. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:54 +09:00
Jaegeuk Kim	1e87a78d95	f2fs: avoid to conduct roll-forward due to the remained garbage blocks The f2fs always scans the next chain of direct node blocks. But some garbage blocks are able to be remained due to no discard support or SSR triggers. This occasionally wreaks recovering wrong inodes that were used or BUG_ONs due to reallocating node ids as follows. When mount this f2fs image: http://linuxtesting.org/downloads/f2fs_fault_image.zip BUG_ON is triggered in f2fs driver (messages below are generated on kernel 3.13.2; for other kernels output is similar): kernel BUG at fs/f2fs/node.c:215! Call Trace: [<ffffffffa032ebad>] recover_inode_page+0x1fd/0x3e0 [f2fs] [<ffffffff811446e7>] ? __lock_page+0x67/0x70 [<ffffffff81089990>] ? autoremove_wake_function+0x50/0x50 [<ffffffffa0337788>] recover_fsync_data+0x1398/0x15d0 [f2fs] [<ffffffff812b9e5c>] ? selinux_d_instantiate+0x1c/0x20 [<ffffffff811cb20b>] ? d_instantiate+0x5b/0x80 [<ffffffffa0321044>] f2fs_fill_super+0xb04/0xbf0 [f2fs] [<ffffffff811b861e>] ? mount_bdev+0x7e/0x210 [<ffffffff811b8769>] mount_bdev+0x1c9/0x210 [<ffffffffa0320540>] ? validate_superblock+0x210/0x210 [f2fs] [<ffffffffa031cf8d>] f2fs_mount+0x1d/0x30 [f2fs] [<ffffffff811b9497>] mount_fs+0x47/0x1c0 [<ffffffff81166e00>] ? __alloc_percpu+0x10/0x20 [<ffffffff811d4032>] vfs_kern_mount+0x72/0x110 [<ffffffff811d6763>] do_mount+0x493/0x910 [<ffffffff811615cb>] ? strndup_user+0x5b/0x80 [<ffffffff811d6c70>] SyS_mount+0x90/0xe0 [<ffffffff8166f8d9>] system_call_fastpath+0x16/0x1b Found by Linux File System Verification project (linuxtesting.org). Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-05-07 10:21:54 +09:00
Chao Yu	2d7b822ad9	f2fs: use list_for_each_entry{_safe} for simplyfying code This patch use list_for_each_entry{_safe} instead of list_for_each{_safe} for simplfying code. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-04-02 09:56:27 +09:00
Chao Yu	cf0ee0f09b	f2fs: avoid free slab cache under spinlock Move kmem_cache_free out of spinlock protection region for better performance. Change log from v1: o remove spinlock protection for kmem_cache_free in destroy_node_manager suggested by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-04-02 09:56:12 +09:00
Jaegeuk Kim	3cb5ad152b	f2fs: call f2fs_wait_on_page_writeback instead of native function If a page is on writeback, f2fs can face with deadlock due to under writepages. This is caused by merging IOs inside f2fs, so if it comes to detect, let's throw merged IOs, which is implemented by f2fs_wait_on_page_writeback. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-20 22:10:04 +09:00
Jaegeuk Kim	50c8cdb35a	f2fs: introduce nr_pages_to_write for segment alignment This patch introduces nr_pages_to_write to align page writes to the segment or other operational unit size, which can be tuned according to the system environment. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-18 16:37:53 +09:00

1 2 3 4 5

213 Commits