Commit Graph

639 Commits

Author SHA1 Message Date
Huang Ying 7704182387 f2fs: use nm_i->next_scan_nid as default for next_free_nid
Now, if there is no free nid in nm_i->free_nid_list, 0 may be saved
into next_free_nid of checkpoint, this may cause useless scanning for
next mount.  nm_i->next_scan_nid should be a better default value than
0.

Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-16 04:10:45 -07:00
Jaegeuk Kim c1ce1b02bb f2fs: give an option to enable in-place-updates during fsync to users
If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file
only starts to try in-place-updates.
And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it
keeps out-of-order manner. Otherwise, it triggers in-place-updates.

This may be used by storage showing very high random write performance.

For example, it can be used when,

Seq. writes (Data) + wait + Seq. writes (Node)

is pretty much slower than,

Rand. writes (Data)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-16 04:10:44 -07:00
Jaegeuk Kim a7ffdbe22c f2fs: expand counting dirty pages in the inode page cache
Previously f2fs only counts dirty dentry pages, but there is no reason not to
expand the scope.

This patch changes the names on the management of dirty pages and to count
dirty pages in each inode info as well.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-16 04:10:39 -07:00
Jaegeuk Kim 2403c155b8 f2fs: remove lengthy inode->i_ino
This patch is to remove lengthy name by adding a new variable.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-10 17:00:25 -07:00
Jaegeuk Kim 0b4c5afde9 f2fs: fix negative value for lseek offset
If application throws negative value of lseek with SEEK_DATA|SEEK_HOLE,
previous f2fs went into BUG_ON in get_dnode_of_data, which was reported
by Tommi Rantala.

He could make a simple code to detect this having:
	lseek(fd, -17595150933902LL, SEEK_DATA);

This patch should resolve that bug.

Reported-by: Tommi Rentala <tt.rantala@gmail.com>
[Jaegeuk Kim: relocate the condition as suggested by Chao]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 14:46:36 -07:00
Huang Ying 9a01b56b1a f2fs: avoid node page to be written twice in gc_node_segment
In gc_node_segment, if node page gc is run concurrently with node page
writeback, and check_valid_map and get_node_page run after page locked
and before cur_valid_map is updated as below, it is possible for the
page to be written twice unnecessarily.

			sync_node_pages
			  try_lock_page
			  ...
check_valid_map		  f2fs_write_node_page
			    ...
			    write_node_page
			      do_write_page
			        allocate_data_block
				  ...
				  refresh_sit_entry /* update cur_valid_map */
				  ...
			    ...
			    unlock_page
get_node_page
...
set_page_dirty
...
f2fs_put_page
  unlock_page

This can be solved via calling check_valid_map after get_node_page again.

Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:15:07 -07:00
Gu Zheng 721bd4d5c3 f2fs: use lock-less list(llist) to simplify the flush cmd management
We use flush cmd control to collect many flush cmds, and flush them
together. In this case, we use two list to manage the flush cmds
(collect and dispatch), and one spin lock is used to protect this.
In fact, the lock-less list(llist) is very suitable to this case,
and we use simplify this routine.

-
v2:
-use llist_for_each_entry_safe to fix possible use-after-free issue.
-remove the unused field from struct flush_cmd.
Thanks for Yu's suggestion.
-

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:15:06 -07:00
Chao Yu 184a5cd2ce f2fs: refactor flush_sit_entries codes for reducing SIT writes
In commit aec71382c6 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:

"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
   nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
   journal is full, then flush the left dirty entries to disk without merge
   journaled entries, so these journaled entries may be flushed to disk at next
   checkpoint but lost chance to flushed last time."

Actually, we have the same problem in using SIT journal area.

In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.

In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.

In my testing environment, it shows this patch can help to reduce SIT block
update obviously.

virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
		sit page num	cp count	sit pages/cp
based		2006.50		1349.75		1.486
patched		1566.25		1463.25		1.070

Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns)	dirty sit count
36038		2151
49168		2123
37174		2232

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:15:05 -07:00
Chao Yu d3a14afd5e f2fs: remove unneeded sit_i in macro SIT_BLOCK_OFFSET/START_SEGNO
sit_i in macro SIT_BLOCK_OFFSET/START_SEGNO is not used, remove it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:15:05 -07:00
Jaegeuk Kim b0c44f05a2 f2fs: need fsck.f2fs if the recovery was failed
If the roll-forward recovery was failed, we'd better conduct fsck.f2fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:15:04 -07:00
Jaegeuk Kim ec325b5270 f2fs: handle bug cases by letting fsck.f2fs initiate
This patch adds to handle corner buggy cases for fsck.f2fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:15:03 -07:00
Jaegeuk Kim 05796763b8 f2fs: add BUG cases to initiate fsck.f2fs
This patch replaces BUG cases with f2fs_bug_on to remain fsck.f2fs information.
And it implements some void functions to initiate fsck.f2fs too.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:15:03 -07:00
Jaegeuk Kim 9850cf4a89 f2fs: need fsck.f2fs when f2fs_bug_on is triggered
If any f2fs_bug_on is triggered, fsck.f2fs is needed.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:15:02 -07:00
Jaegeuk Kim 2ae4c673e3 f2fs: retain inconsistency information to initiate fsck.f2fs
This patch adds sbi->need_fsck to conduct fsck.f2fs later.
This flag can only be removed by fsck.f2fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-09 13:14:25 -07:00
Jaegeuk Kim 4081363fbe f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB
This patch adds three inline functions to clean up dirty casting codes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-03 17:37:13 -07:00
Chao Yu b73e52824c f2fs: reposition unlock_new_inode to prevent accessing invalid inode
As the race condition on the inode cache, following scenario can appear:
[Thread a]				[Thread b]
					->f2fs_mkdir
					  ->f2fs_add_link
					    ->__f2fs_add_link
					      ->init_inode_metadata failed here
->gc_thread_func
  ->f2fs_gc
    ->do_garbage_collect
      ->gc_data_segment
        ->f2fs_iget
          ->iget_locked
            ->wait_on_inode
					  ->unlock_new_inode
        ->move_data_page
					  ->make_bad_inode
					  ->iput

When we fail in create/symlink/mkdir/mknod/tmpfile, the new allocated inode
should be set as bad to avoid being accessed by other thread. But in above
scenario, it allows f2fs to access the invalid inode before this inode was set
as bad.
This patch fix the potential problem, and this issue was found by code review.

change log from v1:
 o Add condition judgment in gc_data_segment() suggested by Changman Lee.
 o use iget_failed to simplify code.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-02 00:22:24 -07:00
Jaegeuk Kim 3304b56401 f2fs: fix wrong casting for dentry name
The dentry name type is unsigned char *.
If we don't match this type, some character codes can be changed by signed bit.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-29 00:26:50 -07:00
Dan Carpenter 922cedbd00 f2fs: simplify by using a literal
We can make the code a bit simpler because we know that "!retry" is
zero.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-28 09:25:29 -07:00
Jaegeuk Kim c2e69583a4 f2fs: truncate stale block for inline_data
This verifies to truncate any allocated blocks, offset[0], by inline_data.
Not figured out, but for making sure.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-25 14:52:09 -07:00
Chao Yu b5b822050c f2fs: use macro for code readability
This patch introduces DEF_NIDS_PER_INODE/GET_ORPHAN_BLOCKS/F2FS_CP_PACKS macro
instead of numbers in code for readability.

change log from v1:
 o fix typo pointed out by Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-22 13:56:47 -07:00
Chao Yu 9d1589ef2e f2fs: introduce need_do_checkpoint for readability
This patch introduce need_do_checkpoint() to include numerous judgment condition
for readability.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:57:07 -07:00
Chao Yu c200b1aa6c f2fs: fix incorrect calculation with total/free inode num
Theoretically, our total inodes number is the same as total node number, but
there are three node ids are reserved in f2fs, they are 0, 1 (node nid), and 2
(meta nid), and they should never be used by user, so our total/free inode
number calculated in ->statfs is wrong.

This patch indroduces F2FS_RESERVED_NODE_NUM and then fixes this issue by
recalculating total/free inode number with the macro.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:57:06 -07:00
Jaegeuk Kim 04859dba50 f2fs: remove rename and use rename2
Refer the following patch.

commit 7177a9c4b5
Author: Miklos Szeredi <mszeredi@suse.cz>
Date:   Wed Jul 23 15:15:30 2014 +0200

    fs: call rename2 if exists

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:57:04 -07:00
Jaegeuk Kim ec4e7af4ca f2fs: skip if inline_data was converted already
This patch checks inline_data one more time under the inode page lock whether
its inline_data is converted or not.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:57:03 -07:00
Jaegeuk Kim 202095a7a0 f2fs: remove rewrite_node_page
I think we need to let the dirty node pages remain in the page cache instead
of rewriting them in their places.
So, after done with successful recovery, write_checkpoint will flush all of them
through the normal write path.
Through this, we can avoid potential error cases in terms of block allocation.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:57:02 -07:00
Jaegeuk Kim 764aa3e978 f2fs: avoid double lock in truncate_blocks
The init_inode_metadata calls truncate_blocks when error is occurred.
The callers holds f2fs_lock_op, so we should not call it again in
truncate_blocks.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:57:01 -07:00
Jaegeuk Kim 14f4e69085 f2fs: prevent checkpoint during roll-forward
Any checkpoint should not be done during the core roll-forward procedure.
Especially, it includes error cases too.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:57:00 -07:00
Jaegeuk Kim b3fe0a0da2 f2fs: add WARN_ON in f2fs_bug_on
This patch adds WARN_ON when f2fs_bug_on is disable to see kernel messages.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:56:59 -07:00
Jaegeuk Kim cf779cab14 f2fs: handle EIO not to break fs consistency
There are two rules when EIO is occurred.
1. don't write any checkpoint data to preserve the previous checkpoint
2. don't lose the cached dentry/node/meta pages

So, at first, this patch adds set_page_dirty in f2fs_write_end_io's failure.
Then, writing checkpoint/dentry/node blocks is not allowed.

Note that, for the data pages, we can't just throw away by redirtying them.
Otherwise, kworker can fall into infinite loop to flush them.
(Ref. xfstests/019)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 13:55:05 -07:00
Jaegeuk Kim 8501017e50 f2fs: check s_dirty under cp_mutex
It needs to check s_dirty under cp_mutex, since s_dirty is reset under that
mutex.
And previous condition was not correct, since we can omit doing checkpoint
when checkpoint was done followed by all the node pages were written back.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 09:21:02 -07:00
Jaegeuk Kim 5274651927 f2fs: unlock_page when node page is redirtied out
This patch fixes missing unlock_page when a node page is redirtied out.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 09:21:01 -07:00
Jaegeuk Kim 1e968fdfe6 f2fs: introduce f2fs_cp_error for readability
This patch adds f2fs_cp_error for readability.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 09:21:00 -07:00
Jaegeuk Kim ed2e621a95 f2fs: give a chance to mount again when encountering errors
This patch gives another chance to try mount process when we encounter an error.
This makes an effect on the roll-forward recovery failures as well.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 09:21:00 -07:00
Jaegeuk Kim 6f12ac25f0 f2fs: trigger release_dirty_inode in f2fs_put_super
The generic_shutdown_super calls sync_filesystem, evict_inode, and then
f2fs_put_super. In f2fs_evict_inode, we remain some dirty inode information
so we should release them at f2fs_put_super.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-21 09:20:29 -07:00
Jaegeuk Kim 97c3c5cac2 f2fs: don't skip checkpoint if there is no dirty node pages
This is the errorneous scenario.
1. write data
2. do checkpoint
3. produce some dirty node pages by the gc thread
4. write back dirty node pages
5. f2fs_put_super will skip the checkpoint, since dirty count for node pages is
  zero.

This patch removes such the wrong condition check.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:35 -07:00
Jaegeuk Kim b307384e4f f2fs: avoid bug_on when error is occurred
During the recovery, if an error like EIO or ENOMEM, f2fs_bug_on should skip.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:35 -07:00
Jaegeuk Kim 1c35a90e8a f2fs: fix to recover inline_xattr/data and blocks
This patch fixes not to skip xattr recovery and inline xattr/data recovery
order.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:34 -07:00
Jaegeuk Kim e3b4d43f7c f2fs: should clear the inline_xattr flag
During the recovery, we should clear the inline_xattr flag if its xattr node
block is recovered.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:34 -07:00
Jaegeuk Kim 695facc05a f2fs: clear FI_INC_LINK during the recovery
If an inode are fsynced multiple times with fsync & dent marks, this inode will
set FI_INC_LINK at find_fsync_dnodes during the recovery.
But, in recover_inode, recover_dentry doesn't clear that flag when multiple hits
were occurred.

So this patch removes the flag for the further consistency.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:34 -07:00
Jaegeuk Kim 617deb8c05 f2fs: fix the initial inode page for recovery
If a new inode page is needed for recover_dentry, we should assing i_inline
as zero.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:34 -07:00
Jaegeuk Kim 0342fd301a f2fs: make clear on test condition and return types
This patch adds a parentheses to make clear for condition check.
And also it changes the return type for better meanings.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:33 -07:00
Jaegeuk Kim b067ba1f1b f2fs: should convert inline_data during the mkwrite
If mkwrite is called to an inode having inline_data, it can overwrite the data
index space as NEW_ADDR. (e.g., the first 4 bytes are coincidently zero)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:33 -07:00
arter97 e1c4204520 f2fs: fix typo
Fix typo and some grammatical errors.

The words "filesystem" and "readahead" are being used without the space treewide.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-19 10:01:33 -07:00
Chao Yu b65ee14818 f2fs: use for_each_set_bit to simplify the code
This patch uses for_each_set_bit to simplify some codes in f2fs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-04 13:20:53 -07:00
Chao Yu 497a0930bb f2fs: add f2fs_balance_fs for expand_inode_data
This patch adds f2fs_balance_fs in expand_inode_data to avoid allocation failure
with segment.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-04 13:02:03 -07:00
Chao Yu 002a41cabb f2fs: invalidate xattr node page when evict inode
When inode is evicted, all the page cache belong to this inode should be
released including the xattr node page. But previously we didn't do this, this
patch fixed this issue.

v2:
 o reposition invalidate_mapping_pages() to the right place suggested by
Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-04 13:01:22 -07:00
Chao Yu 70cfed88ef f2fs: avoid skipping recover_inline_xattr after recover_inline_data
When we recover data of inode in roll-forward procedure, and the inode has both
inline data and inline xattr. We may skip recovering inline xattr if we recover
inline data form node page first.
This patch will fix the problem that we lost inline xattr data in above
scenario.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-02 07:43:51 -07:00
Chao Yu 70407fad85 f2fs: add tracepoint for f2fs_direct_IO
This patch adds a tracepoint for f2fs_direct_IO.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-02 07:34:46 -07:00
Chao Yu b3582c6892 f2fs: reduce competition among node page writes
We do not need to block on ->node_write among different node page writers e.g.
fsync/flush, unless we have a node page writer from write_checkpoint.
So it's better use rw_semaphore instead of mutex type for ->node_write to
promote performance.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-30 23:28:37 -07:00
Jaegeuk Kim 65b85ccce0 f2fs: fix coding style
This patch fixes wrong coding style.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-30 17:25:54 -07:00