It is useless and confusing and may make people believe they may just
change it, which is not true, because this will also change the on-flash
format.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
This patch changes the 'i_nlink' counter handling in 'ubifs_unlink()',
'ubifs_rmdir()' and 'ubifs_rename()'. In these function 'i_nlink' may become 0,
and if 'ubifs_jnl_update()' failed, we would use 'inc_nlink()' to restore
the previous 'i_nlink' value, which is incorrect from the VFS point of view and
would cause a 'WARN_ON()' (see 'inc_nlink() implementation).
This patches saves the previous 'i_nlink' value in a local variable and uses it
at the error path instead of calling 'inc_nlink()'. We do this only for the
inodes where 'i_nlink' may potentially become zero.
This change has been requested by Al Viro <viro@ZenIV.linux.org.uk>.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Most of the time we use the dumping function to dump something in case
of error. We use 'KERN_DEBUG' printk level, and the drawback is that users
do not see them in the console, while they see the other error messages
in the console. The result is that they send bug reports which does not
contain a lot of useful information. This patch changes the printk level
of the dump functions to 'KERN_ERR' to correct the situation.
I documented it in the MTD web site that people have to send the 'dmesg' output
when submitting bug reposts - it did not help.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Fix a brown paperbag bug introduced by me in the previous commit. I was
in hurry and forgot about the non-debug case completely.
Artem: amend the commit message and tweak the patch to preserve alignment.
This made the patch a bit less readable, though.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
UBIFS: fix key printing
UBIFS: use snprintf instead of sprintf when printing keys
UBIFS: fix debugging messages
UBIFS: make debugging messages light again
UBI: fix debugging messages
UBI: make vid_hdr non-static
Before commit 56e46742e8 we have had locking
around all printing macros and we could use static buffers for creating
key strings and printing them. However, now we do not have that locking and
we cannot use static buffers. This commit removes the old DBGKEY() macros
and introduces few new helper macros for printing debugging messages plus
a key at the end. Thankfully, all the messages are already structures in
a way that the key is printed in the end.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Switch to 'snprintf()' which is more secure and reliable. This is also a
preparation to the subsequent key printing fixes.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Patch 56e46742e8 broke UBIFS debugging messages:
before that commit when UBIFS debugging was enabled, users saw few useful
debugging messages after mount. However, that patch turned 'dbg_msg()' into
'pr_debug()', so to enable the debugging messages users have to enable them
first via /sys/kernel/debug/dynamic_debug/control, which is very impractical.
This commit makes 'dbg_msg()' to use 'printk()' instead of 'pr_debug()', just
as it was before the breakage.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: stable@kernel.org [3.0+]
We switch to dynamic debugging in commit
56e46742e8 but did not take into account that
now we do not control anymore whether a specific message is enabled or not.
So now we lock the "dbg_lock" and release it in every debugging macro, which
make them not so light-weight.
This commit removes the "dbg_lock" protection from the debugging macros to
fix the issue.
The downside is that now our DBGKEY() stuff is broken, but this is not
critical at all and will be fixed later.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: stable@kernel.org [3.0+]
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
UBI: fix use-after-free on error path
UBI: fix missing scrub when there is a bit-flip
UBIFS: Use kmemdup rather than duplicating its implementation
vfs_create() ignores everything outside of 16bit subset of its
mode argument; switching it to umode_t is obviously equivalent
and it's the only caller of the method
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
vfs_mkdir() gets int, but immediately drops everything that might not
fit into umode_t and that's the only caller of ->mkdir()...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
the cost of taking it into inode_init_always() will be negligible for pipes
and sockets and negative for everything else. Not to mention the removal of
boilerplate code from ->destroy_inode() instances...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The semantic patch that makes this change is available
in scripts/coccinelle/api/memdup.cocci.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
* 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: Add a 'reason' to wb_writeback_work
writeback: send work item to queue_io, move_expired_inodes
writeback: trace event balance_dirty_pages
writeback: trace event bdi_dirty_ratelimit
writeback: fix ppc compile warnings on do_div(long long, unsigned long)
writeback: per-bdi background threshold
writeback: dirty position control - bdi reserve area
writeback: control dirty pause time
writeback: limit max dirty pause time
writeback: IO-less balance_dirty_pages()
writeback: per task dirty rate limit
writeback: stabilize bdi->dirty_ratelimit
writeback: dirty rate control
writeback: add bg_threshold parameter to __bdi_update_bandwidth()
writeback: dirty position control
writeback: account per-bdi accumulated dirtied pages
Replace remaining direct i_nlink updates with a new set_nlink()
updater function.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Replace direct i_nlink updates with the respective updater function
(inc_nlink, drop_nlink, clear_nlink, inode_dec_link_count).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
This creates a new 'reason' field in a wb_writeback_work
structure, which unambiguously identifies who initiates
writeback activity. A 'wb_reason' enumeration has been
added to writeback.h, to enumerate the possible reasons.
The 'writeback_work_class' and tracepoint event class and
'writeback_queue_io' tracepoints are updated to include the
symbolic 'reason' in all trace events.
And the 'writeback_inodes_sbXXX' family of routines has had
a wb_stats parameter added to them, so callers can specify
why writeback is being started.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
The dark space calculation should be 64 bit type-casted, when
assigning to tmp64 (similar to how total_free is calculated).
Overflow will occur for very large flashes.
Signed-off-by: srimugunthan <srimugunthan.dhandapani@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
With
$ grep -e UBIFS_FS_DEBUG -e DYNAMIC_DEBUG .config
# CONFIG_UBIFS_FS_DEBUG is not set
CONFIG_DYNAMIC_DEBUG=y
Debug messages are kept in the object files due to the
dynamic_pr_debug() macro, even if they are never going to be printed:
$ make fs/ubifs/super.o
$ strings fs/ubifs/super.o | grep 'compiled on'
compiled on: Aug 11 2011 at 12:21:38
Use plain printk to fix this.
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (107 commits)
vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp
isofs: Remove global fs lock
jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory
fix IN_DELETE_SELF on overwriting rename() on ramfs et.al.
mm/truncate.c: fix build for CONFIG_BLOCK not enabled
fs:update the NOTE of the file_operations structure
Remove dead code in dget_parent()
AFS: Fix silly characters in a comment
switch d_add_ci() to d_splice_alias() in "found negative" case as well
simplify gfs2_lookup()
jfs_lookup(): don't bother with . or ..
get rid of useless dget_parent() in btrfs rename() and link()
get rid of useless dget_parent() in fs/btrfs/ioctl.c
fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers
drivers: fix up various ->llseek() implementations
fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek
Ext4: handle SEEK_HOLE/SEEK_DATA generically
Btrfs: implement our own ->llseek
fs: add SEEK_HOLE and SEEK_DATA flags
reiserfs: make reiserfs default to barrier=flush
...
Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_super.c due to the new
shrinker callout for the inode cache, that clashed with the xfs code to
start the periodic workers later.
Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push down taking i_mutex and
the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
file systems can drop taking the i_mutex altogether it seems, like ext3 and
ocfs2. For correctness sake I just pushed everything down in all cases to make
sure that we keep the current behavior the same for everybody, and then each
individual fs maintainer can make up their mind about what to do from there.
Thanks,
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When the 1st LEB was unmapped and written but 2nd LEB not,
the master node recovery doesn't succeed after power cut.
We see following error when mounting UBIFS partition on NOR
flash:
UBIFS error (pid 1137): ubifs_recover_master_node: failed to recover master node
Correct 2nd master node offset check is needed to fix the
problem. If the 2nd master node is at the end in the 2nd LEB,
first master node is used for recovery. When checking for this
condition we should check whether the master node is exactly at
the end of the LEB (without remaining empty space) or whether
it is followed by an empty space less than the master node size.
Artem: when the error happened, offs2 = 261120, sz = 512, c->leb_size = 262016.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Artem Bityutskiy <dedekind1@gmail.com>
This patch cleans-up and improves the power cut testing:
1. Kill custom 'simple_random()' function and use 'random32()' instead.
2. Make timeout larger
3. When cutting the buffer - fill the end with random data sometimes, not
only with 0xFFs.
4. Some times cut in the middle of the buffer, not always at the end.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Since the recovery testing is effectively about emulating power cuts by UBIFS,
use "power cut" as the base term for all the related variables and name them
correspondingly. This is just a minor clean-up for the sake of readability.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a clean-up of the power-cut emulation code - remove the custom list of
superblocks which we maintained to find the superblock by the UBI volume
descriptor. We do not need that crud any longer, because now we can get the
superblock as a function argument.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Now when we use UBIFS helpers for all the I/O, we can remove the horrible hack
of re-defining UBI I/O functions.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Instead of using 'ubi_read()' function directly, used the 'ubifs_leb_read()'
helper function instead. This allows to get rid of several redundant error
messages and make sure that we always have a stack dump on read errors.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Introduce the following I/O helper functions: 'ubifs_leb_read()',
'ubifs_leb_write()', 'ubifs_leb_change()', 'ubifs_leb_unmap()',
'ubifs_leb_map()', 'ubifs_is_mapped().
The idea is to wrap all UBI I/O functions in order to encapsulate various
assertions and error path handling (error message, stack dump, switching to R/O
mode). And there are some other benefits of this which will be used in the
following patches.
This patch does not switch whole UBIFS to use these functions yet.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When switching to R/O mode due to an I/O error, always dump the stack, not only
when debugging is enabled.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch contains several minor clean-up and preparational cahnges.
1. Remove 'dbg_read()', 'dbg_write()', 'dbg_change()', and 'dbg_leb_erase()'
functions as they are not used.
2. Remove 'dbg_leb_read()' and 'dbg_is_mapped()' as they are not really needed,
it is fine to let reads go through in failure mode.
3. Rename 'offset' argument to 'offs' to be consistent with the rest of UBIFS
code.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Now we have per-FS (superblock) debugfs knobs, but they have one drawback - you
have to first mount the FS and only after this you can switch self-checks
on/off. But often we want to have the checks enabled during the mount.
Introduce global debugging knobs for this purpose.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Separate out pieces of code from the debugfs file read/write functions and
create separate 'interpret_user_input()'/'provide_user_output()' helpers. These
helpers will be needed in one of the following patches, so this is just a
preparational change.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Move 'dbg_debugfs_init()' and 'dbg_debugfs_exit()' functions which initialize
debugfs for whole UBIFS subsystem below the code which initializes debugfs for
a particular UBIFS instance. And do the same for 'ubifs_debugging_init()' and
'ubifs_debugging_exit()' functions. This layout is a bit better for the next
patches, so this is just a preparation.
Also, rename 'open_debugfs_file()' into 'dfs_file_open()' for consistency.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When we are testing UBIFS recovery, it is better to print in which eraseblock
we are going to fail. Currently UBIFS prints it only if recovery debugging
messages are enabled, but this is not very practical. So change 'dbg_rcvry()'
messages to 'ubifs_warn()' messages.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
UBIFS has many built-in self-check functions which can be enabled using the
debug_chks module parameter or the corresponding sysfs file
(/sys/module/ubifs/parameters/debug_chks). However, this is not flexible enough
because it is not per-filesystem. This patch moves this to debugfs interfaces.
We already have debugfs support, so this patch just adds more debugfs files.
While looking at debugfs support I've noticed that it is racy WRT file-system
unmount, and added a TODO entry for that. This problem has been there for long
time and it is quite standard debugfs PITA. The plan is to fix this later.
This patch is simple, but it is large because it changes many places where we
check if a particular type of checks is enabled or disabled.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
We have too many different debugging checks - lessen the amount by merging all
index-related checks into one. At the same time, move the "force in-the-gap"
test to the "index checks" class, because it is too heavy for the "general"
class.
This patch merges TNC, Old index, and Index size check and calles this just
"index checks".
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch introduces helper functions for all debugging checks, so instead of
doing
if (!(ubifs_chk_flags & UBIFS_CHK_GEN))
we now do
if (!dbg_is_chk_gen(c))
This is a preparation to further changes where the flags will go away, and
we'll need to only change the helper functions, but the code which utilizes
them won't be touched.
At the same time this patch removes 'dbg_force_in_the_gaps()',
'dbg_force_in_the_gaps_enabled()', and dbg_failure_mode helpers for
consistency.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Add 'const struct ubifs_info *c' parameter to 'dbg_check_synced_i_size()'
function because we'll need it in the next patch when we switch to debugfs.
So this patch is just a preparation.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Add 'struct ubifs_info *c' parameter to the 'dbg_check_name()' debugging
function - it will be needed in one of the following commits where we switch to
debugfs. So this is just a preparation.
Mark parameters as 'const' while on it.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Add a couple of comments - while looking into TNC I could not easily figure out
few facts, so it is a good idea to document them in the code.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The UBIFS lpt tree is in many aspects similar to the TNC tree, and we have
similar flags for these trees. And by mistake we use the COW_ZNODE flag for
LPT in some places, instead of the right flag COW_CNODE. And this works
only because these two constants have the same value.
This patch makes all the LPT code to use COW_CNODE and also changes COW_CNODE
constant value to make sure we do not misuse the flags any more.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
We have 3 znode flags: cow, obsolete, dirty. For the last flag we have a
'ubifs_zn_dirty()' helper function, but for the other 2 flags we use
'test_bit()' directly.
This patch makes the situation more consistent and introduces helpers for the
other 2 flags: 'ubifs_zn_cow()' and 'ubifs_zn_obsolete()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Remove dead pieces of code under "if (c->min_io_size == 1)" statement -
we never execute it because in UBIFS 'c->min_io_size' is always at least 8.
This are leftovers from old pre-mainline prototype.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Remove unnecessary brackets in "inode->i_flags |= (S_NOCMTIME)" statement to
make the code not look silly.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Instead of using long "(inode->i_mode & S_IFMT) != S_IFREG" expression, use
shorted "!S_ISREG(inode->i_mode)".
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Teach 'dbg_dump_inode()' dump directory entries for directory inodes.
This requires few additional changes:
1. The 'c' argument of 'dbg_dump_inode()' cannot be const any more.
2. Users of 'dbg_dump_inode()' should not have 'tnc_mutex' locked.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch lessens the 'struct ubifs_debug_info' size by 90 bytes by
allocating less bytes for the debugfs root directory name. It introduces macros
for the name patter an length instead of hard-coding 100 bytes. It also makes
UBIFS use 'snprintf()' and teaches it to gracefully catch situations when the
name array is too short.
Additionally, this patch makes 2 unrelated changes - I just thought they do not
deserve separate commits: simplifies 'ubifs_assert()' for non-debugging case
and makes 'dbg_debugfs_init()' properly verify debugfs return code which may be
an error code or NULL, so we should you 'IS_ERR_OR_NULL()' instead of
'IS_ERR()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
If commit failed and it is in broken state, UBIFS switches to R/O mode. Most
operations return -EROFS in this case, except of commit which returns -EINVAL.
Make it return -EROFS too for consistency. This is also important for our power
cut emulation testing.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
d251ed271d "ubifs: fix sget races" left out the goto from this
error path so the static checkers complain that we're dereferencing
"sb" when it's an ERR_PTR.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* allocate ubifs_info in ->mount(), fill it enough for sb_test() and
set ->s_fs_info to it in set() callback passed to sget().
* do *not* free it in ->put_super(); do that in ->kill_sb() after we'd
done kill_anon_super().
* don't free it in ubifs_fill_super() either - deactivate_locked_super()
done by caller when ubifs_fill_super() returns an error will take care
of that sucker.
* get rid of kludge with passing ubi to ubifs_fill_super() in ->s_fs_info;
we only need it in alloc_ubifs_info(), so ubifs_fill_super() will need
only ubifs_info. Which it will find in ->s_fs_info just fine, no need to
reassign anything...
As the result, sb_test() becomes safe to apply to all superblocks that
can be found by sget() (and a kludge with temporary use of ->s_fs_info
to store a pointer to very different structure goes away).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The free space fixup is currently initiated during mount after the call to
ubifs_write_master() which results in a write to PEBs; this has been observed
with the patch 'assert no fixup when writing a node' applied:
Move the free space fixup on mount to before the calls to
ubifs_recover_inl_heads() and ubifs_write_master(). This results in no
assertions with the previously mentioned patch applied.
Artem: tweaked the patch a bit
Signed-off-by: Ben Gardiner <bengardiner@nanometrics>
Reviewed-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The current 'mount_ubifs()' implementation does not initialize the LPT until the
the master node is marked dirty. Move the LPT initialization to before marking
the master node dirty. This is a preparation for the next patch which will move
the free-space-fixup check to before marking the master node dirty, because we
have to fix-up the free space before doing any writes.
Artem: massaged the patch and commit message.
Signed-off-by: Ben Gardiner <bengardiner@nanometrics.ca>
Reviewed-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The current free space fixup can result in some writing to the UBI volume
when the space_fixup flag is set.
To catch instances where UBIFS is writing to the NAND while the space_fixup
flag is set, add an assert to ubifs_write_node().
Artem: tweaked the patch, added similar assertion to the write buffer
write path.
Signed-off-by: Ben Gardiner <bengardiner@nanometrics.ca>
Reviewed-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
UBIFS maintains per-filesystem and global clean znode counters
('c->clean_zn_cnt' and 'ubifs_clean_zn_cnt'). It is important to maintain
correct values there since the shrinker relies on 'ubifs_clean_zn_cnt'.
However, in case of failures during commit the counters were corrupted. E.g.,
if a failure happens in the middle of 'write_index()', then some nodes in the
commit list ('c->cnext') are marked as clean, and some are marked as dirty. And
the 'ubifs_destroy_tnc_subtree()' frees does not retrun correct count, and we
end up with non-zero 'c->clean_zn_cnt' when unmounting. This means that if we
have 2 file-sytem and one of them fails, and we unmount it,
'ubifs_clean_zn_cnt' stays incorrect and confuses the shrinker.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
UBIFS leaks memory on error path in 'ubifs_jnl_update()' in case of write
failure because it forgets to free the 'struct ubifs_dent_node *dent' object.
Although the object is small, the alignment can make it large - e.g., 2KiB
if the min. I/O unit is 2KiB.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
Sometimes VM asks the shrinker to return amount of objects it can shrink,
and we return the ubifs_clean_zn_cnt in that case. However, it is possible
that this counter is negative for a short period of time, due to the way
UBIFS TNC code updates it. And I can observe the following warnings sometimes:
shrink_slab: ubifs_shrinker+0x0/0x2b7 [ubifs] negative objects to delete nr=-8541616642706119788
This patch makes sure UBIFS never returns negative count of objects.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
Unfortunately, the recovery fix d1606a59b6be4ea392eabd40d1250aa1eeb19efb
(UBIFS: fix extremely rare mount failure) broke recovery. This commit make
UBIFS drop the last min. I/O unit in all journal heads, but this is needed only
for the GC head. And this does not work for non-GC heads. For example, if
suppose we have min. I/O units A and B, and A contains a valid node X, which
was fsynced, and then a group of nodes Y which spans the rest of A and B. In
this case we'll drop not only Y, but also X, which is obviously incorrect.
This patch fixes the issue and additionally makes recovery to drop last min.
I/O unit only for the GC head, and leave things as they have been for ages for
the other heads - this is safer.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Instead of passing "grouped" parameter to 'ubifs_recover_leb()' which tells
whether the nodes are grouped in the LEB to recover, pass the journal head
number and let 'ubifs_recover_leb()' look at the journal head's 'grouped' flag.
This patch is a preparation to a further fix where we'll need to know the
journal head number for other purposes.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Journal heads are different in a way how UBIFS writes nodes there. All normal
journal heads receive grouped nodes, while the GC journal heads receives
ungrouped nodes. This patch adds a 'grouped' flag to 'struct ubifs_jhead' which
describes this property.
This patch is a preparation to a further recovery fix.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Commit ab51afe05273741f72383529ef488aa1ea598ec6 was a good clean-up, but
it introduced a regression - now UBIFS prints scary error messages during
recovery on all corrupted nodes, even though the corruptions are expected
(due to a power cut). This patch fixes the issue.
Additionally fix a typo in a commentary introduced by the same commit.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Commit 1495f230fa ("vmscan: change shrinker API by passing
shrink_control struct") changed the API of ->shrink(), but missed ubifs
and cifs instances.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ubifs does not have problems with references to unlinked directories.
CC: Artem Bityutskiy <dedekind1@gmail.com>
CC: Adrian Hunter <adrian.hunter@nokia.com>
CC: linux-mtd@lists.infradead.org
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
anything else, so that the filesystem can track internally if it
needs to push out a transaction for fdatasync or not.
This is just the prototype change with no user for it yet. I plan
to push large XFS changes for the next merge window, and getting
this trivial infrastructure in this window would help a lot to avoid
tree interdependencies.
Also remove incorrect comments that ->dirty_inode can't block. That
has been changed a long time ago, and many implementations rely on it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (25 commits)
cifs: remove unnecessary dentry_unhash on rmdir/rename_dir
ocfs2: remove unnecessary dentry_unhash on rmdir/rename_dir
exofs: remove unnecessary dentry_unhash on rmdir/rename_dir
nfs: remove unnecessary dentry_unhash on rmdir/rename_dir
ext2: remove unnecessary dentry_unhash on rmdir/rename_dir
ext3: remove unnecessary dentry_unhash on rmdir/rename_dir
ext4: remove unnecessary dentry_unhash on rmdir/rename_dir
btrfs: remove unnecessary dentry_unhash in rmdir/rename_dir
ceph: remove unnecessary dentry_unhash calls
vfs: clean up vfs_rename_other
vfs: clean up vfs_rename_dir
vfs: clean up vfs_rmdir
vfs: fix vfs_rename_dir for FS_RENAME_DOES_D_MOVE filesystems
libfs: drop unneeded dentry_unhash
vfs: update dentry_unhash() comment
vfs: push dentry_unhash on rename_dir into file systems
vfs: push dentry_unhash on rmdir into file systems
vfs: remove dget() from dentry_unhash()
vfs: dentry_unhash immediately prior to rmdir
vfs: Block mmapped writes while the fs is frozen
...
Only a few file systems need this. Start by pushing it down into each
rename method (except gfs2 and xfs) so that it can be dealt with on a
per-fs basis.
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Only a few file systems need this. Start by pushing it down into each
fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs
basis.
This does not change behavior for any in-tree file systems.
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Switch to debugging using dynamic printk (pr_debug()). There is no good reason
to carry custom debugging prints if there is so cool and powerful generic
dynamic printk infrastructure, see Documentation/dynamic-debug-howto.txt. With
dynamic printks we can switch on/of individual prints, per-file, per-function
and per format messages. This means that instead of doing old-fashioned
echo 1 > /sys/module/ubifs/parameters/debug_msgs
to enable general messages, we can do:
echo 'format "UBIFS DBG gen" +ptlf' > control
to enable general messages and additionally ask the dynamic printk
infrastructure to print process ID, line number and function name. So there is
no reason to keep UBIFS-specific crud if there is more powerful generic thing.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a minor fix for UBIFS kernel-doc comments - we forgot the "@" symbol
for several 'struct ubifs_debug_info'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch fixes an extremely rare mount failure after a power cut, when mount
fails with ENOSPC error because UBIFS could not find the GC LEB.
In short, the reason for this failure is that after recovery the GC head LEB
contains less free space than it had contained just before the power cut
happened. As a result, if the FS is full, 'ubifs_rcvry_gc_commit()' is unable
to find a dirty LEB to GC and a free LEB, so mount fails.
This patch contains a huge comment with more detailed explanation, please refer
that comment.
Since this is really really rare and unlikely situation, I do not send this
patch to the stable tree, also because it requires a lot of preparation
patches which I did before. So sending this to -stable would be too risky.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Further simplify 'ubifs_recover_leb()' by noticing that we have to call
'clean_buf()' in any case, and it is fine to call it if the offset is
aligned to 'c->min_io_size'. Thus, we do not have to call it separately
from every "if" - just call it once at the end.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Now when we call 'ubifs_recover_leb()' only for LEBs which are potentially
corrupted (i.e., only for last buds, not for all of them), we can cleanup every
LEB, not only those where we find corruption. The reason - unstable bits. Even
though the LEB may look good now, it might contain unstable bits which may hit
us a bit later.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch cleans up 'ubifs_recover_leb()' function and makes it more readable.
Move things which are done only once out of the loop and kill unneeded 'switch'
statement.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
If a UBIFS filesystem is being mounted read-write, or is being remounted
from read-only to read-write, check for the "space_fixup" flag and fix
all LEBs containing empty space if necessary.
Artem: tweaked the patch a bit
Signed-off-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch adds the 'ubifs_fixup_free_space()' function which scans all
LEBs in the filesystem for those that are in-use but have one or more
empty pages, then re-maps the LEBs in order to erase the empty portions.
Afterward it removes the "space_fixup" flag from the UBIFS superblock.
Artem: massaged the patch
Signed-off-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The 'space_fixup' flag can be set in the superblock of a new filesystem by
mkfs.ubifs to indicate that any eraseblocks with free space remaining should be
fixed-up the first time it's mounted (after which the flag is un-set). This
means that the UBIFS image has been flashed by a "dumb" flasher and the free
space has been actually programmed (writing all 0xFFs), so this free space
cannot be used. UBIFS fixes the free space up by re-writing the contents of all
LEBs with free space using the atomic LEB change UBI operation.
Artem: improved commit message, add some more commentaries to the code.
Signed-off-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
We'll need to use the 'next_log_lnum()' helper function from log.c in the fixup
code, so let's move it to misc.h. IOW, this is a preparation to the following
free space fixup changes.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch improves UBIFS recovery and teaches it to expect corruption only
in the last buds. Indeed, currently we just recover all buds, which is
incorrect because only the last buds can have corruptions in case of a power
cut. So it is inconsistent with the rest of the recovery strategy which tries
hard to distinguish between corruptions cause by power cuts and other types of
corruptions.
This patch also adds one quirk - a bit older UBIFS was could have corruption in
the next to last bud because of the way it switched buds: when bud A is full,
it first searched for the next bud B, the wrote a reference node to the log
about B, and then synchronized the write-buffer of A. So we could end up with
buds A and B, where B is the last, but A had corruption. The UBIFS behavior
was fixed, though, so currently it always first synchronizes A's write-buffer
and only after this adds B to the log. However, to be make sure that we handle
unclean (after a power cut) UBIFS images belonging to older UBIFS - we need to
add a quirk and keep it for some time: we need to check for the situation
described above.
Thankfully, it is easy to check for that situation. When UBIFS adds B to the
log, it always first unmaps B, then maps it, and then syncs A's write-buffer.
Thus, in that situation we can check that B is empty, in which case it is OK to
have corruption in A. To check that B is empty it is enough to just read the
first few bytes of the bud and compare them with 0xFFs. This quirk may be
removed in a couple of years.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Currently when UBIFS fills up the current bud (which is the last in the journal
head) and switches to the next bud, it first writes the log reference node for
the next bud and only after this synchronizes the write-buffer of the previous
bud. This is not a big deal, but an unclean power cut may lead to a situation
when we have corruption in a next-to-last bud, although it is much more logical
that we have to have corruption only in the last bud.
This patch also removes write-buffer synchronization from
'ubifs_wbuf_seek_nolock()' because this is not needed anymore (we synchronize
the write-buffer explicitly everywhere now) and also because this is just
prone to various errors.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Remove a 'BUG()' statement when we are unable to find a bud and add a
similar 'ubifs_assert()' statement instead.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a minor preparation patch which changes 'replay_bud()' interface -
instead of passing bud lnum, offs, jhead, etc directly, pass a pointer to the
bud entry which contains all the information. The bud entry will be also needed
in one of the following patches.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch simplifies replay even further - it removes the replay tree and
adds the replay list instead. Indeed, we just do not need to use a tree here -
all we need to do is to add all nodes to the list and then sort it. Using
RB-tree is an overkill - more code and slower. And since we replay buds in
order, we expect the nodes to follow in _mostly_ sorted order, so the merge
sort becomes much cheaper in average than an RB-tree.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch simplifies the replay code and makes it smaller. First of all, we
can notice that we do not really need to create bud replay entries and insert
them to the replay tree, because the only reason we do this is to set buds
lprops correctly at the end. Instead, we can just walk the list of buds at the
very end and set lprops for each bud. This allows us to get rid of whole
'insert_ref_node()' function, the 'REPLAY_REF' flag, and several fields in
'struct replay_entry'. Then we can also notice that we do not need the 'flags'
'struct replay_entry' field, because there is only one flag -
'REPLAY_DELETION'. Instead, we can just add a 'deletion' bit fields. As a
result, this patch deletes much more lines that in adds.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is just a small preparation patch which adds 'free' and 'drity' fields to
'struct bud_entry'. They will be used to set bud lprops.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is patch removes an unnecessary 'offs' variable from 'ubifs_wbuf_write_nolock()'
- we can just keep 'wbuf->offs' up-to-date instead. This patch is very minor
the only motivation for it was that it is cleaner to keep wbuf->offs up-to-date
by the time we call 'ubifs_leb_write()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Commit 52c6e6f990 provides misleading infomation
in the commit messages - buds are replied in order. And the real reason why
that fix helped is probably because it made sure we seek head even in read-only
mode (so deferred recovery will have seeked heads).
This patch adds an assertion which will fire if we reply buds out of order.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a minor change which makes 2 functions static because they
are not used outside the gc.c file: 'data_nodes_cmp()' and
'nondata_nodes_cmp()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Now we return all errors from 'scan_check_cb()' directly, so we do not need
'struct scan_check_data' any more, and this patch removes it.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Simplify error path in 'scan_check_cb()' and stop using the special 'data->err'
field, but instead return the error code directly.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When doing the lprops extra check ('dbg_check_lprops()') we scan whole media.
We even scan empty and freeable LEBs which may contain garbage, which we handle
after scanning. This patch teach the lprops checking function
('scan_check_cb()') to avoid scanning for free and freeable LEBs and save time.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When re-mounting from R/O mode to R/W mode and the LEB count in the superblock
is not up-to date, because for the underlying UBI volume became larger, we
re-write the superblock. We allocate RAM for these purposes, but never free it.
So this is a memory leak, although very rare one.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
This patch fixes a problem with the following symptoms:
UBIFS: deferred recovery completed
UBIFS error (pid 15676): dbg_check_synced_i_size: ui_size is 11481088, synced_i_size is 11459081, but inode is clean
UBIFS error (pid 15676): dbg_check_synced_i_size: i_ino 128, i_mode 0x81a4, i_size 11481088
It happens when additional debugging checks are enabled and we are recovering
from a power cut. When we fixup corrupted inode size during recovery, we change
them in-place and we change ui_size as well, but not synced_i_size, which
causes this failure. This patch makes sure we change both fields and fixes the
issue.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When the debugging self-checks are enabled, we go trough whole file-system
after mount and check/validate every single node referred to by the index.
This is implemented by the 'dbg_check_filesystem()' function. However, this
function fails if we mount "unclean" file-system, i.e., if we mount the
file-system after a power cut. It fails with the following symptoms:
UBIFS DBG (pid 8171): ubifs_recover_size: ino 937 size 3309925 -> 3317760
UBIFS: recovery deferred
UBIFS error (pid 8171): check_leaf: data node at LEB 1000:0 is not within inode size 3309925
The reason of failure is that recovery fixed up the inode size in memory, but
not on the flash so far. So the value on the flash is incorrect so far,
and would be corrected when we re-mount R/W. But 'check_leaf()' ignores
this fact and tries to validate the size of the on-flash inode, which is
incorrect, so it fails.
This patch teaches the checking code to look at the VFS inode cache first,
and if there is the inode in question, use that inode instead of the inode
on the flash media. This fixes the issue.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
In 'ubifs_recover_size()' we have an "if (!e->inode && c->ro_mount)" statement.
But if 'c->ro_mount' is true, then '!e->inode' must always be true as well. So
we can remove the unnecessary '!e->inode' test and put an
'ubifs_assert(!e->inode)' instead.
This patch also removes an extra trailing white-space in a debugging print,
as well as adds few empty lines to 'ubifs_recover_size()' to make it a bit more
readable.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When recovering the inode size, one of the debugging messages was printed
incorrecly, this patches fixes it.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This commits refactors and cleans up 'ubifs_rcvry_gc_commit()' which was quite
untidy, also removes the commentary which was not 100% correct.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Split the 'ubifs_rcvry_gc_commit()' function and introduce a 'grab_empty_leb()'
heler. This cleans 'ubifs_rcvry_gc_commit()' a little and makes it a bit less
of spagetti.
Also, add a commentary which explains why it is crucial to first search for an
empty LEB and then run commit.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When UBIFS is in the failure mode (used for power cut emulation testing) we for
some reasons do not dump the stack in many places, e.g., in assertions.
Probably at early days we had too many of them and disabled this to make the
development easier, but then never enabled. Nowadays I sometimes observe
assertion failures during power cut testing, but the useful stackdump is not
printed, which is bad. This patch makes UBIFS always print the stackdump when
debugging is enabled.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
If we fail to recover the gc_lnum we just return an error and it then
it is difficult to figure out why this happened. This patch adds useful
debugging information which should make it easier to debug the failure.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch removes a piece of code in 'ubifs_rcvry_gc_commit()' which is never
executed. We call 'ubifs_find_dirty_leb()' function with min_space =
wbuf->offs, so if it returns us an LEB, it is guaranteed to have at lease
'wbuf->offs' bytes of free+dirty space. So we can remove the subsequent code
which deals with "returned LEB has less than 'wbuf->offs' bytes of free+dirty
space". This simplifies 'ubifs_rcvry_gc_commit()' a little.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
We have duplicated code in 'ubifs_garbage_collect()' and
'ubifs_rcvry_gc_commit()', which is about handling the special case of free
LEB. In both cases we just want to garbage-collect the LEB using
'ubifs_garbage_collect_leb()'.
This patch teaches 'ubifs_garbage_collect_leb()' to handle free LEB's so that
the caller does not have to do this and the duplicated code is removed.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Remove the following commentary from 'ubifs_file_mmap()':
/* 'generic_file_mmap()' takes care of NOMMU case */
I do not understand what it means, and I could not find anything relater to
NOMMU in 'generic_file_mmap()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch is a tiny improvement which removes few bytes of code.
UBIFS debugfs files are non-seekable and the file position is ignored,
so do not increase it in the write handler.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The 'dbg_dump_lprop()' is trying to detect journal head LEBs when printing,
so it looks at the write-buffers. However, if we are in R/O mode, we
de-allocate the write-buffers, so 'dbg_dump_lprop()' oopses. This patch fixes
the issue.
Note, this patch is not critical, it is only about the debugging code path, and
it is unlikely that anyone but UBIFS developers would ever hit this issue.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
We have our own flags indicating R/O mode, and c->ro_mode is equivalent
to MS_RDONLY. Let's be consistent and use UBIFS flags everywhere.
This patch is just a minor cleanup.
Additionally, add a comment that we are surprised with VFS behavior -
as a reminder to look at this some day.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When the debugging failure emulation is enabled and UBIFS decides to
emulate an I/O error, it uses EIO error code. In which case UBIFS
switches into R/O mode later on. The for the user-space is that when
a failure is emulated, the file-system sometimes returns EIO and
sometimes EROFS. This makes it more difficult to implement user-space
tests for the failure mode. Let's be consistent and return EROFS in
all the cases.
This patch is an improvement for the debugging code and does not affect
the functionality at all.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is just a tiny clean-up patch. The variable name for empty address
space operations is "empty_aops". Let's use consistent names for empty
inode and file operations: "empty_iops" and "empty_fops", instead of
inconsistent "none_inode_operations" and "none_file_operations".
Artem: re-write the commit message.
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Try to improve UBIFS testing coverage by randomly picking LEBs to
store in lsave, rather than picking them optimally. Create a debugging
version of 'populate_lsave()' for these purposes and enable it when
general debugging self-checks are enabled.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
UBIFS can force itself to use the 'in-the-gaps' commit method - the last resort
method which is normally invoced very very rarely. Currently this "force
int-the-gaps" debugging feature is a separate test mode. But it is a bit saner
to make it to be the "general" self-test check instead.
This patch is just a clean-up which should make the debugging code look a bit
nicer and easier to use - we have way too many debugging options.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch improves the 'dbg_check_space_info()' function which checks
whether the amount of space before re-mounting and after re-mounting
is the same (remounting from R/O to R/W modes and vice-versa).
The problem is that 'dbg_check_space_info()' does not save the budgeting
information before re-mounting, so when an error is reported, we do not
know why the amount of free space changed.
This patches makes the following changes:
1. Teaches 'dbg_dump_budg()' function to accept a 'struct ubifs_budg_info'
argument and print out the this argument. This way we may ask it to
print any saved budgeting info, no only the current one.
2. Accordingly changes all the callers of 'dbg_dump_budg()' to comply with
the changed interface.
3. Introduce a 'saved_bi' (saved budgeting info) field to
'struct ubifs_debug_info' and save the budgeting info before re-mounting
there.
4. Change 'dbg_check_space_info()' and make it print both old and new
budgeting information.
5. Additionally, save 'c->igx_gc_cnt' and print it if and error happens. This
value contributes to the amount of free space, so we have to print it.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Re-arrange the budget dump and make sure we first dump all
the 'struct ubifs_budg_info' fields, and then the other information.
Additionally, print the 'uncommitted_idx' variable.
This change is required for to the following dumping function
enhancement where it will be possible to dump saved
'struct ubifs_budg_info' objects, not only the current one.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The current 'dbg_dump_budg()' calling convention is that the
'c->space_lock' spinlock is held. However, none of the callers
actually use it from contects which have 'c->space_lock' locked,
so all callers have to explicitely lock and unlock the spinlock.
This is not very sensible convention. This patch changes it and
makes 'dbg_dump_budg()' lock the spinlock instead of imposing this
to the callers. This simplifies the code a little.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch separates out all the budgeting-related information
from 'struct ubifs_info' to 'struct ubifs_budg_info'. This way the
code looks a bit cleaner. However, the main driver for this is
that we want to save budgeting information and print it later,
so a separate data structure for this is helpful.
This patch is a preparation for the further debugging output
improvements.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
There was an attempt to standartize various "__attribute__" and
other macros in order to have potentially portable and more
consistent code, see commit 82ddcb0405.
Note, that commit refers Rober Love's blog post, but the URL
is broken, the valid URL is:
http://blog.rlove.org/2005/10/with-little-help-from-your-compiler.html
Moreover, nowadays checkpatch.pl warns about using
__attribute__((packed)):
"WARNING: __packed is preferred over __attribute__((packed))"
It is not a big deal for UBIFS to use __packed, so let's do it.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Fix several minor stylistic issues:
* lines longer than 80 characters
* space before closing parenthesis ')'
* spaces in the indentations
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Turn the debufs files UBIFS maintains into non-seekable. Indeed, none
of them is supposed to be seek'ed.
Do this by making the '.lseek()' handler to be 'no_llseek()' and by
using 'nonseekable_open()' in the '.open()' operation.
This does mean an API break but this debugging API is only used by a couple
of test scripts which do not rely in the 'llseek()' operation.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is the second fix of the following symptom:
UBIFS error (pid 34456): could not find an empty LEB
which sometimes happens after power cuts when we mount the file-system - UBIFS
refuses it with the above error message which comes from the
'ubifs_rcvry_gc_commit()' function. I can reproduce this using the integck test
with the UBIFS power cut emulation enabled.
Analysis of the problem.
Currently UBIFS replay seeks the journal heads to the last _replayed_ bud.
But the buds are replayed out-of-order, so the replay basically seeks journal
heads to the "random" bud belonging to this head, and not to the _last_ one.
The result of this is that the GC head may be seeked to a full LEB with no free
space, or very little free space. And 'ubifs_rcvry_gc_commit()' tries to find a
fully or mostly dirty LEB to match the current GC head (because we need to
garbage-collect that dirty LEB at one go, because we do not have @c->gc_lnum).
So 'ubifs_find_dirty_leb()' fails and we fall back to finding an empty LEB and
also fail. As a result - recovery fails and mounting fails.
This patch teaches the replay to initialize the GC heads exactly to the latest
buds, i.e. the buds which have the largest sequence number in corresponding
log reference nodes.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
Currently UBIFS has a small optimization - it frees write-buffers when it is
re-mounted from R/W mode to R/O mode. Of course, when it is mounted R/O, it
does not allocate write-buffers as well.
This optimization is nice but it leads to subtle problems and complications
in recovery, which I can reproduce using the integck test. The symptoms are
that after a power cut the file-system cannot be mounted if we first mount
it R/O, and then re-mount R/W - 'ubifs_rcvry_gc_commit()' prints:
UBIFS error (pid 34456): could not find an empty LEB
Analysis of the problem.
When mounting R/W, the reply process sets journal heads to buds [1], but
when mounting R/O - it does not do this, because the write-buffers are not
allocated. So 'ubifs_rcvry_gc_commit()' works completely differently for the
same file-system but for the following 2 cases:
1. mounting R/W after a power cut and recover
2. mounting R/O after a power cut, re-mounting R/W and run deferred recovery
In the former case, we have journal heads seeked to the a bud, in the latter
case, they are non-seeked (wbuf->lnum == -1). So in the latter case we do not
try to recover the GC LEB by garbage-collecting to the GC head, but we just
try to find an empty LEB, and there may be no empty LEBs, so we just fail.
On the other hand, in the former case (mount R/W), we are able to make a GC LEB
(@c->gc_lnum) by garbage-collecting.
Thus, let's remove this small nice optimization and always allocate
write-buffers. This should not make too big difference - we have only 3
of them, each of max. write unit size, which is usually 2KiB. So this is
about 6KiB of RAM for the typical case, and only when mounted R/O.
[1]: Note, currently the replay process is setting (seeking) the journal heads
to _some_ buds, not necessarily to the buds which had been the journal heads
before the power cut happened. This will be fixed separately.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
This patch fixes the following symptoms:
1. Unmount UBIFS cleanly.
2. Start mounting UBIFS R/W and have a power cut immediately
3. Start mounting UBIFS R/O, this succeeds
4. Try to re-mount UBIFS R/W - this fails immediately or later on,
because UBIFS will write the master node to the flash area
which has been written before.
The analysis of the problem:
1. UBIFS is unmounted cleanly, both copies of the master node are clean.
2. UBIFS is being mounter R/W, starts changing master node copy 1, and
a power cut happens. The copy N1 becomes corrupted.
3. UBIFS is being mounted R/O. It notices the copy N1 is corrupted and
reads copy N2. Copy N2 is clean.
4. Because of R/O mode, UBIFS cannot recover copy 1.
5. The mount code (ubifs_mount()) sees that the master node is clean,
so it decides that no recovery is needed.
6. We are re-mounting R/W. UBIFS believes no recovery is needed and
starts updating the master node, but copy N1 is still corrupted
and was not recovered!
Fix this problem by marking the master node as dirty every time we
recover it and we are in R/O mode. This forces further recovery and
the UBIFS cleans-up the corruptions and recovers the copy N1 when
re-mounting R/W later.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
When UBIFS switches to R/O mode because it detects I/O failures, then
when we unmount, we still may have allocated budget, and the assertions
which verify that we have not budget will fire. But it is expected to
have the budget in case of I/O failures, so the assertion warnings will
be false. Suppress them for the I/O failure case.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch fixes UBIFS mount failure when the debugging support is enabled,
we are recovering from a power cut, we were first mounter R/O and we are
re-mounting R/W. In this case we should not assume that the amount of free
space before we have re-mounted R/W and after are equivalent, because
when we have mounted R/O the file-system is in a non-committed state so
the amount of free space is slightly smaller, due to the fact that we cannot
predict the amount of free space precisely before we commit.
This patch fixes the issue by skipping the debugging check in case of
recovery. This issue was reported by Caizhiyong <caizhiyong@huawei.com>
here: http://thread.gmane.org/gmane.linux.drivers.mtd/34350/focus=34387
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reported-by: Caizhiyong <caizhiyong@huawei.com>
Cc: stable@kernel.org [2.6.30+]
When compiling UBIFS with CONFIG_UBIFS_FS_DEBUG not set,
gcc-4.5.2 generates a slew of "warning: statement with no effect"
on references to non-void functions defined as 0.
To avoid these warnings, replace #defines with dummy inline functions.
Artem: massage the patch a bit, also remove the duplicate
'dbg_check_lprops()' prototype.
Signed-off-by: Maksim Rayskiy <maksim.rayskiy@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch fixes severe UBIFS bug: UBIFS oopses when we 'fsync()' an
file on R/O-mounter file-system. We (the UBIFS authors) incorrectly
thought that VFS would not propagate 'fsync()' down to the file-system
if it is read-only, but this is not the case.
It is easy to exploit this bug using the following simple perl script:
use strict;
use File::Sync qw(fsync sync);
die "File path is not specified" if not defined $ARGV[0];
my $path = $ARGV[0];
open FILE, "<", "$path" or die "Cannot open $path: $!";
fsync(\*FILE) or die "cannot fsync $path: $!";
close FILE or die "Cannot close $path: $!";
Thanks to Reuben Dowle <Reuben.Dowle@navico.com> for reporting about this
issue.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reported-by: Reuben Dowle <Reuben.Dowle@navico.com>
Cc: stable@kernel.org
* 'for-linus' of git://git.infradead.org/ubifs-2.6:
UBI: do not select KALLSYMS_ALL
UBI: do not compare array with NULL
UBI: check if we are in RO mode in the erase routine
UBIFS: fix debugging failure in dbg_check_space_info
UBIFS: fix error path in dbg_debugfs_init_fs
UBIFS: unify error path dbg_debugfs_init_fs
UBIFS: do not select KALLSYMS_ALL
UBIFS: fix assertion warnings
UBIFS: fix oops on error path in read_pnode
UBIFS: do not read flash unnecessarily
With the ->sync_page() hook gone, we have a few users that
add their own static address_space_operations without any
functions defined.
fs/inode.c already has an empty_aops that it uses for init
purposes. Lets export that and use it in the places where
an otherwise empty aops was defined.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
This patch fixes a debugging failure with which looks like this:
UBIFS error (pid 32313): dbg_check_space_info: free space changed from 6019344 to 6022654
The reason for this failure is described in the comment this patch adds
to the code. But in short - 'c->freeable_cnt' may be different before
and after re-mounting, and this is normal. So the debugging code should
make sure that free space calculations do not depend on 'c->freeable_cnt'.
A similar issue has been reported here:
http://lists.infradead.org/pipermail/linux-mtd/2011-April/034647.html
This patch should fix it.
For the -stable guys: this patch is only relevant for kernels 2.6.30
onwards.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org [2.6.30+]
The debug interface is substandard and on error returns either
NULL or an error code packed in the pointer. So using "IS_ERR"
for the pointers returned by debugfs function is incorrect.
Instead, we should use IS_ERR_OR_NULL.
This path is an improved vestion of the original patch from
Phil Carmody.
Reported-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
This is just a small clean-up patch which simlifies and unifies the
error path in the dbg_debugfs_init_fs(). We have common error path
for all failure cases in this function except of the very first
case. And this patch makes the first failure case use the same
error path as the other cases by using the 'fname' and 'dent'
variables.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
All UBIFS needs is to make sure we stacktraces when UBIFS debugging
is enabled. It is enough to select KALLSYMS for this, KALLSYMS_ALL
is not necessary. Moreover, Randy Dunlap reported that UBIFS causes
the following Kconfig dependency warning:
warning: (UBIFS_FS_DEBUG && LOCKDEP && LATENCYTOP) selects KALLSYMS_ALL
which has unmet direct dependencies (DEBUG_KERNEL && KALLSYMS)
The reason is that KALLSYMS_ALL requires DEBUG_KERNEL and KALLSYMS, so
ideally, to select KALLSYMS_ALL we'd need to select DEBUG_KERNEL and
KALLSYMS first.
This seems to be too much to select. The easiest way to go is to forget
about KALLSYMS_ALL and just select KALLSYMS when UBIFS debugging is
enabled - that should be enough for stackdumps.
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
This patch fixes UBIFS assertion warnings like:
UBIFS assert failed in ubifs_leb_unmap at 135 (pid 29365)
Pid: 29365, comm: integck Tainted: G I 2.6.37-ubi-2.6+ #34
Call Trace:
[<ffffffffa047c663>] ubifs_lpt_init+0x95e/0x9ee [ubifs]
[<ffffffffa04623a7>] ubifs_remount_fs+0x2c7/0x762 [ubifs]
[<ffffffff810f066e>] do_remount_sb+0xb6/0x101
[<ffffffff81106ff4>] ? do_mount+0x191/0x78e
[<ffffffff811070bb>] do_mount+0x258/0x78e
[<ffffffff810da1e8>] ? alloc_pages_current+0xa2/0xc5
[<ffffffff81107674>] sys_mount+0x83/0xbd
[<ffffffff81009a12>] system_call_fastpath+0x16/0x1b
They happen when we re-mount from R/O mode to R/W mode. While
re-mounting, we write to the media, but we still have the c->ro_mount
flag set. The fix is very simple - just clear the flag before
starting re-mounting R/W.
These warnings are caused by the following commit:
2ef13294d2
For -stable guys: this bug was introduced in 2.6.38, this is materieal
for 2.6.38-stable.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org [2.6.38]
Thanks to coverity which spotted that UBIFS will oops if 'kmalloc()'
in 'read_pnode()' fails and we dereference a NULL 'pnode' pointer
when we 'goto out'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
This fix makes the 'dbg_check_old_index()' function return
immediately if debugging is disabled, instead of executing
incorrect 'goto out' which causes UBIFS to:
1. Allocate memory
2. Read the flash
On every commit. OK, we do not commit that often, but it is
still silly to do unneeded I/O anyway.
Credits to coverity for spotting this silly issue.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
Documentation/iostats.txt: bit-size reference etc.
cfq-iosched: removing unnecessary think time checking
cfq-iosched: Don't clear queue stats when preempt.
blk-throttle: Reset group slice when limits are changed
blk-cgroup: Only give unaccounted_time under debug
cfq-iosched: Don't set active queue in preempt
block: fix non-atomic access to genhd inflight structures
block: attempt to merge with existing requests on plug flush
block: NULL dereference on error path in __blkdev_get()
cfq-iosched: Don't update group weights when on service tree
fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
block: Require subsystems to explicitly allocate bio_set integrity mempool
jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
fs: make fsync_buffers_list() plug
mm: make generic_writepages() use plugging
blk-cgroup: Add unaccounted time to timeslice_used.
block: fixup plugging stubs for !CONFIG_BLOCK
block: remove obsolete comments for blkdev_issue_zeroout.
blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
...
Fix up conflicts in fs/{aio.c,super.c}
This patch fixes the following UBIFS assertion warning:
UBIFS assert failed in do_readpage at 115 (pid 199)
[<b00321b8>] (unwind_backtrace+0x0/0xdc) from [<af025118>]
(do_readpage+0x108/0x594 [ubifs])
[<af025118>] (do_readpage+0x108/0x594 [ubifs]) from [<af025764>]
(ubifs_write_end+0x1c0/0x2e8 [ubifs])
[<af025764>] (ubifs_write_end+0x1c0/0x2e8 [ubifs]) from
[<b00a0164>] (generic_file_buffered_write+0x18c/0x270)
[<b00a0164>] (generic_file_buffered_write+0x18c/0x270) from
[<b00a08d4>] (__generic_file_aio_write+0x478/0x4c0)
[<b00a08d4>] (__generic_file_aio_write+0x478/0x4c0) from
[<b00a0984>] (generic_file_aio_write+0x68/0xc8)
[<b00a0984>] (generic_file_aio_write+0x68/0xc8) from
[<af024a78>] (ubifs_aio_write+0x178/0x1d8 [ubifs])
[<af024a78>] (ubifs_aio_write+0x178/0x1d8 [ubifs]) from
[<b00d104c>] (do_sync_write+0xb0/0x100)
[<b00d104c>] (do_sync_write+0xb0/0x100) from [<b00d1abc>]
(vfs_write+0xac/0x154)
[<b00d1abc>] (vfs_write+0xac/0x154) from [<b00d1c10>]
(sys_write+0x3c/0x68)
[<b00d1c10>] (sys_write+0x3c/0x68) from [<b002d9a0>]
(ret_fast_syscall+0x0/0x2c)
The 'PG_checked' flag is used to indicate that the page does not
supposedly exist on the media (e.g., a hole or a page beyond the
inode size), so it requires slightly bigger budget, because we have
to account the indexing size increase. And this flag basically
tells that the budget for this page has to be "new page budget".
The "new page budget" is slightly bigger than the "existing page
budget".
The 'do_readpage()' function has the following assertion which
sometimes is hit: 'ubifs_assert(!PageChecked(page))'. Obviously,
the meaning of this assertion is: "I should not be asked to read
a page which does not exist on the media".
However, in 'ubifs_write_begin()' we have a small "trick". Notice,
that VFS may write pages which were not read yet, so the page data
were not loaded from the media to the page cache yet. If VFS tells
that it is going to change only some part of the page, we obviously
have to load it from the media. However, if VFS tells that it is
going to change whole page, we do not read it from the media for
optimization purposes.
However, since we do not read it, we do not know if it exists on
the media or not (a hole, etc). So we set the 'PG_checked' flag
to this page to force bigger budget, just in case.
So 'ubifs_write_begin()' sets 'PG_checked'. Then we are in
'ubifs_write_end()'. And VFS tells us: "hey, for some reasons I
changed my mind and did not change whole page". Frankly, I do not
know why this happens, but I hit this somehow on an ARM platform.
And this is extremely rare.
So in this case UBIFS does the following:
1. Cancels allocated budget.
2. Loads the page from the media by calling 'do_readpage()'.
3. Asks VFS to repeat the whole write operation from the very
beginning (call '->write_begin() again, etc).
And the assertion warning is hit at the step 2 - remember we have
the 'PG_checked' set for this page, and 'do_readpage()' does not
like this. So this patch fixes the problem by adding step 1.5 and
cleaning the 'PG_checked' before calling 'do_readpage()'.
All in all, this patch does not fix any functionality issue, but it
silences UBIFS false positive warning which may happen in very very
rare cases.
And while on it, this patch also improves a commentary which explains
the reasons of setting the 'PG_checked' flag for the page. The old
commentary was a bit difficult to understand.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Simplify UBIFS configuration menu and kill the option to enable self-check
compile-time. We do not really need this because we can do this run-time
using the module parameters or the corresponding sysfs interfaces. And
there is a value in simplifying the kernel configuration menu which becomes
increasingly large.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch fixes a brown-paperbag bug which was introduced by me:
I used incorrect "GFP_KERNEL | GFP_NOFS" allocation flags to make
sure my allocations do not cause write-back. But the correct form
is "GFP_NOFS".
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
And give it a kernel-doc comment.
[akpm@linux-foundation.org: btrfs changed in linux-next]
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'linux-next' of git://git.infradead.org/ubifs-2.6: (25 commits)
UBIFS: clean-up commentaries
UBIFS: save 128KiB or more RAM
UBIFS: allocate orphans scan buffer on demand
UBIFS: allocate lpt dump buffer on demand
UBIFS: allocate ltab checking buffer on demand
UBIFS: allocate scanning buffer on demand
UBIFS: allocate dump buffer on demand
UBIFS: do not check data crc by default
UBIFS: simplify UBIFS Kconfig menu
UBIFS: print max. index node size
UBIFS: handle allocation failures in UBIFS write path
UBIFS: use max_write_size during recovery
UBIFS: use max_write_size for write-buffers
UBIFS: introduce write-buffer size field
UBI: incorporate LEB offset information
UBIFS: incorporate maximum write size
UBI: provide LEB offset information
UBI: incorporate maximum write size
UBIFS: fix LEB number in printk
UBIFS: restrict world-writable debugfs files
...
When debugging is enabled, we allocate a buffer of PEB size for
various debugging purposes. However, now all users of this buffer
are gone and we can safely remove it and save 128KiB or more RAM.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Instead of using pre-allocated 'c->dbg->buf' buffer in
'dbg_scan_orphans()', dynamically allocate it when needed. The intend
is to get rid of the pre-allocated 'c->dbg->buf' buffer and save
128KiB of RAM (or more if PEB size is larger). Indeed, currently we
allocate this memory even if the user never enables any self-check,
which is wasteful.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Instead of using pre-allocated 'c->dbg->buf' buffer in
'dump_lpt_leb()', dynamically allocate it when needed. The intend
is to get rid of the pre-allocated 'c->dbg->buf' buffer and save
128KiB of RAM (or more if PEB size is larger). Indeed, currently we
allocate this memory even if the user never enables any self-check,
which is wasteful.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Instead of using pre-allocated 'c->dbg->buf' buffer in
'dbg_check_ltab_lnum()', dynamically allocate it when needed. The
intend is to get rid of the pre-allocated 'c->dbg->buf' buffer and
save 128KiB of RAM (or more if PEB size is larger). Indeed,
currently we allocate this memory even if the user never enables
any self-check, which is wasteful.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Instead of using pre-allocated 'c->dbg->buf' buffer in
'scan_check_cb()', dynamically allocate it when needed. The intend
is to get rid of the pre-allocated 'c->dbg->buf' buffer and save
128KiB of RAM (or more if PEB size is larger). Indeed, currently we
allocate this memory even if the user never enables any self-check,
which is wasteful.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Instead of using pre-allocated 'c->dbg->buf' buffer in
'dbg_dump_leb()', dynamically allocate it when needed. The intend
is to get rid of the pre-allocated 'c->dbg->buf' buffer and save
128KiB of RAM (or more if PEB size is larger). Indeed, currently we
allocate this memory even if the user never enables any self-check,
which is wasteful.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Now that VFS check for inode->i_nlink == 0 and returns proper
error, remove similar check from file system
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change the default UBIFS behavior WRT data CRC checking. Currently,
UBIFS checks data CRC when reading, which slows it down quite a bit,
and this is the default option. However, it looks like in average
user does not need this feature and would prefer faster read speed
over extra reliability. And this seems to be de-facto standard that
file-systems do not check data CRC every time they read from the
media.
Thus, make UBIFS default behavior so that it does not check data
CRC. This corresponds to the no_chk_data_crc mount option. Those users
who need extra protection can always enable it using the chk_data_crc
option.
Please, read more information about this feature here:
http://www.linux-mtd.infradead.org/doc/ubifs.html#L_checksumming
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Remove debug message level and debug checks Kconfig options as they
proved to be useless anyway. We have sysfs interface which we can
use for fine-grained debugging messages and checks selection, see
Documentation/filesystems/ubifs.txt for mode details.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Running kernel 2.6.37, my PPC-based device occasionally gets an
order-2 allocation failure in UBIFS, which causes the root FS to
become unwritable:
kswapd0: page allocation failure. order:2, mode:0x4050
Call Trace:
[c787dc30] [c00085b8] show_stack+0x7c/0x194 (unreliable)
[c787dc70] [c0061aec] __alloc_pages_nodemask+0x4f0/0x57c
[c787dd00] [c0061b98] __get_free_pages+0x20/0x50
[c787dd10] [c00e4f88] ubifs_jnl_write_data+0x54/0x200
[c787dd50] [c00e82d4] do_writepage+0x94/0x198
[c787dd90] [c00675e4] shrink_page_list+0x40c/0x77c
[c787de40] [c0067de0] shrink_inactive_list+0x1e0/0x370
[c787de90] [c0068224] shrink_zone+0x2b4/0x2b8
[c787df00] [c0068854] kswapd+0x408/0x5d4
[c787dfb0] [c0037bcc] kthread+0x80/0x84
[c787dff0] [c000ef44] kernel_thread+0x4c/0x68
Similar problems were encountered last April by Tomasz Stanislawski:
http://patchwork.ozlabs.org/patch/50965/
This patch implements Artem's suggested fix: fall back to a
mutex-protected static buffer, allocated at mount time. I tested it
by forcing execution down the failure path, and didn't see any ill
effects.
Artem: massaged the patch a little, improved it so that we'd not
allocate the write reserve buffer when we are in R/O mode.
Signed-off-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
When recovering from unclean reboots UBIFS scans the journal and checks nodes.
If a corrupted node is found, UBIFS tries to check if this is the last node
in the LEB or not. This is is done by checking if there only 0xFF bytes
starting from the next min. I/O unit. However, since now we write in
c->max_write_size, we should actually check for 0xFFs starting from the
next max. write unit.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Switch write-buffers from 'c->min_io_size' to 'c->max_write_size' which
presumably has to be more write speed-efficient. However, when write-buffer
is synchronized, write only the the min. I/O units which contain the
data, do not write whole write-buffer. This is more space-efficient.
Additionally, this patch takes into account that the LEB might not start
from the max. write unit-aligned address.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Currently we assume write-buffer size is always min_io_size. But
this is about to change and write-buffers may be of variable size.
Namely, they will be of max_write_size at the beginning, but will
get smaller when we are approaching the end of LEB.
This is a preparation patch which introduces 'size' field in
the write-buffer structure which carries the current write-buffer
size.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Incorporate the LEB offset information into UBIFS. We'll use this
information in one of the next patches to figure out what are the
max. write size offsets relative to the PEB. So this patch is just
a preparation.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Incorporate maximum write size into the UBIFS description data
structure. This patch just introduces new 'c->max_write_size'
and 'c->max_write_shift' fields as a preparation for the following
patches.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a minor patch which fixes the LEB number we print when
corrupted empty space is found.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch adds more commentaries about UBIFS recovery logic which should
explain the famous UBIFS "corrupt empty space" errors.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch fixes suboptimal UBIFS 'sync_fs()' implementation which causes
flash I/O even if the file-system is synchronized. E.g., a 'printk()'
in the MTD erasure function (e.g., 'nand_erase_nand()') can show that
for every 'sync' shell command UBIFS erases at least one eraseblock.
So '$ while true; do sync; done' will cause huge amount of flash I/O.
The reason for this is that UBIFS commits in 'sync_fs()', and starts the
commit even if there is nothing to commit, e.g., it anyway changes the
log. This patch adds a check in the 'do_commit()' UBIFS functions which
prevents the commit if there is nothing to commit.
Reported-by: Hans J. Koch <hjk@linutronix.de>
Tested-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a preparational patch which removes the 'c->always_chk_crc' which was
set during mounting and remounting to R/W mode and introduces 'c->mounting'
flag which is set when mounting. Now the 'c->always_chk_crc' flag is the
same as 'c->remounting_rw && c->mounting'.
This patch is a preparation for the next one which will need to know when we
are mounting and remounting to R/W mode, which is exactly what
'c->always_chk_crc' effectively is, but its name does not suite the
next patch. The other possibility would be to just re-name it, but then
we'd end up with less logical flags coverage.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a cosmetic patch which re-arranges variables in 'struct ubifs_info'
so that all boolean-like variables which are only changed during mounting or
re-mounting to R/W mode are places together. Then they are turned into
bit-fields, which makes the structure a little bit smaller.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
RCU free the struct inode. This will allow:
- Subsequent store-free path walking patch. The inode must be consulted for
permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
to take i_lock no longer need to take sb_inode_list_lock to walk the list in
the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
page lock to follow page->mapping.
The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.
In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.
The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
UBIFS: do not allocate unneeded scan buffer
UBIFS: do not forget to cancel timers
UBIFS: remove a bit of unneeded code
UBIFS: add a commentary about log recovery
UBIFS: avoid kernel error if ubifs superblock read fails
UBIFS: introduce new flags for RO mounts
UBIFS: introduce new flag for RO due to errors
UBIFS: check return code of pnode_lookup
UBIFS: check return code of ubifs_lpt_lookup
UBIFS: improve error reporting when reading bad node
UBIFS: introduce list sorting debugging checks
UBIFS: fix assertion warnings in comparison function
UBIFS: mark unused key objects as invalid
UBIFS: do not write rubbish into truncation scanning node
UBIFS: improve assertion in node comparison functions
UBIFS: do not use key type in list_sort
UBIFS: do not look up truncation nodes
UBIFS: fix assertion warning
UBIFS: do not treat ENOSPC specially
UBIFS: switch to RO mode after synchronizing
In 'ubifs_replay_journal()' we allocate 'sbuf' for scanning the log.
However, we already have 'c->sbuf' for these purposes, so do not
allocate yet another one. This reduces UBIFS memory consumption while
recovering.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a bug-fix: when we unmount, and we are currently in R/O
mode because of an error - we do not sync write-buffers, which
means we also do not cancel write-buffer timers we may possibly
have armed. This patch fixes the issue.
The issue can easily be reproduced by enabling UBIFS failure debug
mode (echo 4 > /sys/module/ubifs/parameters/debug_tsts) and
unmounting as soon as a failure happen. At some point the system
oopses because we have an armed hrtimer but UBIFS is unmounted
already.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a clean-up patch which:
1. Removes explicite 'hrtimer_cancel()' after 'ubifs_wbuf_sync()' in
'ubifs_remount_ro()', because the timers will be canceled by
'ubifs_wbuf_sync()', no need to cancel them for the second time.
2. Remove "if (c->jheads)" check from 'ubifs_put_super()', because
at journal heads must always be allocated there, since we checked
earlier that we were mounted R/W, and the olny situation when
journal heads are not allocated is when mounter or re-mounted R/O.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Add a commentary which elaborates that 'ubifs_recover_log_leb()' recovers only
the last log LEB, not any. Also remove some unneeded newlines.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.
The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.
New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.
The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.
Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.
Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.
===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}
@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}
@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}
@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}
@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};
@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};
@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};
@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};
@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};
// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};
@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};
// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};
// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};
// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};
@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};
// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////
@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};
@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};
@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};
@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>
.get_sb is called on mounts with automatic fs detection too, so this
function should print an error if it cannot read the superblock in
debug mode only (new behaviour conforms the other fs types)
Signed-off-by: Steffen Sledz <sledz@dresearch.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Commit 2fde99cb55 "UBIFS: mark VFS SB RO too"
introduced regression. This commit made UBIFS set the 'MS_RDONLY' flag in the
VFS superblock when it switches to R/O mode due to an error. This was done
to make VFS show the R/O UBIFS flag in /proc/mounts.
However, several places in UBIFS relied on the 'MS_RDONLY' flag and assume this
flag can only change when we re-mount. For example, 'ubifs_put_super()'.
This patch introduces new UBIFS flag - 'c->ro_mount' which changes only when
we re-mount, and preserves the way UBIFS was originally mounted (R/W or R/O).
This allows us to de-initialize UBIFS cleanly in 'ubifs_put_super()'.
This patch also changes all 'ubifs_assert(!c->ro_media)' assertions to
'ubifs_assert(!c->ro_media && !c->ro_mount)', because we never should write
anything if the FS was mounter R/O.
All the places where we test for 'MS_RDONLY' flag in the VFS SB were changed
and now we test the 'c->ro_mount' flag instead, because it preserves the
original UBIFS mount type, unlike the 'MS_RDONLY' flag.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The R/O state may have various reasons:
1. The UBI volume is R/O
2. The FS is mounted R/O
3. The FS switched to R/O mode because of an error
However, in UBIFS we have only one variable which represents cases
1 and 3 - 'c->ro_media'. Indeed, we set this to 1 if we switch to
R/O mode due to an error, and then we test it in many places to
make sure that we stop writing as soon as the error happens.
But this is very unclean. One consequence of this, for example, is
that in 'ubifs_remount_fs()' we use 'c->ro_media' to check whether
we are in R/O mode because on an error, and we print a message
in this case. However, if we are in R/O mode because the media
is R/O, our message is bogus.
This patch introduces new flag - 'c->ro_error' which is set when
we switch to R/O mode because of an error. It also changes all
"if (c->ro_media)" checks to "if (c->ro_error)" checks, because
this is what the checks actually mean. We do not need to check
for 'c->ro_media' because if the UBI volume is in R/O mode, we
do not allow R/W mounting, and now writes can happen. This is
guaranteed by VFS. But it is good to double-check this, so this
patch also adds many "ubifs_assert(!c->ro_media)" checks.
In the 'ubifs_remount_fs()' function this patch makes a bit more
changes - it fixes the error messages as well.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Function pnode_lookup may return ERR_PTR(...). Check for it.
Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Function ubifs_lpt_lookup may return ERR_PTR(...). Check for it.
[Tweaked by Artem Bityutskiy]
Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When an error happens during validation of read node, the typical situation is that
the LEB we read is unmapped (due to some bug). It is handy to include the mapping
status into the error message.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The UBIFS bug in the GC list sorting comparison functions inspired
me to write internal debugging check functions which verify that
the list of nodes is sorted properly.
So, this patch implements 2 new debugging functions:
o 'dbg_check_data_nodes_order()' - check order of data nodes list
o 'dbg_check_nondata_nodes_order()' - check order of non-data nodes list
The debugging functions are executed only if general UBIFS debugging checks are
enabled. And they are compiled out if UBIFS debugging is disabled.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>