linux

Commit Graph

Author	SHA1	Message	Date
Ryusuke Konishi	72746ac643	nilfs2: fix regression that i-flag is not set on changeless checkpoints According to the report from Jiro SEKIBA titled "regression in 2.6.37?" (Message-Id: <8739n8vs1f.wl%jir@sekiba.com>), on 2.6.37 and later kernels, lscp command no longer displays "i" flag on checkpoints that snapshot operations or garbage collection created. This is a regression of nilfs2 checkpointing function, and it's critical since it broke behavior of a part of nilfs2 applications. For instance, snapshot manager of TimeBrowse gets to create meaningless snapshots continuously; snapshot creation triggers another checkpoint, but applications cannot distinguish whether the new checkpoint contains meaningful changes or not without the i-flag. This patch fixes the regression and brings that application behavior back to normal. Reported-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Jiro SEKIBA <jir@unicus.jp> Cc: stable <stable@kernel.org> [2.6.37]	2011-03-02 09:55:18 +09:00
Miklos Szeredi	2aa15890f3	mm: prevent concurrent unmap_mapping_range() on the same inode Michael Leun reported that running parallel opens on a fuse filesystem can trigger a "kernel BUG at mm/truncate.c:475" Gurudas Pai reported the same bug on NFS. The reason is, unmap_mapping_range() is not prepared for more than one concurrent invocation per inode. For example: thread1: going through a big range, stops in the middle of a vma and stores the restart address in vm_truncate_count. thread2: comes in with a small (e.g. single page) unmap request on the same vma, somewhere before restart_address, finds that the vma was already unmapped up to the restart address and happily returns without doing anything. Another scenario would be two big unmap requests, both having to restart the unmapping and each one setting vm_truncate_count to its own value. This could go on forever without any of them being able to finish. Truncate and hole punching already serialize with i_mutex. Other callers of unmap_mapping_range() do not, and it's difficult to get i_mutex protection for all callers. In particular ->d_revalidate(), which calls invalidate_inode_pages2_range() in fuse, may be called with or without i_mutex. This patch adds a new mutex to 'struct address_space' to prevent running multiple concurrent unmap_mapping_range() on the same mapping. [ We'll hopefully get rid of all this with the upcoming mm preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex lockbreak" patch in particular. But that is for 2.6.39 ] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reported-by: Michael Leun <lkml20101129@newton.leun.net> Reported-by: Gurudas Pai <gurudas.pai@oracle.com> Tested-by: Gurudas Pai <gurudas.pai@oracle.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-23 19:52:52 -08:00
Ryusuke Konishi	0ca7a5b9ac	nilfs2: fix crash after one superblock became unavailable Fixes the following kernel oops in nilfs_setup_super() which could arise if one of two super-blocks is unavailable. > BUG: unable to handle kernel NULL pointer dereference at (null) > Pid: 3529, comm: mount.nilfs2 Not tainted 2.6.37 #1 / > EIP: 0060:[<c03196bc>] EFLAGS: 00010202 CPU: 3 > EIP is at memcpy+0xc/0x1b > Call Trace: > [<f953720e>] ? nilfs_setup_super+0x6c/0xa5 [nilfs2] > [<f95369e9>] ? nilfs_get_root_dentry+0x81/0xcb [nilfs2] > [<f9537a08>] ? nilfs_mount+0x4f9/0x62c [nilfs2] > [<c02745cf>] ? kstrdup+0x36/0x3f > [<f953750f>] ? nilfs_mount+0x0/0x62c [nilfs2] > [<c0293940>] ? vfs_kern_mount+0x4d/0x12c > [<c02a5100>] ? get_fs_type+0x76/0x8f > [<c0293a68>] ? do_kern_mount+0x33/0xbf > [<c02a784a>] ? do_mount+0x2ed/0x714 > [<c02a6171>] ? copy_mount_options+0x28/0xfc > [<c02a7ce3>] ? sys_mount+0x72/0xaf > [<c0473085>] ? syscall_call+0x7/0xb Reported-by: Wakko Warner <wakko@animx.eu.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Wakko Warner <wakko@animx.eu.org> Cc: stable <stable@kernel.org> [2.6.37, 2.6.36] LKML-Reference: <20110121024918.GA29598@animx.eu.org>	2011-01-22 15:22:36 +09:00
Linus Torvalds	275220f0fc	Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits) block: ensure that completion error gets properly traced blktrace: add missing probe argument to block_bio_complete block cfq: don't use atomic_t for cfq_group block cfq: don't use atomic_t for cfq_queue block: trace event block fix unassigned field block: add internal hd part table references block: fix accounting bug on cross partition merges kref: add kref_test_and_get bio-integrity: mark kintegrityd_wq highpri and CPU intensive block: make kblockd_workqueue smarter Revert "sd: implement sd_check_events()" block: Clean up exit_io_context() source code. Fix compile warnings due to missing removal of a 'ret' variable fs/block: type signature of major_to_index(int) to major_to_index(unsigned) block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p) cfq-iosched: don't check cfqg in choose_service_tree() fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors cdrom: export cdrom_check_events() sd: implement sd_check_events() sr: implement sr_check_events() ...	2011-01-13 10:45:01 -08:00
Alexey Dobriyan	57cc7215b7	headers: kobject.h redux Remove kobject.h from files which don't need it, notably, sched.h and fs.h. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-10 08:51:44 -08:00
Ryusuke Konishi	365e215ce1	nilfs2: unfold nilfs_dat_inode function nilfs_dat_inode function was a wrapper to switch between normal dat inode and gcdat, a clone of the dat inode for garbage collection. This function got obsolete when the gcdat inode was removed, and now we can access the dat inode directly from a nilfs object. So, we will unfold the wrapper and remove it. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-01-10 14:38:39 +09:00
Ryusuke Konishi	bcbc8c648d	nilfs2: do not pass sbi to functions which can get it from inode This removes argument for passing nilfs_sb_info structure from nilfs_set_file_dirty and nilfs_load_inode_block functions. We can get a pointer to the structure from inodes. [Stephen Rothwell <sfr@canb.auug.org.au>: fix conflict with commit `b74c79e993`] Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-01-10 14:37:54 +09:00
Ryusuke Konishi	06df0f9992	nilfs2: get rid of nilfs_mount_options structure Only mount_opt member is used in the nilfs_mount_options structure, and we can simplify it. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-01-10 14:05:46 +09:00
Ryusuke Konishi	a7a8447ede	nilfs2: simplify nilfs_mdt_freeze_buffer nilfs_page_get_nth_block() function used in nilfs_mdt_freeze_buffer() always returns a valid buffer head, so its validity check can be removed. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-01-10 14:05:46 +09:00
Ryusuke Konishi	888da23c2f	nilfs2: get rid of loaded flag from nilfs object NILFS_LOADED flag of the nilfs object is not used now, so this will remove it. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-01-10 14:05:46 +09:00
Ryusuke Konishi	ae53a0a2ce	nilfs2: fix a checkpatch error in page.c Will correct the following checkpatch error: ERROR: trailing whitespace #494: FILE: page.c:494: + $ Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-01-10 14:05:46 +09:00
Ryusuke Konishi	622daaff0a	nilfs2: fiemap support This adds fiemap to nilfs. Two new functions, nilfs_fiemap and nilfs_find_uncommitted_extent are added. nilfs_fiemap() implements the fiemap inode operation, and nilfs_find_uncommitted_extent() helps to get a range of data blocks whose physical location has not been determined. nilfs_fiemap() collects extent information by looping through nilfs_bmap_lookup_contig and nilfs_find_uncommitted_extent routines. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-01-10 14:05:46 +09:00
Ryusuke Konishi	27e6c7a3ce	nilfs2: mark buffer heads as delayed until the data is written to disk Nilfs does not allocate new blocks on disk until they are actually written to. To implement fiemap, we need to deal with such blocks. To allow successive fiemap patch to distinguish mapped but unallocated regions, this marks buffer heads of those new blocks as delayed and clears the flag after the blocks are written to disk. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-01-10 14:05:45 +09:00
Ryusuke Konishi	e828949e5b	nilfs2: call nilfs_error inside bmap routines Some functions using nilfs bmap routines can wrongly return invalid argument error (i.e. -EINVAL) that bmap returns as an internal code for btree corruption. This fixes the issue by catching and converting the internal EINVAL to EIO and calling nilfs_error function inside bmap routines. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-01-10 14:05:45 +09:00
Joe Perches	b004a5eb0b	fs/nilfs2/super.c: Use printf extension %pV Using %pV reduces the number of printk calls and eliminates any possible message interleaving from other printk calls. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2011-01-10 14:05:45 +09:00
Nick Piggin	b74c79e993	fs: provide rcu-walk aware permission i_ops Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:29 +11:00
Nick Piggin	fa0d7e3de6	fs: icache RCU free inodes RCU free the struct inode. This will allow: - Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must. - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - Could remove some nested trylock loops in dcache code - Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping. The downsides of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:26 +11:00
Nick Piggin	b7ab39f631	fs: dcache scale dentry refcount Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:21 +11:00
Ryusuke Konishi	947b10ae0a	nilfs2: fix regression of garbage collection ioctl On 2.6.37-rc1, garbage collection ioctl of nilfs was broken due to the commit `263d90cefc` ("nilfs2: remove own inode hash used for GC"), and leading to filesystem corruption. The patch doesn't queue gc-inodes for log writer if they are reused through the vfs inode cache. Here, gc-inode is the inode which buffers blocks to be relocated on GC. That patch queues gc-inodes in nilfs_init_gcinode() function, but this function is not called when they don't have I_NEW flag. Thus, some of live blocks are wrongly overrode without being moved to new logs. This resolves the problem by moving the gc-inode queueing to an outer function to ensure it's done right. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-12-16 14:35:18 +09:00
Ryusuke Konishi	f6c26ec508	nilfs2: fix typo in comment of nilfs_dat_move function Fixes a typo: "uncommited" -> "uncommitted". Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-11-24 12:51:48 +09:00
Dan Carpenter	103cfcf522	nilfs2: nilfs_iget_for_gc() returns ERR_PTR nilfs_iget_for_gc() returns an ERR_PTR() on failure and doesn't return NULL. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-11-23 16:32:19 +09:00
Tejun Heo	d4d7762995	block: clean up blkdev_get() wrappers and their users After recent blkdev_get() modifications, open_by_devnum() and open_bdev_exclusive() are simple wrappers around blkdev_get(). Replace them with blkdev_get_by_dev() and blkdev_get_by_path(). blkdev_get_by_dev() is identical to open_by_devnum(). blkdev_get_by_path() is slightly different in that it doesn't automatically add %FMODE_EXCL to @mode. All users are converted. Most conversions are mechanical and don't introduce any behavior difference. There are several exceptions. * btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no reason to OR it explicitly on blkdev_put(). * gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in sb->s_mode. * With the above changes, sb->s_mode now always should contain FMODE_EXCL. WARN_ON_ONCE() added to kill_block_super() to detect errors. The new blkdev_get_*() functions are with proper docbook comments. While at it, add function description to blkdev_get() too. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Neil Brown <neilb@suse.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Joern Engel <joern@lazybastard.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: reiserfs-devel@vger.kernel.org Cc: xfs-masters@oss.sgi.com Cc: Alexander Viro <viro@zeniv.linux.org.uk>	2010-11-13 11:55:18 +01:00
Tejun Heo	e525fd89d3	block: make blkdev_get/put() handle exclusive access Over time, block layer has accumulated a set of APIs dealing with bdev open, close, claim and release. * blkdev_get/put() are the primary open and close functions. * bd_claim/release() deal with exclusive open. * open/close_bdev_exclusive() are combination of open and claim and the other way around, respectively. * bd_link/unlink_disk_holder() to create and remove holder/slave symlinks. * open_by_devnum() wraps bdget() + blkdev_get(). The interface is a bit confusing and the decoupling of open and claim makes it impossible to properly guarantee exclusive access as in-kernel open + claim sequence can disturb the existing exclusive open even before the block layer knows the current open if for another exclusive access. Reorganize the interface such that, * blkdev_get() is extended to include exclusive access management. @holder argument is added and, if is @FMODE_EXCL specified, it will gain exclusive access atomically w.r.t. other exclusive accesses. * blkdev_put() is similarly extended. It now takes @mode argument and if @FMODE_EXCL is set, it releases an exclusive access. Also, when the last exclusive claim is released, the holder/slave symlinks are removed automatically. * bd_claim/release() and close_bdev_exclusive() are no longer necessary and either made static or removed. * bd_link_disk_holder() remains the same but bd_unlink_disk_holder() is no longer necessary and removed. * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev() and blkdev_get(). It also has an unexpected extra bdev_read_only() test which probably should be moved into blkdev_get(). * open_by_devnum() is modified to take @holder argument and pass it to blkdev_get(). Most of bdev open/close operations are unified into blkdev_get/put() and most exclusive accesses are tested atomically at the open time (as it should). This cleans up code and removes some, both valid and invalid, but unnecessary all the same, corner cases. open_bdev_exclusive() and open_by_devnum() can use further cleanup - rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop special features. Well, let's leave them for another day. Most conversions are straight-forward. drbd conversion is a bit more involved as there was some reordering, but the logic should stay the same. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Brown <neilb@suse.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Philipp Reisner <philipp.reisner@linbit.com> Cc: Peter Osterlund <petero2@telia.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Alex Elder <aelder@sgi.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: dm-devel@redhat.com Cc: drbd-dev@lists.linbit.com Cc: Leo Chen <leochen@broadcom.com> Cc: Scott Branden <sbranden@broadcom.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Joern Engel <joern@logfs.org> Cc: reiserfs-devel@vger.kernel.org Cc: Alexander Viro <viro@zeniv.linux.org.uk>	2010-11-13 11:55:17 +01:00
Al Viro	e4c59d61e8	convert nilfs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-29 04:16:53 -04:00
Linus Torvalds	426e1f5cec	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits) split invalidate_inodes() fs: skip I_FREEING inodes in writeback_sb_inodes fs: fold invalidate_list into invalidate_inodes fs: do not drop inode_lock in dispose_list fs: inode split IO and LRU lists fs: switch bdev inode bdi's correctly fs: fix buffer invalidation in invalidate_list fsnotify: use dget_parent smbfs: use dget_parent exportfs: use dget_parent fs: use RCU read side protection in d_validate fs: clean up dentry lru modification fs: split __shrink_dcache_sb fs: improve DCACHE_REFERENCED usage fs: use percpu counter for nr_dentry and nr_dentry_unused fs: simplify __d_free fs: take dcache_lock inside __d_path fs: do not assign default i_ino in new_inode fs: introduce a per-cpu last_ino allocator new helper: ihold() ...	2010-10-26 17:58:44 -07:00
Michael Rubin	f629d1c9bd	mm: add account_page_writeback() To help developers and applications gain visibility into writeback behaviour this patch adds two counters to /proc/vmstat. # grep nr_dirtied /proc/vmstat nr_dirtied 3747 # grep nr_written /proc/vmstat nr_written 3618 These entries allow user apps to understand writeback behaviour over time and learn how it is impacting their performance. Currently there is no way to inspect dirty and writeback speed over time. It's not possible for nr_dirty/nr_writeback. These entries are necessary to give visibility into writeback behaviour. We have /proc/diskstats which lets us understand the io in the block layer. We have blktrace for more in depth understanding. We have e2fsprogs and debugsfs to give insight into the file systems behaviour, but we don't offer our users the ability understand what writeback is doing. There is no way to know how active it is over the whole system, if it's falling behind or to quantify it's efforts. With these values exported users can easily see how much data applications are sending through writeback and also at what rates writeback is processing this data. Comparing the rates of change between the two allow developers to see when writeback is not able to keep up with incoming traffic and the rate of dirty memory being sent to the IO back end. This allows folks to understand their io workloads and track kernel issues. Non kernel engineers at Google often use these counters to solve puzzling performance problems. Patch #4 adds a pernode vmstat file with nr_dirtied and nr_written Patch #5 add writeback thresholds to /proc/vmstat Currently these values are in debugfs. But they should be promoted to /proc since they are useful for developers who are writing databases and file servers and are not debugging the kernel. The output is as below: # grep threshold /proc/vmstat nr_pages_dirty_threshold 409111 nr_pages_dirty_background_threshold 818223 This patch: This allows code outside of the mm core to safely manipulate page writeback state and not worry about the other accounting. Not using these routines means that some code will lose track of the accounting and we get bugs. Modify nilfs2 to use interface. Signed-off-by: Michael Rubin <mrubin@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Jiro SEKIBA <jir@unicus.jp> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:06 -07:00
Al Viro	7de9c6ee3e	new helper: ihold() Clones an existing reference to inode; caller must already hold one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:11 -04:00
Linus Torvalds	ab34c02afe	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (36 commits) nilfs2: eliminate sparse warning - "context imbalance" nilfs2: eliminate sparse warnings - "symbol not declared" nilfs2: get rid of bdi from nilfs object nilfs2: change license of exported header file nilfs2: add bdev freeze/thaw support nilfs2: accept 64-bit checkpoint numbers in cp mount option nilfs2: remove own inode allocator and destructor for metadata files nilfs2: get rid of back pointer to writable sb instance nilfs2: get rid of mi_nilfs back pointer to nilfs object nilfs2: see state of root dentry for mount check of snapshots nilfs2: use iget for all metadata files nilfs2: get rid of GCDAT inode nilfs2: add routines to redirect access to buffers of DAT file nilfs2: add routines to roll back state of DAT file nilfs2: add routines to save and restore bmap state nilfs2: do not allocate nilfs_mdt_info structure to gc-inodes nilfs2: allow nilfs_clear_inode to clear metadata file inodes nilfs2: get rid of snapshot mount flag nilfs2: simplify life cycle management of nilfs object nilfs2: do not allocate multiple super block instances for a device ...	2010-10-23 01:26:47 -07:00
Jiro SEKIBA	6b81e14e64	nilfs2: eliminate sparse warning - "context imbalance" insert sparse annotations to fix following sparse warning. fs/nilfs2/segment.c:2681:3: warning: context imbalance in 'nilfs_segctor_kill_thread' - unexpected unlock nilfs_segctor_kill_thread is only called inside sc_state_lock lock. sparse doesn't detect the context and warn "unexpected unlock". __acquires/__releases pretend to lock/unlock the sc_state_lock for sparse. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:40 +09:00
Jiro SEKIBA	abc0b50b6b	nilfs2: eliminate sparse warnings - "symbol not declared" change nilfs_dat_commit_free and nilfs_inode_cachep static to fix following warnings fs/nilfs2/super.c:72:19: warning: symbol 'nilfs_inode_cachep' was not declared. Should it be static? fs/nilfs2/dat.c:106:6: warning: symbol 'nilfs_dat_commit_free' was not declared. Should it be static? Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:40 +09:00
Ryusuke Konishi	026a7d63d5	nilfs2: get rid of bdi from nilfs object Nilfs now can use sb->s_bdi to get backing_dev_info, so we use it instead of ns_bdi on the nilfs object and remove ns_bdi. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:39 +09:00
Ryusuke Konishi	5beb6e0b20	nilfs2: add bdev freeze/thaw support Nilfs hasn't supported the freeze/thaw feature because it didn't work due to the peculiar design that multiple super block instances could be allocated for a device. This limitation was removed by the patch "nilfs2: do not allocate multiple super block instances for a device". So now this adds the freeze/thaw support to nilfs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:39 +09:00
Ryusuke Konishi	c05dbfc260	nilfs2: accept 64-bit checkpoint numbers in cp mount option The current implementation doesn't mount snapshots with checkpoint numbers larger than INT_MAX since it uses match_int() for parsing "cp=" mount option. This uses simple_strtoull() for the conversion to resolve the issue. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:39 +09:00
Ryusuke Konishi	2879ed66e4	nilfs2: remove own inode allocator and destructor for metadata files This finally removes own inode allocator and destructor functions for metadata files. Several routines, nilfs_mdt_new(), nilfs_mdt_new_common(), nilfs_mdt_clear(), nilfs_mdt_destroy(), and nilfs_alloc_inode_common() will be gone. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:39 +09:00
Ryusuke Konishi	090fd5b101	nilfs2: get rid of back pointer to writable sb instance Nilfs object holds a back pointer to a writable super block instance in nilfs->ns_writer, and this became eliminable since sb is now made per device and all inodes have a valid pointer to it. This deletes the ns_writer pointer and a reader/writer semaphore protecting it. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:38 +09:00
Ryusuke Konishi	c6e071884a	nilfs2: get rid of mi_nilfs back pointer to nilfs object This removes a back pointer to nilfs object from nilfs_mdt_info structure that is attached to metadata files. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:38 +09:00
Ryusuke Konishi	032dbb3b50	nilfs2: see state of root dentry for mount check of snapshots After applied the patch that unified sb instances, root dentry of snapshots can be left in dcache even after their trees are unmounted. The orphan root dentry/inode keeps a root object, and this causes false positive of nilfs_checkpoint_is_mounted function. This resolves the issue by having nilfs_checkpoint_is_mounted test whether the root dentry is busy or not. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:38 +09:00
Ryusuke Konishi	f1e89c86fd	nilfs2: use iget for all metadata files This makes use of iget5_locked to allocate or get inode for metadata files to stop using own inode allocator. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:38 +09:00
Ryusuke Konishi	c1c1d70920	nilfs2: get rid of GCDAT inode This applies prepared rollback function and redirect function of metadata file to DAT file, and eliminates GCDAT inode. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:38 +09:00
Ryusuke Konishi	b1f6a4f294	nilfs2: add routines to redirect access to buffers of DAT file During garbage collection (GC), DAT file, which converts virtual block number to real block number, may return disk block number that is not yet written to the device. To avoid access to unwritten blocks, the current implementation stores changes to the caches of GCDAT during GC and atomically commit the changes into the DAT file after they are written to the device. This patch, instead, adds a function that makes a copy of specified buffer and stores it in nilfs_shadow_map, and a function to get the backup copy as needed (nilfs_mdt_freeze_buffer and nilfs_mdt_get_frozen_buffer respectively). Before DAT changes block number in an entry block, it makes a copy and redirect access to the buffer so that address conversion function (i.e. nilfs_dat_translate) refers to the old address saved in the copy. This patch gives requisites for such redirection. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:37 +09:00
Ryusuke Konishi	ebdfed4dc5	nilfs2: add routines to roll back state of DAT file This adds optional function to metadata files which makes a copy of bmap, page caches, and b-tree node cache, and rolls back to the copy as needed. This enhancement is intended to displace gcdat inode that provides a similar function in a different way. In this patch, nilfs_shadow_map structure is added to store a copy of the foregoing states. nilfs_mdt_setup_shadow_map relates this structure to a metadata file. And, nilfs_mdt_save_to_shadow_map() and nilfs_mdt_restore_from_shadow_map() provides save and restore functions respectively. Finally, nilfs_mdt_clear_shadow_map() clears states of nilfs_shadow_map. The copy of b-tree node cache and page cache is made by duplicating only dirty pages into corresponding caches in nilfs_shadow_map. Their restoration is done by clearing dirty pages from original caches and by copying dirty pages back from nilfs_shadow_map. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:37 +09:00
Ryusuke Konishi	a8070dd365	nilfs2: add routines to save and restore bmap state This adds routines to save and restore the state of bmap structure. The bmap state is stored in a given nilfs_bmap_store object. These routines will be used to roll back the state of dat inode without using gcdat inode. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:37 +09:00
Ryusuke Konishi	adbb39b548	nilfs2: do not allocate nilfs_mdt_info structure to gc-inodes GC-inode now doesn't need the nilfs_mdt_info structure and there is no reason that it is a sort of metadata files. This stops the allocation and makes them not dependent on metadata file routines. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:37 +09:00
Ryusuke Konishi	518d1a6a1d	nilfs2: allow nilfs_clear_inode to clear metadata file inodes Allows clear inode function (nilfs_clear_inode) to handle metadata files that uses bitmap-based object alloctor. DAT and ifile correspond to this. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:37 +09:00
Ryusuke Konishi	348fe8da13	nilfs2: simplify life cycle management of nilfs object This stops pre-allocating nilfs object in nilfs_get_sb routine, and stops managing its life cycle by reference counting. nilfs_find_or_create_nilfs() function, nilfs->ns_mount_mutex, nilfs_objects list, and the reference counter will be removed through the simplification. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:36 +09:00
Ryusuke Konishi	f11459ad7d	nilfs2: do not allocate multiple super block instances for a device This stops allocating multiple super block instances for a device. All snapshots and a current mode mount (i.e. latest tree) will be controlled with nilfs_root objects that are kept within an sb instance. nilfs_get_sb() is rewritten so that it always has a root object for the latest tree and snapshots make additional root objects. The root dentry of the latest tree is binded to sb->s_root even if it isn't attached on a directory. Root dentries of snapshots or the latest tree are binded to mnt->mnt_root on which they are mounted. With this patch, nilfs_find_sbinfo() function, nilfs->ns_supers list, and nilfs->ns_current back pointer, are deleted. In addition, init_nilfs() and load_nilfs() are simplified since they will be called once for a device, not repeatedly called for mount points. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:36 +09:00
Ryusuke Konishi	ab4d8f7ebf	nilfs2: split out nilfs_attach_snapshot This splits the code to attach snapshots into a separate routine for convenience sake. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:36 +09:00
Ryusuke Konishi	367ea33486	nilfs2: split out nilfs_get_root_dentry This splits the code to allocate root dentry into a separate routine for convenience in successive changes. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:35 +09:00
Ryusuke Konishi	dc3d3b810a	nilfs2: deny write access to inodes in snapshots Snapshots of nilfs are read-only. After super block instances (sb) will be unified, nilfs will need to check write access by a way other than implicit test with IS_RDONLY(inode). This is because IS_RDONLY() refers to MS_RDONLY bit of inode->i_sb->s_flags and it will become inaccurate after the unification of sb. To prepare for the issue, this uses i_op->permission to deny write access to inodes in snapshots. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:35 +09:00
Ryusuke Konishi	fd52202930	nilfs2: use checkpoint tree for mount check of snapshots This rewrites nilfs_checkpoint_is_mounted() function so that it decides whether a checkpoint is mounted by whether the corresponding root object is found in checkpoint tree. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:35 +09:00
Ryusuke Konishi	b7c0634204	nilfs2: move inode count and block count into root object This moves sbi->s_inodes_count and sbi->s_blocks_count into nilfs_root object. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:35 +09:00
Ryusuke Konishi	e912a5b668	nilfs2: use root object to get ifile This rewrites functions using ifile so that they get ifile from nilfs_root object, and will remove sbi->s_ifile. Some functions that don't know the root object are extended to receive it from caller. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:35 +09:00
Ryusuke Konishi	8e656fd518	nilfs2: make snapshots in checkpoint tree exportable The previous export operations cannot handle multiple versions of a filesystem if they belong to the same sb instance. This adds a new type of file handle and extends export operations so that they can get the inode specified by a checkpoint number as well as an inode number and a generation number. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:34 +09:00
Ryusuke Konishi	4d8d9293dc	nilfs2: set pointer to root object in inodes This puts a pointer to nilfs_root object in the private part of on-memory inode, and makes nilfs_iget function pick up the inode with the same root object. Non-root inodes inherit its nilfs_root object from parent inode. That of the root inode is allocated through nilfs_attach_checkpoint() function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:34 +09:00
Ryusuke Konishi	ba65ae4729	nilfs2: add checkpoint tree to nilfs object To hold multiple versions of a filesystem in one sb instance, a new on-memory structure is necessary to handle one or more checkpoints. This adds a red-black tree of checkpoints to nilfs object, and adds lookup and create functions for them. Each checkpoint is represented by "nilfs_root" structure, and this structure has rb_node to configure the rb-tree. The nilfs_root object is identified with a checkpoint number. For each snapshot, a nilfs_root object is allocated and the checkpoint number of snapshot is assigned to it. For a regular mount (i.e. current mode mount), NILFS_CPTREE_CURRENT_CNO constant is assigned to the corresponding nilfs_root object. Each nilfs_root object has an ifile inode and some counters. These items will displace those of nilfs_sb_info structure in successive patches. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:34 +09:00
Ryusuke Konishi	263d90cefc	nilfs2: remove own inode hash used for GC This uses inode hash function that vfs provides instead of the own hash table for caching gc inodes. This finally removes the own inode hash from nilfs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:34 +09:00
Ryusuke Konishi	5e19a995f4	nilfs2: separate initializer of metadata file inode This separates a part of initialization code of metadata file inode, and makes it available from the nilfs iget function that a later patch will add to. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:34 +09:00
Ryusuke Konishi	0e14a3595b	nilfs2: use iget5_locked to get inode This uses iget5_locked instead of iget_locked so that gc cache can look up inodes with an inode number and an optional checkpoint number. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:33 +09:00
Ryusuke Konishi	6c43f41000	nilfs2: keep zero value in i_cno except for gc-inodes On-memory inode structures of nilfs have a member "i_cno" which stores a checkpoint number related to the inode. For gc-inodes, this field indicates version of data each gc-inode caches for GC. Log writer temporarily uses "i_cno" to transfer the latest checkpoint number. This stops the latter use and lets only gc-inodes use it. The purpose of this patch is to allow the successive change use "i_cno" for inode lookup. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:33 +09:00
Ryusuke Konishi	7d6cd92fe2	nilfs2: allow nilfs_dirty_inode to mark metadata file inodes dirty This allows sop->dirty_inode callback function (nilfs_dirty_inode) to handle metadata file inodes. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:33 +09:00
Ryusuke Konishi	b91c9a97c9	nilfs2: allow nilfs_destroy_inode to destroy metadata file inodes The current nilfs_destroy_inode() doesn't handle metadata file inodes including gc inodes (dummy inodes used for garbage collection). This allows nilfs_destroy_inode() to destroy inodes of metadata files. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:33 +09:00
Ryusuke Konishi	9566a7a851	nilfs2: accept future revisions Compatibility of nilfs partitions is now managed with three feature sets. This changes old compatibility check with revision number so that it can accept future revisions. Note that we can stop support of experimental versions of nilfs that doesn't know the feature sets by incrementing NILFS_CURRENT_REV. We don't have to do it soon, but it would be a possible option whenever the need arises. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-10-23 09:24:33 +09:00
Linus Torvalds	a2887097f2	Merge branch 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits) xen-blkfront: disable barrier/flush write support Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c block: remove BLKDEV_IFL_WAIT aic7xxx_old: removed unused 'req' variable block: remove the BH_Eopnotsupp flag block: remove the BLKDEV_IFL_BARRIER flag block: remove the WRITE_BARRIER flag swap: do not send discards as barriers fat: do not send discards as barriers ext4: do not send discards as barriers jbd2: replace barriers with explicit flush / FUA usage jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier jbd: replace barriers with explicit flush / FUA usage nilfs2: replace barriers with explicit flush / FUA usage reiserfs: replace barriers with explicit flush / FUA usage gfs2: replace barriers with explicit flush / FUA usage btrfs: replace barriers with explicit flush / FUA usage xfs: replace barriers with explicit flush / FUA usage block: pass gfp_mask and flags to sb_issue_discard dm: convey that all flushes are processed as empty ...	2010-10-22 17:07:18 -07:00
Jens Axboe	fa251f8990	Merge branch 'v2.6.36-rc8' into for-2.6.37/barrier Conflicts: block/blk-core.c drivers/block/loop.c mm/swapfile.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-19 09:13:04 +02:00
Jan Blunck	d6d4c19c5f	BKL: Remove BKL from NILFS2 The BKL is only used in put_super, fill_super and remount_fs that are all three protected by the superblocks s_umount rw_semaphore. Therefore it is safe to remove the BKL entirely. Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-04 21:10:41 +02:00
Jan Blunck	db71922217	BKL: Explicitly add BKL around get_sb/fill_super This patch is a preparation necessary to remove the BKL from do_new_mount(). It explicitly adds calls to lock_kernel()/unlock_kernel() around get_sb/fill_super operations for filesystems that still uses the BKL. I've read through all the code formerly covered by the BKL inside do_kern_mount() and have satisfied myself that it doesn't need the BKL any more. do_kern_mount() is already called without the BKL when mounting the rootfs and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called from various places without BKL: simple_pin_fs(), nfs_do_clone_mount() through nfs_follow_mountpoint(), afs_mntpt_do_automount() through afs_mntpt_follow_link(). Both later functions are actually the filesystems follow_link inode operation. vfs_kern_mount() is calling the specified get_sb function and lets the filesystem do its job by calling the given fill_super function. Therefore I think it is safe to push down the BKL from the VFS to the low-level filesystems get_sb/fill_super operation. [arnd: do not add the BKL to those file systems that already don't use it elsewhere] Signed-off-by: Jan Blunck <jblunck@infradead.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Christoph Hellwig <hch@infradead.org>	2010-10-04 21:10:10 +02:00
Christoph Hellwig	dd3932eddf	block: remove BLKDEV_IFL_WAIT All the blkdev_issue_* helpers can only sanely be used for synchronous caller. To issue cache flushes or barriers asynchronously the caller needs to set up a bio by itself with a completion callback to move the asynchronous state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always specified when calling blkdev_issue_* and also remove the now unused flags argument to blkdev_issue_flush and blkdev_issue_zeroout. For blkdev_issue_discard we need to keep it for the secure discard flag, which gains a more descriptive name and loses the bitops vs flag confusion. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-16 20:52:58 +02:00
Christoph Hellwig	f8c131f5b6	nilfs2: replace barriers with explicit flush / FUA usage Switch to the WRITE_FLUSH_FUA flag for log writes, remove the EOPNOTSUPP detection for barriers and stop setting the barrier flag for discards. tj: nilfs is now fixed to wait for discard completion. Updated this patch accordingly and dropped warning about it. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-10 12:35:39 +02:00
Ryusuke Konishi	4afc31345e	nilfs2: fix leak of shadow dat inode in error path of load_nilfs If load_nilfs() gets an error while doing recovery, it will fail to free the shadow inode of dat (nilfs->ns_gc_dat). This fixes the leak issue. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-08-30 10:18:03 +09:00
Linus Torvalds	a28e0852d4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: wait for discard to finish	2010-08-22 09:44:47 -07:00
Linus Torvalds	145c3ae46b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: fs: brlock vfsmount_lock fs: scale files_lock lglock: introduce special lglock and brlock spin locks tty: fix fu_list abuse fs: cleanup files_lock locking fs: remove extra lookup in __lookup_hash fs: fs_struct rwlock to spinlock apparmor: use task path helpers fs: dentry allocation consolidation fs: fix do_lookup false negative mbcache: Limit the maximum number of cache entries hostfs ->follow_link() braino hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy remove SWRITE* I/O types kill BH_Ordered flag vfs: update ctime when changing the file's permission by setfacl cramfs: only unlock new inodes fix reiserfs_evict_inode end_writeback second call	2010-08-18 09:35:08 -07:00
Ryusuke Konishi	1cb0c924fa	nilfs2: wait for discard to finish nilfs_discard_segment() doesn't wait for completion of discard requests. This specifies BLKDEV_IFL_WAIT flag when calling blkdev_issue_discard() in order to fix the sync failure. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Christoph Hellwig <hch@lst.de>	2010-08-19 00:11:06 +09:00
Christoph Hellwig	87e99511ea	kill BH_Ordered flag Instead of abusing a buffer_head flag just add a variant of sync_dirty_buffer which allows passing the exact type of write flag required. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-18 01:09:00 -04:00
Ryusuke Konishi	ea1a16f716	nilfs2: fix false warning saying one of two super blocks is broken After applying commit `b2ac86e1`, the following message got appeared after unclean shutdown: > NILFS warning: broken superblock. using spare superblock. This turns out to be a false message due to the change which updates two super blocks alternately. The secondary super block now can be selected if it's newer than the primary one. This kills the false warning by suppressing it if another super block is not actually broken. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-08-16 11:08:36 +09:00
Ryusuke Konishi	af4e36318e	nilfs2: fix list corruption after ifile creation failure If nilfs_attach_checkpoint() gets a memory allocation failure during creation of ifile, it will return without removing nilfs_sb_info struct from ns_supers list. When a concurrently mounted snapshot is unmounted or another new snapshot is mounted after that, this causes kernel oops as below: > BUG: unable to handle kernel NULL pointer dereference at (null) > IP: [<f83662ff>] nilfs_find_sbinfo+0x74/0xa4 [nilfs2] > *pde = 00000000 > Oops: 0000 [#1] SMP <snip> > Call Trace: > [<f835dc29>] ? nilfs_get_sb+0x165/0x532 [nilfs2] > [<c1173c87>] ? ida_get_new_above+0x16d/0x187 > [<c109a7f8>] ? alloc_vfsmnt+0x7e/0x10a > [<c1070790>] ? kstrdup+0x2c/0x40 > [<c1089041>] ? vfs_kern_mount+0x96/0x14e > [<c108913d>] ? do_kern_mount+0x32/0xbd > [<c109b331>] ? do_mount+0x642/0x6a1 > [<c101a415>] ? do_page_fault+0x0/0x2d1 > [<c1099c00>] ? copy_mount_options+0x80/0xe2 > [<c10705d8>] ? strndup_user+0x48/0x67 > [<c109b3f1>] ? sys_mount+0x61/0x90 > [<c10027cc>] ? sysenter_do_call+0x12/0x22 This fixes the problem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org	2010-08-16 11:08:36 +09:00
Linus Torvalds	2f9e825d3e	Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits) block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n xen-blkfront: fix missing out label blkdev: fix blkdev_issue_zeroout return value block: update request stacking methods to support discards block: fix missing export of blk_types.h writeback: fix bad _bh spinlock nesting drbd: revert "delay probes", feature is being re-implemented differently drbd: Initialize all members of sync_conf to their defaults [Bugz 315] drbd: Disable delay probes for the upcomming release writeback: cleanup bdi_register writeback: add new tracepoints writeback: remove unnecessary init_timer call writeback: optimize periodic bdi thread wakeups writeback: prevent unnecessary bdi threads wakeups writeback: move bdi threads exiting logic to the forker thread writeback: restructure bdi forker loop a little writeback: move last_active to bdi writeback: do not remove bdi from bdi_list writeback: simplify bdi code a little writeback: do not lose wake-ups in bdi threads ... Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and drivers/scsi/scsi_error.c as per Jens.	2010-08-10 15:22:42 -07:00
Linus Torvalds	5f248c9c25	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits) no need for list_for_each_entry_safe()/resetting with superblock list Fix sget() race with failing mount vfs: don't hold s_umount over close_bdev_exclusive() call sysv: do not mark superblock dirty on remount sysv: do not mark superblock dirty on mount btrfs: remove junk sb_dirt change BFS: clean up the superblock usage AFFS: wait for sb synchronization when needed AFFS: clean up dirty flag usage cifs: truncate fallout mbcache: fix shrinker function return value mbcache: Remove unused features add f_flags to struct statfs(64) pass a struct path to vfs_statfs update VFS documentation for method changes. All filesystems that need invalidate_inode_buffers() are doing that explicitly convert remaining ->clear_inode() to ->evict_inode() Make ->drop_inode() just return whether inode needs to be dropped fs/inode.c:clear_inode() is gone fs/inode.c:evict() doesn't care about delete vs. non-delete paths now ... Fix up trivial conflicts in fs/nilfs2/super.c	2010-08-10 11:26:52 -07:00
Al Viro	6fd1e5c994	convert nilfs2 to ->evict_inode() [folded build fix from sfr] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:25 -04:00
Al Viro	a4ffdde6e5	simplify checks for I_CLEAR/I_FREEING add I_CLEAR instead of replacing I_FREEING with it. I_CLEAR is equivalent to I_FREEING for almost all code looking at either; it's there to keep track of having called clear_inode() exactly once per inode lifetime, at some point after having set I_FREEING. I_CLEAR and I_FREEING never get set at the same time with the current code, so we can switch to setting i_flags to I_FREEING \| I_CLEAR instead of I_CLEAR without loss of information. As the result of such change, checks become simpler and the amount of code that needs to know about I_CLEAR shrinks a lot. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:44 -04:00
Christoph Hellwig	1025774ce4	remove inode_setattr Replace inode_setattr with opencoded variants of it in all callers. This moves the remaining call to vmtruncate into the filesystem methods where it can be replaced with the proper truncate sequence. In a few cases it was obvious that we would never end up calling vmtruncate so it was left out in the opencoded variant: spufs: explicitly checks for ATTR_SIZE earlier btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above In addition to that ncpfs called inode_setattr with handcrafted iattrs, which allowed to trim down the opencoded variant. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:37 -04:00
Christoph Hellwig	155130a4f7	get rid of block_write_begin_newtrunc Move the call to vmtruncate to get rid of accessive blocks to the callers in preparation of the new truncate sequence and rename the non-truncating version to block_write_begin. While we're at it also remove several unused arguments to block_write_begin. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:33 -04:00
Christoph Hellwig	6e1db88d53	introduce __block_write_begin Split up the block_write_begin implementation - __block_write_begin is a new trivial wrapper for block_prepare_write that always takes an already allocated page and can be either called from block_write_begin or filesystem code that already has a page allocated. Remove the handling of already allocated pages from block_write_begin after switching all callers that do it to __block_write_begin. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:32 -04:00
Christoph Hellwig	f4e420dc42	clean up write_begin usage for directories in pagecache For filesystem that implement directories in pagecache we call block_write_begin with an already allocated page for this code, while the normal regular file write path uses the default block_write_begin behaviour. Get rid of the __foofs_write_begin helper and opencode the normal write_begin call in foofs_write_begin, while adding a new foofs_prepare_chunk helper for the directory code. The added benefit is that foofs_prepare_chunk has a much saner calling convention. Note that the interruptible flag passed into block_write_begin is always ignored if we already pass in a page (see next patch for details), and we never were doing truncations of exessive blocks for this case either so we can switch directly to block_write_begin_newtrunc. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:31 -04:00
Christoph Hellwig	eafdc7d190	sort out blockdev_direct_IO variants Move the call to vmtruncate to get rid of accessive blocks to the callers in prepearation of the new truncate calling sequence. This was only done for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant was not needed anyway. Get rid of blockdev_direct_IO_no_locking and its _newtrunc variant while at it as just opencoding the two additional paramters is shorted than the name suffix. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:47:29 -04:00
Christoph Hellwig	7b6d91daee	block: unify flags for struct bio and struct request Remove the current bio flags and reuse the request flags for the bio, too. This allows to more easily trace the type of I/O from the filesystem down to the block driver. There were two flags in the bio that were missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've renamed two request flags that had a superflous RW in them. Note that the flags are in bio.h despite having the REQ_ name - as blkdev.h includes bio.h that is the only way to go for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:20:39 +02:00
Ryusuke Konishi	89c0fd014d	nilfs2: reject filesystem with unsupported block size This inserts sanity check that refuses to mount a filesystem with unsupported block size. Previously, kernel code of nilfs was looking only limitation of devices though mkfs.nilfs2 limits the range of block sizes; there was no check that prevents rec_len overflow with larger block sizes. With this change, block sizes larger than 64KB or smaller than 1KB will get rejected explicitly by kernel. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-25 23:29:21 +09:00
Ryusuke Konishi	6cda9fa257	nilfs2: avoid rec_len overflow with 64KB block size With 64KB blocksize, a directory entry can have size 64KB which does not fit into 16 bits we have for entry length. So this patch stores 0xffff instead and converts value when read from / written to disk. Nilfs derives its directory implementation from ext2 filesystem, and this draws upon the corresponding change on ext2. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-25 20:46:43 +09:00
Ryusuke Konishi	c28e69d933	nilfs2: simplify nilfs_get_page function Implementation of nilfs_get_page() is a bit old as below: - A common read_mapping_page inline function is now available instead of its read_cache_page use. - wait_on_page_locked() use in the function is eliminable since read_cache_page function does the same thing through wait_on_page_read(). - PageUptodate() check is eliminable for the same reason. This renews nilfs_get_page() based on these points. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-24 17:36:29 +09:00
Ryusuke Konishi	c5ca48aabe	nilfs2: reject incompatible filesystem This forces nilfs to check compatibility of feature flags so as to reject a filesystem with unknown features when it mounts or remounts the filesystem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:16 +09:00
Ryusuke Konishi	03bdb5ac58	nilfs2: apply read-ahead for nilfs_btree_lookup_contig This applies read-ahead to nilfs_btree_do_lookup and nilfs_btree_lookup_contig functions and extends them to read ahead siblings of level 1 btree nodes that hold data blocks. At present, the read-ahead is not applied to most btree operations; only get_block() callback function, which is used during read of regular files or directories, receives the benefit. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:16 +09:00
Ryusuke Konishi	4e13e66bee	nilfs2: introduce check flag to btree node buffer nilfs_btree_get_block() now may return untested buffer due to read-ahead. This adds a new flag for buffer heads so that the btree code can check whether the buffer is already verified or not. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:15 +09:00
Ryusuke Konishi	464ece8863	nilfs2: add btree get block function with readahead option This adds __nilfs_btree_get_block() function that can issue a series of read-ahead requests for sibling btree nodes. This read-ahead needs parent node block, so nilfs_btree_readahead_info structure is added to pass the information that __nilfs_btree_get_block() needs. This also replaces the previous nilfs_btree_get_block() implementation with a wrapper function of __nilfs_btree_get_block(). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:15 +09:00
Ryusuke Konishi	26dfdd8e29	nilfs2: add read ahead mode to nilfs_btnode_submit_block This adds mode argument to nilfs_btnode_submit_block() function and allows it to issue a read-ahead request. An optional submit_ptr argument is also added to store the actual block address for which bio is sent. submit_ptr is used for a series of read-ahead requests, and helps to decide if each requested block is continous to the previous one on disk. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:15 +09:00
Ryusuke Konishi	f8e6cc013b	nilfs2: fix buffer head leak in nilfs_btnode_submit_block nilfs_btnode_submit_block() refers to buffer head just before returning from the function, but it releases the buffer head earlier than that if nilfs_dat_translate() gets an error. This has potential for oops in the erroneous case. This fixes the issue. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:15 +09:00
Ryusuke Konishi	7c397a81fe	nilfs2: eliminate inline keywords in btree implementation This removes all inline uses from btree.c. Gcc now agressively apply inline expansion even for the functions declared without the keyword; the inline use in btree.c looks excessive. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:15 +09:00
Ryusuke Konishi	5ad2686e92	nilfs2: get maximum number of child nodes from bmap object The patch "reduce repetitive calculation of max number of child nodes" gathered up the calculation of maximum number of child nodes into nilfs_btree_nchildren_per_block() function. This makes the function get resultant value from a private variable in bmap object instead of calculating it for each call. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:14 +09:00
Ryusuke Konishi	9b7b265c9a	nilfs2: reduce repetitive calculation of max number of child nodes The current btree implementation repeats the same calculation on the maximum number of child nodes. This is because a few low level routines use the calculation for index addressing in a btree node block. This reduces the calculation by explicitly passing the maximum number of child nodes (ncmax) through their argument. This changes parameter passing of the following functions: - nilfs_btree_node_dptrs - nilfs_btree_node_get_ptr - nilfs_btree_node_set_ptr - nilfs_btree_node_init - nilfs_btree_node_move_left - nilfs_btree_node_move_right - nilfs_btree_node_insert - nilfs_btree_node_delete, and - nilfs_btree_get_node The following functions are removed: - nilfs_btree_node_nchildren_min - nilfs_btree_node_nchildren_max Most middle level btree operations are rewritten to pass a proper ncmax value depending on whether each occurrence of node is "root" or not. A constant NILFS_BTREE_ROOT_NCHILDREN_MAX is used for the root node, whereas nilfs_btree_nchildren_per_block() function is used for non-root nodes. If a node could be either root or a non-root node, an output argument of nilfs_btree_get_node() is used to set up ncmax. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:14 +09:00
Ryusuke Konishi	ea64ab87cd	nilfs2: optimize calculation of min/max number of btree node children nilfs_btree_node_nchildren_max() and nilfs_btree_node_nchildren_min() functions switch return value depending on whether target node is the root or a node block. In most uses of these functions, however, the node type is fixed, and moreover the same calculation is repeatedly performed in loop. This unfold these functions depending on context and move them outside loops wherever possible. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:14 +09:00
Ryusuke Konishi	364ec2d700	nilfs2: remove redundant pointer checks in bmap lookup functions nilfs_bmap_lookup and its variants are supposed to take a valid pointer argument to return a block address, thus pointer checks in nilfs_btree_lookup and nilfs_direct_lookup are needless. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:14 +09:00
Ryusuke Konishi	05d0e94b66	nilfs2: get rid of nilfs_bmap_union This removes nilfs_bmap_union and finally unifies three structures and the union in bmap/btree code into one. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:14 +09:00
Ryusuke Konishi	dc935be2a0	nilfs2: unify bmap set_target_v operations This unifies two similar functions nilfs_btree_set_target_v and nilfs_direct_set_target_v into one, nilfs_bmap_set_target_v. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:14 +09:00
Ryusuke Konishi	e7c274f808	nilfs2: get rid of nilfs_btree uses This replaces all uses of nilfs_btree struct in implementation of btree mapping with nilfs_bmap struct. Name of local variable "btree" is kept not to bloat amount of change. And, a part of local variables "bmap" is renamed to "btree" to uniform naming rule. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:13 +09:00
Ryusuke Konishi	10ff885ba6	nilfs2: get rid of nilfs_direct uses This replaces all uses of nilfs_direct struct in implementation of direct mapping with nilfs_bmap struct. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:13 +09:00
Ryusuke Konishi	583ada4761	nilfs2: remove constant qualifier from argument of bmap propagate The first argument of bops->bop_propagate operation takes a constant qualifier, and causes compilation error when removed cast to pointer of nilfs_btree structure type. This fixes the issue to prepare for succesive removal of nilfs_btree struct. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:13 +09:00
Ryusuke Konishi	25b8d7ded0	nilfs2: get rid of private conversion macros on bmap key and pointer Will remove nilfs_bmap_key_to_dkey(), nilfs_bmap_dkey_to_key(), nilfs_bmap_ptr_to_dptr(), and nilfs_bmap_dptr_to_ptr() for simplicity. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:13 +09:00
Ryusuke Konishi	1d5385b9f3	nilfs2: verify btree node after reading This inserts sanity checks soon after read btree node from disk. This allows early detection of broken btree nodes, and helps to narrow down problems due to file system corruption. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:13 +09:00
Ryusuke Konishi	cfa913a507	nilfs2: add sanity check in nilfs_btree_add_dirty_buffer According to the report titled "problem with nilfs_cleanerd" from Łukasz Wójcicki, nilfs_btree_lookup_dirty_buffers or nilfs_btree_add_dirty_buffer got memory violation during garbage collection. This could happen if a level field of given btree node buffer is incorrect, which is a crucial internal bug. This inserts a sanity check to figure out the problem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:12 +09:00
Ryusuke Konishi	7c01745781	nilfs2: pass remount flag to parse_options This adds is_remount argument to the parse_options() function that obtains mount options from strings. Previously, parse_options did not distinguish context whether it's called for a new mount or remount, so the caller needed additional verifications outside the function. This allows parse_options to verify options and print messages depending on the context. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:12 +09:00
Ryusuke Konishi	c6b4d57ddf	nilfs2: use seq_puts to print mount options without argument This replaces seq_printf() with seq_puts() in nilfs_show_options for mount options which have no argument. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:12 +09:00
Ryusuke Konishi	802d317754	nilfs2: add nodiscard mount option Nilfs has "discard" mount option which issues discard/TRIM commands to underlying block device, but it lacks a complementary option and has no way to disable the feature through remount. This adds "nodiscard" option to resolve this imbalance. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:12 +09:00
Ryusuke Konishi	773bc4f3b6	nilfs2: add barrier mount option Nilfs enables write barriers by default and has "nobarrier" mount option to disable this feature. But it lacks the complementary option and has no way to re-enable the feature on remount. This adds "barrier" option to resolve this imbalance. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:12 +09:00
Ryusuke Konishi	325020477a	nilfs2: do not update log cursor for small change Super blocks of nilfs are periodically overwritten in order to record the recent log position. This shortens recovery time after unclean unmount, but the current implementation performs the update even for a few blocks of change. If the filesystem gets small changes slowly and continually, super blocks may be updated excessively. This moderates the issue by skipping update of log cursor if it does not cross a segment boundary. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:11 +09:00
Ryusuke Konishi	6c12516083	nilfs2: implement fallback for super root search Although nilfs redundantly uses two super blocks and each may point to different position on log, the current version of nilfs does not try fallback to the spare super block when it doesn't find any valid log at the position that the primary super block points to. This has been a cause of mount failures due to write order reversals on barrier less block devices. This inserts fallback code in error path of nilfs_search_super_root routine to resolve the mount failure problem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:11 +09:00
Ryusuke Konishi	2d72b99ecd	nilfs2: add missing error code in comment of nilfs_search_super_root nilfs_search_super_root can return -ENOMEM, but this error code is not described in its kernel-doc comment. This fixes the discrepancy. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:11 +09:00
Ryusuke Konishi	843d63baa5	nilfs2: separate setup of log cursor from init_nilfs This separates a setup routine of log cursor from init_nilfs(). The routine, nilfs_store_log_cursor, reads the last position of the log containing a super root, and initializes relevant state on the nilfs object. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:11 +09:00
Jiro SEKIBA	b2ac86e1a8	nilfs2: sync super blocks in turns This will sync super blocks in turns instead of syncing duplicate super blocks at the time. This will help searching valid super root when super block is written into disk before log is written, which is happen when barrier-less block devices are unmounted uncleanly. In the situation, old super block likely points to valid log. This patch introduces ns_sbwcount member to the nilfs object and adds nilfs_sb_will_flip() function; ns_sbwcount counts how many times super blocks write back to the disk. And, nilfs_sb_will_flip() decides whether flipping required or not based on the count of ns_sbwcount to sync super blocks asymmetrically. The following functions are also changed: - nilfs_prepare_super(): flips super blocks according to the argument. The argument is calculated by nilfs_sb_will_flip() function. - nilfs_cleanup_super(): sets "clean" flag to both super blocks if they point to the same checkpoint. To update both of super block information, caller of nilfs_commit_super must set the information on both super blocks. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:11 +09:00
Jiro SEKIBA	d26493b6f0	nilfs2: introduce nilfs_prepare_super This function checks validity of super block pointers. If first super block is invalid, it will swap the super blocks. The function should be called before any super block information updates. Caller must obtain nilfs->ns_sem. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:10 +09:00
Ryusuke Konishi	60f46b7efc	nilfs2: separate function that updates log position This moves out section that updates information of the recent log position stored in super blocks from nilfs_commit_super to a new routine named nilfs_set_log_cursor. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:10 +09:00
Ryusuke Konishi	c8a11c8a14	nilfs2: add nilfs_set_error This function marks error state and write it on super blocks. This is a preparation for making super block writeback alternately. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:10 +09:00
Ryusuke Konishi	7ecaa46cfe	nilfs2: add nilfs_cleanup_super This function write out filesystem state to super blocks in order to share the same cleanup work. This is a preparation for making super block writeback alternately. Cc: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:10 +09:00
Ryusuke Konishi	bde4e696e4	nilfs2: do not update mount time on rw->ro remount Mount time field in super block is wrongly updated when nilfs remounts the partition from read-write to read-only. This fixes the issue. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:10 +09:00
Ryusuke Konishi	57a4bfc486	nilfs2: get rid of ns_free_segments_count This counter is unused. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:09 +09:00
Ryusuke Konishi	4762077c7b	nilfs2: get rid of macros for segment summary information This removes macros to test segment summary flags and redefines a few relevant macros with inline functions. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:09 +09:00
Ryusuke Konishi	85655484f8	nilfs2: do not use nilfs_segsum_info structure in recovery code This will get rid of nilfs_segsum_info use from recovery functions for simplicity. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:09 +09:00
Ryusuke Konishi	354fa8be28	nilfs2: divide load_segment_summary function load_segment_summary function has two distinct roles: getting summary header of a log, and verifying consistencies of the log. This divide it into two corresponding functions, nilfs_read_log_header and nilfs_validate_log to clarify the meaning. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:09 +09:00
Ryusuke Konishi	aee5ce2f57	nilfs2: rename nilfs_recover_logical_segments function The function name of nilfs_recover_logical_segments makes no sense. This changes the name into nilfs_salvage_orphan_logs to clarify the role of the function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:09 +09:00
Ryusuke Konishi	8b94025c00	nilfs2: refactor recovery logic routines Most functions in recovery code take an argument of a super block instance or a nilfs_sb_info struct for convenience sake. This replaces them aggressively with a nilfs object by applying __bread and __breadahead against routines using sb_bread and sb_breadahead. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:08 +09:00
Ryusuke Konishi	92c60ccaf3	nilfs2: add blocksize member to nilfs object This stores blocksize in nilfs objects for the successive refactoring of recovery logic. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-07-23 10:02:08 +09:00
Ryusuke Konishi	c29684d683	nilfs2: remove obsolete declarations of cache constructor and destructor The commit `41c88bd7` ("nilfs2: cleanup multi kmem_cache_{create,destroy} code") consolidated slab constructors and destructors used in nilfs, but it left some declarations in header files. This gets rid of the obsolete declarations. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-31 20:50:29 +09:00
Ryusuke Konishi	84cb099985	nilfs2: fix style issue in nilfs_destroy_cachep This gets rid of unwanted space chars in front of conditional sentences of nilfs_destroy_cachep(). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-31 20:50:29 +09:00
Christoph Hellwig	7ea8085910	drop unused dentry argument to ->fsync Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:05:02 -04:00
Linus Torvalds	e8bebe2f71	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (69 commits) fix handling of offsets in cris eeprom.c, get rid of fake on-stack files get rid of home-grown mutex in cris eeprom.c switch ecryptfs_write() to struct inode , kill on-stack fake files switch ecryptfs_get_locked_page() to struct inode simplify access to ecryptfs inodes in ->readpage() and friends AFS: Don't put struct file on the stack Ban ecryptfs over ecryptfs logfs: replace inode uid,gid,mode initialization with helper function ufs: replace inode uid,gid,mode initialization with helper function udf: replace inode uid,gid,mode init with helper ubifs: replace inode uid,gid,mode initialization with helper function sysv: replace inode uid,gid,mode initialization with helper function reiserfs: replace inode uid,gid,mode initialization with helper function ramfs: replace inode uid,gid,mode initialization with helper function omfs: replace inode uid,gid,mode initialization with helper function bfs: replace inode uid,gid,mode initialization with helper function ocfs2: replace inode uid,gid,mode initialization with helper function nilfs2: replace inode uid,gid,mode initialization with helper function minix: replace inode uid,gid,mode init with helper ext4: replace inode uid,gid,mode init with helper ... Trivial conflict in fs/fs-writeback.c (mark bitfields unsigned)	2010-05-21 19:37:45 -07:00
Dmitry Monakhov	73459dcc67	nilfs2: replace inode uid,gid,mode initialization with helper function Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-21 18:31:25 -04:00
Jens Axboe	ee9a3607fb	Merge branch 'master' into for-2.6.35 Conflicts: fs/ext3/fsync.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-05-21 21:27:26 +02:00
Ryusuke Konishi	d240e06713	nilfs2: disallow remount of snapshot from/to a regular mount Snapshots and regular ro/rw mounts are essentially-different within the meaning whether the checkpoint is static or not and is marked with a snapshot flag or not. The current implemenation, however, allows to remount a snapshot to a regular rw-mount if the checkpoint number equals the latest one. This transition is actually impossible since changing a checkpoint to a snapshot makes another checkpoint, thus the condition is never satisfied. This fixes the weird state of affairs, and specifically separates snapshots and regular rw/ro-mounts. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:34 +09:00
Ryusuke Konishi	cdce214e39	nilfs2: use huge_encode_dev/huge_decode_dev This replaces uses of new_encode_dev/new_decode_dev with their 64-bit counterparts, huge_encode_dev/huge_decode_dev respectively. This is just for clarification and has no impact on the disk format. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:34 +09:00
Ryusuke Konishi	b87ca91948	nilfs2: update comment on deactivate_super at nilfs_get_sb deactivate_super was replaced with deactivate_locked_super, but the comment of nilfs_get_sb remain unchanged. This renews the comment. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:34 +09:00
Ryusuke Konishi	e2d1591a13	nilfs2: replace MS_VERBOSE with MS_SILENT MS_VERBOSE is deprecated. This replaces it with MS_SILENT in reference to get_sb_bdev function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:34 +09:00
Ryusuke Konishi	4571b82cdc	nilfs2: add missing initialization of s_mode An fmode_t argument is passed to kill_block_super() through s_mode member of the super_block structure. This is used to release the block device with the same mode, however, nilfs does not set s_mode anywhere. This modifies nilfs_get_sb function to properly initialize the s_mode member. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:33 +09:00
Ryusuke Konishi	13e905592b	nilfs2: fix misuse of open_bdev_exclusive/close_bdev_exclusive The second argument of open_bdev_exclusive/close_bdev_exclusive takes fmode_t flags instead of mount flags. This fixes the misuse. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:33 +09:00
Ryusuke Konishi	25294d8c37	nilfs2: use checkpoint number instead of timestamp to select super block Nilfs maintains two super blocks, and selects the new one on mount time if they both have valid checksums and their timestamps differ. However, this has potential for mis-selection since the system clock may be rewinded and the resolution of the timestamps is not high. Usually this doesn't become an issue because both super blocks are updated at the same time when the file system is unmounted. Even if the file system wasn't unmounted cleanly, the roll-forward recovery will find the proper log which stores the latest super root. Thus, the issue can appear only if update of one super block fails and the clock happens to be rewinded. This fixes the issue by using checkpoint numbers instead of timestamps to pick the super block storing the location of the latest log. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:32 +09:00
Ryusuke Konishi	34cb9b5c97	nilfs2: add missing endian conversion on super block magic number This adds missing endian conversions in comparision of the magic number of super blocks. It was coincidence that prior versions didn't incur problems; the upper byte of the magic number happened to be equal to the lower byte. But, semantically it's wrong to depend on this. This won't change anything else nor suffer any compatibility issues. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:32 +09:00
Ryusuke Konishi	4e819509cb	nilfs2: make nilfs_sc_*_ops static This kills the following sparse warnings: fs/nilfs2/segment.c:567:28: warning: symbol 'nilfs_sc_file_ops' was not declared. Should it be static? fs/nilfs2/segment.c:617:28: warning: symbol 'nilfs_sc_dat_ops' was not declared. Should it be static? fs/nilfs2/segment.c:625:28: warning: symbol 'nilfs_sc_dsync_ops' was not declared. Should it be static? Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:31 +09:00
Ryusuke Konishi	db55d92252	nilfs2: add kernel doc comments to persistent object allocator functions The implementation of persistent object allocator (alloc.c) is poorly documented. This adds kernel doc style comments on that functions. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:31 +09:00
Li Hong	fdce895ea5	nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info In nilfs_segctor_thread(), timer is a local variable allocated on stack. Its address can't be set to sci->sc_timer and passed in several procedures. It works now by chance, just because other procedures are called by nilfs_segctor_thread() directly or indirectly and the stack hasn't been deallocated yet. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:31 +09:00
Li Hong	154ac5a830	nilfs2: remove nilfs_segctor_init() in segment.c There are only two lines of code in nilfs_segctor_init(). From a logic design view, the first line 'sci->sc_seq_done = sci->sc_seq_request;' should be put in nilfs_segctor_new(). Even in nilfs_segctor_new(), this initialization is needless because sci is kzalloc-ed. So nilfs_segctor_init() is only a wrap call to nilfs_segctor_start_thread(). Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:31 +09:00
Ryusuke Konishi	50614bcf29	nilfs2: insert checkpoint number in segment summary header This adds a field to record the latest checkpoint number in the nilfs_segment_summary structure. This will help to recover the latest checkpoint number from logs on disk. This field is intended for crucial cases in which super blocks have lost pointer to the latest log. Even though this will change the disk format, both backward and forward compatibility is preserved by a size field prepared in the segment summary header. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:31 +09:00
Li Hong	9f130263f3	nilfs2: add a print message after loading nilfs2 Printing a message after loading a file system is a practice. Add this to provide a better user-friendly experience. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:30 +09:00
Li Hong	41c88bd74d	nilfs2: cleanup multi kmem_cache_{create,destroy} code This cleanup patch gives several improvements: - Moving all kmem_cache_{create_destroy} calls into one place, which removes some small function calls, cleans up error check code and clarify the logic. - Mark all initial code in __init section. - Remove some very obvious comments. - Adjust some declarations. - Fix some space-tab issues. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:30 +09:00
Ryusuke Konishi	aaed1d5bfa	nilfs2: move out checksum routines to segment buffer code This moves out checksum routines in log writer to segbuf.c for cleanup. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:30 +09:00
Ryusuke Konishi	1e2b68bf28	nilfs2: move pointer to super root block into logs This moves a pointer to buffer storing super root block to each log buffer from nilfs_sc_info struct for simplicity. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:30 +09:00
Ryusuke Konishi	277a6a3417	nilfs2: change default of 'errors' mount option to 'remount-ro' mode Like ext3, nilfs has 'errors' mount option to allow specifying desired behavior on severe errors. Currently, the default action is 'errors=continue' and has potential to advance filesystem corruption for severe errors. This will change the action to 'errors=remount-ro' to avoid the issue. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:30 +09:00
Li Hong	73bb48869b	nilfs2: Combine nilfs_btree_release_path() and nilfs_btree_free_path() nilfs_btree_release_path() and nilfs_btree_free_path() are bound into each other tightly. Make them into one procedure to clearify the logic and avoid some misusages. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:29 +09:00
Li Hong	f905440f5e	nilfs2: Combine nilfs_btree_alloc_path() and nilfs_btree_init_path() nilfs_btree_alloc_path() and nilfs_btree_init_path() are bound into each other tightly. Make them into one procedure to clearify the logic and avoid some misusages. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-05-10 11:32:29 +09:00
Ryusuke Konishi	973bec34bf	nilfs2: fix sync silent failure As of `32a88aa1`, __sync_filesystem() will return 0 if s_bdi is not set. And nilfs does not set s_bdi anywhere. I noticed this problem by the warning introduced by the recent commit `5129a469` ("Catch filesystem lacking s_bdi"). WARNING: at fs/super.c:959 vfs_kern_mount+0xc5/0x14e() Hardware name: PowerEdge 2850 Modules linked in: nilfs2 loop tpm_tis tpm tpm_bios video shpchp pci_hotplug output dcdbas Pid: 3773, comm: mount.nilfs2 Not tainted 2.6.34-rc6-debug #38 Call Trace: [<c1028422>] warn_slowpath_common+0x60/0x90 [<c102845f>] warn_slowpath_null+0xd/0x10 [<c1095936>] vfs_kern_mount+0xc5/0x14e [<c1095a03>] do_kern_mount+0x32/0xbd [<c10a811e>] do_mount+0x671/0x6d0 [<c1073794>] ? __get_free_pages+0x1f/0x21 [<c10a684f>] ? copy_mount_options+0x2b/0xe2 [<c107b634>] ? strndup_user+0x48/0x67 [<c10a81de>] sys_mount+0x61/0x8f [<c100280c>] sysenter_do_call+0x12/0x32 This ensures to set s_bdi for nilfs and fixes the sync silent failure. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-03 07:36:01 -07:00
Stephen Rothwell	6a47dc1418	nilfs: fix breakage caused by barrier flag changes After merging the block tree, today's linux-next build (powerpc ppc64_defconfig) failed like this: fs/nilfs2/the_nilfs.c: In function 'nilfs_discard_segments': fs/nilfs2/the_nilfs.c:673: error: 'DISCARD_FL_BARRIER' undeclared (first use in this function) Caused by commit `fbd9b09a17` ("blkdev: generalize flags for blkdev_issue_fn functions") interacting with commit `e902ec9906` ("nilfs2: issue discard request after cleaning segments") (which netered Linus' tree on about March 4 - before v2.6.34-rc1). Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-04-29 09:32:00 +02:00
Linus Torvalds	44fa2b4bee	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix typo "numer" -> "number" in alloc.c nilfs2: Remove an uninitialization warning in nilfs_btree_propagate_v() nilfs2: fix a wrong type conversion in nilfs_ioctl()	2010-04-12 18:34:25 -07:00
Ryusuke Konishi	be3bd2223b	nilfs2: fix typo "numer" -> "number" in alloc.c Fixes the typo found in a warning message of a persistent object allocator function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-04-12 01:51:03 +09:00
Li Hong	308f44193f	nilfs2: Remove an uninitialization warning in nilfs_btree_propagate_v() `make CONFIG_NILFS2_FS=m M=fs/nilfs2/` will give the following warnings: fs/nilfs2/btree.c: In function 'nilfs_btree_propagate': fs/nilfs2/btree.c:1882: warning: 'maxlevel' may be used uninitialized in this function fs/nilfs2/btree.c:1882: note: 'maxlevel' was declared here Set maxlevel = 0 to fix it. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-04-02 20:03:30 +09:00
Li Hong	753234007f	nilfs2: fix a wrong type conversion in nilfs_ioctl() (void * __user ) should be (void __user ) Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-03-31 16:55:00 +09:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Ryusuke Konishi	d067633b44	nilfs2: fix imperfect completion wait in nilfs_wait_on_logs nilfs_wait_on_logs has a potential to slip out before completion of all bio requests when it met an error. This synchronization fault may cause unexpected results, for instance, violative access to freed segment buffers from an end-bio callback routine. This fixes the issue by ensuring that nilfs_wait_on_logs waits all given logs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-03-24 01:17:20 +09:00
Ryusuke Konishi	110d735a0a	nilfs2: fix hang-up of cleaner after log writer returned with error According to the report from Andreas Beckmann (Message-ID: <4BA54677.3090902@abeckmann.de>), nilfs in 2.6.33 kernel got stuck after a disk full error. This turned out to be a regression by log writer updates merged at kernel 2.6.33. nilfs_segctor_abort_construction, which is a cleanup function for erroneous cases, was skipping writeback completion for some logs. This fixes the bug and would resolve the hang issue. Reported-by: Andreas Beckmann <debian@abeckmann.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable <stable@kernel.org> [2.6.33.x]	2010-03-24 00:03:06 +09:00
Ryusuke Konishi	2d8428acae	nilfs2: fix duplicate call to nilfs_segctor_cancel_freev Andreas Beckmann gave me a report that nilfs logged the following warnings when it got a disk full: nilfs_sufile_do_cancel_free: segment 0 must be clean nilfs_sufile_do_cancel_free: segment 1 must be clean These arise from a duplicate call to nilfs_segctor_cancel_freev in an error path of log writer. This will fix the issue. Reported-by: Andreas Beckmann <debian@abeckmann.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-03-22 14:41:07 +09:00
Ryusuke Konishi	c91cea11df	nilfs2: remove whitespaces before quoted newlines This kills the following checkpatch warnings: WARNING: unnecessary whitespace before a quoted newline #869: FILE: super.c:869: + "remount to a different snapshot. \n", WARNING: unnecessary whitespace before a quoted newline #389: FILE: the_nilfs.c:389: + printk(KERN_ERR "NILFS: too short segment. \n"); Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-03-14 10:29:51 +09:00
Ryusuke Konishi	55480a06e9	nilfs2: remove spaces before tabs This kills the following checkpatch warnings: WARNING: please, no space before tabs #74: FILE: segment.h:74: +^Iunsigned ^I^Iflags;$ WARNING: please, no space before tabs #35: FILE: segbuf.c:35: +^Iint ^I^I^Istart, end; /* The region to be submitted */$ Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-03-14 10:29:51 +09:00
Ryusuke Konishi	7a65004bba	nilfs2: fix various typos in comments This fixes various typos I found in comments of nilfs2. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-03-14 10:29:51 +09:00
Ryusuke Konishi	1621562b6a	nilfs2: fix typo "cout" -> "count" in error message Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-03-14 10:29:50 +09:00
Ryusuke Konishi	9ccf56c138	nilfs2: fix function name typos in docbook comments Fixes the following typos in docbook comments: nilfs_detroy_transaction_cache -> nilfs_destroy_transaction_cache nilfs_secgtor_start_timer -> nilfs_segctor_start_timer Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-03-14 10:29:50 +09:00
Ryusuke Konishi	6c477d44a7	nilfs2: fix discrepancy in use of static specifier Two segbuf functions, nilfs_segbuf_write and nilfs_segbuf_wait, are declared with the static storage class specifier, but their implementations are not. This fixes the discrepancy. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-03-14 10:27:27 +09:00
Linus Torvalds	0f2cc4ecd8	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits) init: Open /dev/console from rootfs mqueue: fix typo "failues" -> "failures" mqueue: only set error codes if they are really necessary mqueue: simplify do_open() error handling mqueue: apply mathematics distributivity on mq_bytes calculation mqueue: remove unneeded info->messages initialization mqueue: fix mq_open() file descriptor leak on user-space processes fix race in d_splice_alias() set S_DEAD on unlink() and non-directory rename() victims vfs: add NOFOLLOW flag to umount(2) get rid of ->mnt_parent in tomoyo/realpath hppfs can use existing proc_mnt, no need for do_kern_mount() in there Mirror MS_KERNMOUNT in ->mnt_flags get rid of useless vfsmount_lock use in put_mnt_ns() Take vfsmount_lock to fs/internal.h get rid of insanity with namespace roots in tomoyo take check for new events in namespace (guts of mounts_poll()) to namespace.c Don't mess with generic_permission() under ->d_lock in hpfs sanitize const/signedness for udf nilfs: sanitize const/signedness in dealing with ->d_name.name ... Fix up fairly trivial (famous last words...) conflicts in drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c	2010-03-04 08:15:33 -08:00
Al Viro	072f98b463	nilfs: sanitize const/signedness in dealing with ->d_name.name Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-03 14:07:58 -05:00
Al Viro	0319003d0d	nilfs really shouldn't slap struct dentry on stack... ... especially when it only needs (and initializes) .d_name of it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-03 14:07:58 -05:00
Jiro SEKIBA	0d561f12b4	nilfs2: add reader's lock for cno in nilfs_ioctl_sync This adds reader's lock for the_nilfs->cno in nilfs_ioctl_sync, for the_nilfs->cno should be proctected by segctor_sem when reading. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-02-20 21:18:19 +09:00
Jiro SEKIBA	03f29365e8	nilfs2: delete unnecessary condition in load_segment_summary This is a trivial patch to remove unnecessary condition. load_segment_summary() checks crc of segment_summary OR crc of whole log data blocks based on boolean argument full_check. However, callers of the function pass only 1 as full_check, which means only whole log data blocks checking code is running all the time. This patch deletes the condition and full_check argument and also deletes enum 'NILFS_SEG_FAIL_CHECKSUM_SEGSUM' and corresponding case clause, for it is nolonger used anymore. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-02-18 20:09:03 +09:00
Ryusuke Konishi	d1c6b72a72	nilfs2: move iterator to write log into segment buffer This moves iterator to submit write requests for a series of logs into segbuf.c, and hides nilfs_segbuf_write() and nilfs_segbuf_wait() in the file. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-02-13 12:26:03 +09:00
Ryusuke Konishi	e605f0a724	nilfs2: get rid of s_dirt flag use This replaces s_dirt flag use in nilfs with a new flag added on the nilfs object. The s_dirt flag was used to indicate if sop->write_super() should be called, however the current version of nilfs does not use the callback. Thus, it can be replaced with the own flag. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Jiro SEKIBA <jir@unicus.jp>	2010-02-13 12:26:03 +09:00
Ryusuke Konishi	dcd7618695	nilfs2: get rid of nilfs_segctor_req struct This will clean up nilfs_segctor_req struct and the obscure request argument passed among private methods of segment constructor. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-02-13 12:26:03 +09:00
Jiro SEKIBA	086d1764b2	nilfs2: delete unnecessary condition in nilfs_dat_translate This is a trivial patch to delete unnecessary condition in nilfs_dat_translate. nilfs_dat_translate() will asign translated address to *blocknrp if blocknrp is not NULL. However the condition is unneeded, because all callers of nilfs_dat_translate() pass blocknrp properly. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-02-13 12:26:03 +09:00
Ryusuke Konishi	fe5f171bb2	nilfs2: fix potential hang in nilfs_error on errors=remount-ro nilfs_error() calls nilfs_detach_segment_constructor() if errors=remount-ro option is specified, and this may lead to a hang due to recursive locking of, for instance, nilfs->ns_segctor_sem and others. In this case, detaching segment constructor is not necessary because read-only flag is set to the filesystem and further writes are blocked. This fixes the potential hang issue by removing the nilfs_detach_segment_constructor() call from nilfs_error. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-02-13 12:26:03 +09:00
Ryusuke Konishi	7512487e6d	nilfs2: use mnt_want_write in ioctls where write access is needed A few nilfs2 ioctls need to ask for and then later release write access to the mount in order to avoid potential write to read-only mounts. This adds the missing mnt_want_write and mnt_drop_write in nilfs_ioctl_change_cpmode, nilfs_ioctl_delete_checkpoint, and nilfs_ioctl_clean_segments. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-02-13 12:26:02 +09:00
Jiro SEKIBA	e902ec9906	nilfs2: issue discard request after cleaning segments This adds a function to send discard requests for given array of segment numbers, and calls the function when garbage collection succeeded. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-02-13 12:26:02 +09:00
Ryusuke Konishi	3256a05531	nilfs2: fix potential leak of dirty data on umount This fixes incorrect usage of nilfs_segctor_confirm() test function in nilfs_segctor_destroy(); nilfs_segctor_confirm() returns zero if the filesystem is not clean, so its use in nilfs_segctor_destroy() needs inversion. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-01-31 14:57:31 +09:00
Tobias Klauser	33e189bd57	nilfs2: Storage class should be before const qualifier The C99 specification states in section 6.11.5: The placement of a storage-class specifier other than at the beginning of the declaration specifiers in a declaration is an obsolescent feature. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-12-25 13:01:50 +09:00
Jiro SEKIBA	5ee5814832	nilfs2: trivial coding style fix This is a trivial style fix patch to mend errors/warnings reported by "checkpatch.pl --file". Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-12-25 13:01:50 +09:00
Linus Torvalds	b6e3224fb2	Revert "task_struct: make journal_info conditional" This reverts commit `e4c570c4cb`, as requested by Alexey: "I think I gave a good enough arguments to not merge it. To iterate: * patch makes impossible to start using ext3 on EXT3_FS=n kernels without reboot. * this is done only for one pointer on task_struct" None of config options which define task_struct are tristate directly or effectively." Requested-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-17 13:23:24 -08:00
Al Viro	a95161aaa8	switch nilfs2 to deactivate_locked_super() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-12-16 12:16:42 -05:00
Hiroshi Shimamoto	e4c570c4cb	task_struct: make journal_info conditional journal_info in task_struct is used in journaling file system only. So introduce CONFIG_FS_JOURNAL_INFO and make it conditional. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:27 -08:00
Ryusuke Konishi	a694291a62	nilfs2: separate wait function from nilfs_segctor_write This separates wait function for submitted logs from the write function nilfs_segctor_write(). A new list of segment buffers "sc_write_logs" is added to hold logs under writing, and double buffering is partially applied to hide io latency. At this point, the double buffering is disabled for blocksize < pagesize because page dirty flag is turned off during write and dirty buffers are not properly collected for pages crossing over segments. To receive full benefit of the double buffering, further refinement is needed to move the io wait outside the lock section of log writer. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-30 21:17:52 +09:00
Ryusuke Konishi	e29df395bc	nilfs2: add iterator for segment buffers This adds a few iterator functions for segment buffers to make it easy to handle multiple series of logs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-30 21:06:35 +09:00
Ryusuke Konishi	9c965bac16	nilfs2: hide nilfs_write_info struct in segment buffer code Hides nilfs_write_info struct and nilfs_segbuf_prepare_write function in segbuf.c to simplify the interface of nilfs_segbuf_write function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-30 21:06:35 +09:00
Ryusuke Konishi	9284ad2a90	nilfs2: relocate io status variables to segment buffer This moves io status variables in nilfs_write_info struct to nilfs_segment_buffer struct. This is a preparation to hide nilfs_write_info in segment buffer code. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-30 21:05:57 +09:00
Ryusuke Konishi	5f1586d0dd	nilfs2: do not return io error for bio allocation failure Previously, log writer had possibility to set an io error flag on segments even in case of memory allocation failure. This fixes the issue. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-29 19:59:00 +09:00
Ryusuke Konishi	0935db7477	nilfs2: use list_splice_tail or list_splice_tail_init This applies list_splice_tail (or list_splice_tail_init) operation instead of list_splice (or list_splice_init, respectively) to append a new list to tail of an existing list. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-29 02:50:46 +09:00
Jiro SEKIBA	abdb318b79	nilfs2: replace mark_inode_dirty as nilfs_mark_inode_dirty Replace mark_inode_dirty() as nilfs_mark_inode_dirty() to reduce deep function calls. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-27 20:05:16 +09:00
Jiro SEKIBA	3534573b58	nilfs2: delete mark_inode_dirty in nilfs_delete_entry Delete mark_inode_dirty() in nilfs_delete_entry() to reduce duplicate mark_inode_dirty() calls both in nilfs_rename() and nilfs_delete_entry(). Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-27 20:05:16 +09:00
Jiro SEKIBA	58d55471cb	nilfs2: delete mark_inode_dirty in nilfs_commit_chunk Delete mark_inode_dirty() in nilfs_commit_chunk(), for callers of nilfs_commit_chunk() will call equivalent mark_inode_dirty() after calling nilfs_commit_chunk(). Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-27 20:05:16 +09:00
Jiro SEKIBA	2093abf9cb	nilfs2: change return type of nilfs_commit_chunk change return type of nilfs_commit_chunk() as void from int, for nilfs_set_file_dirty() usually does not return error. This is an intermediate patch to reduce mark_inode_dirty() in nilfs_commit_chunk(). Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-27 20:05:15 +09:00
Jiro SEKIBA	4cd76c3c93	nilfs2: split nilfs_unlink as nilfs_do_unlink and nilfs_unlink Split nilfs_unlink() to reduce nested transaction and duplicate mark_inode_dirty() calls when calling nilfs_unlink() from nilfs_rmdir(). nilfs_do_unlink() is an actual unlink functionality which is not in transaction and does not call mark_inode_dirty() for dentry argument. nilfs_unlink() is a wrapper function for do_nilfs_unlink() with transaction and mark_inode_dirty() for dentry argument. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-27 20:05:15 +09:00
Jiro SEKIBA	1749147276	nilfs2: delete redundant mark_inode_dirty delete redundant mark_inode_dirty() calls Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-27 20:05:15 +09:00
Jiro SEKIBA	565de406e7	nilfs2: expand inode_inc_link_count and inode_dec_link_count This is an intermidiate patch to reduce redandunt mark_inode_dirty() calls by calling inode_inc_link_count() and inode_dec_link_count() functions. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-27 20:05:15 +09:00
Jiro SEKIBA	43f8bc262f	nilfs2: delete mark_inode_dirty from nilfs_set_link Delete mark_inode_dirty() from nilfs_set_link() to reduce redundant mark_inode_dirty() calls in caller of nilfs_set_link(). Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-27 20:05:15 +09:00
Jiro SEKIBA	9ca941d4b6	nilfs2: delete mark_inode_dirty in nilfs_new_inode It is redundant to call mark_inode_dirty() in nilfs_new_inode() because all caller of nilfs_new_inode() will call mark_inode_dirty() after calling nilfs_new_inode() directly or indirectly in transaction. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-27 20:05:15 +09:00
Ryusuke Konishi	0234576d04	nilfs2: add norecovery mount option This adds "norecovery" mount option which disables temporal write access to read-only mounts or snapshots during mount/recovery. Without this option, write access will be even performed for those types of mounts; the temporal write access is needed to mount root file system read-only after an unclean shutdown. This option will be helpful when user wants to prevent any write access to the device. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Eric Sandeen <sandeen@redhat.com>	2009-11-20 10:05:52 +09:00
Ryusuke Konishi	a057d2c011	nilfs2: add helper to get if volume is in a valid state This adds a helper function, nilfs_valid_fs() which returns if nilfs is in a valid state or not. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:52 +09:00
Ryusuke Konishi	f50a4c8149	nilfs2: move recovery completion into load_nilfs function Although mount recovery of nilfs is integrated in load_nilfs() procedure, the completion of recovery was isolated from the procedure and performed at the end of the fill_super routine. This was somewhat confusing since the recovery is needed for the nilfs object, not for a super block instance. To resolve the inconsistency, this will integrate the recovery completion into load_nilfs(). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:52 +09:00
Ryusuke Konishi	050b4142c9	nilfs2: apply readahead for recovery on mount This inserts readahead in the recovery code. The readahead request is issued per segment while searching the latest super root block. This will shorten mount time after unclean unmount. A measurement shows the recovery time was reduced by more than 60 percent: e.g. real 0m11.586s -> 0m3.918s (x 2.96) Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:51 +09:00
Ryusuke Konishi	f021759d74	nilfs2: clean up get/put function of a segment usage This eliminates obsolete nilfs_get_sufile_get_segment_usage() and nilfs_set_sufile_segment_usage() from sufile. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:51 +09:00
Ryusuke Konishi	071ec54dd7	nilfs2: move routine to set segment usage into sufile This adds nilfs_sufile_set_segment_usage() function in sufile to replace direct access to the sufile metadata in log writer code. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:51 +09:00
Ryusuke Konishi	61a189e9c6	nilfs2: move routine marking segment usage dirty into sufile This adds nilfs_sufile_mark_dirty() function in sufile to replace nilfs_touch_segusage() function in log writer code. This is a preparation for the further cleanup which will move out low level sufile operations in the log writer. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:51 +09:00
Ryusuke Konishi	70622a2091	nilfs2: insert cache operation in palloc get block routines This implements cache operation in get block routines of palloc code: nilfs_palloc_get_desc_block(), nilfs_palloc_get_bitmap_block(), and nilfs_palloc_get_entry_block(). This will complete the palloc cache. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:51 +09:00
Ryusuke Konishi	49fa7a5902	nilfs2: add palloc cache to ifile This adds the palloc cache to ifile. The palloc cache is allocated on the extended region of nilfs_mdt_info struct. The struct nilfs_ifile_info defines the extended on memory structure of ifile. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:50 +09:00
Ryusuke Konishi	c3ea56c800	nilfs2: flush palloc cache before manipulating data pages of GC dat Data pages in gcdat metadata file (i.e. the secondary DAT for GC), are cleared or even moved back to the normal DAT when a shot of garbage collection was done. Buffer heads held by the palloc cache of gcdat must be cleared before these page cache manipulation. This adds nilfs_palloc_clear_cache() to ensure this. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:50 +09:00
Ryusuke Konishi	8908b2f70b	nilfs2: add palloc cache to dat This adds the palloc cache to DAT file. The palloc cache is allocated on the extended region of nilfs_mdt_info struct. The struct nilfs_dat_info defines the extended on memory structure of DAT. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:50 +09:00
Ryusuke Konishi	db38d5ad32	nilfs2: add cache framework for persistent object allocator This adds setup and cleanup routines of the persistent object allocator cache. According to ftrace analyses, accessing buffers of the DAT file suffers indispensable overhead many times. To mitigate the overhead, This introduce cache framework for the persistent object allocator (palloc) which the DAT file and ifile are using. struct nilfs_palloc_cache represents the cache object per metadata file using palloc. The cache is initialized through nilfs_palloc_setup_cache() and destroyed by nilfs_palloc_destroy_cache(); callers of the former function will be added to individual allocators of DAT and ifile on successive patches. nilfs_palloc_destroy_cache() will be called from nilfs_mdt_destroy() if the cache is attached to a metadata file. A companion function nilfs_palloc_clear_cache() is provided to allow releasing buffer head references independently with the cleanup task. This adjunctive function will be used before invalidating pages of metadata file with the cache. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:50 +09:00
Ryusuke Konishi	141bbdba9c	nilfs2: unfold nilfs_palloc_block_get_bitmap function This expands a trivial address calculation in the function into its every callsite. This expansion improves readability of the callers. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:50 +09:00
Ryusuke Konishi	1376e931b7	nilfs2: eliminate nilfs_btnode_get function This removes the obsolete nilfs_btnode_get() function and makes nilfs_btree_get_block() directly call nilfs_btnode_submit_block(). This expansion will provide better opportunity for code optimization. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:50 +09:00
Ryusuke Konishi	75f65edfcc	nilfs2: remove newblk argument from nilfs_btnode_submit_block This removes the obsolete argument from nilfs_btnode_submit_block(). This will complete separating a create function of btree node. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:50 +09:00
Ryusuke Konishi	45f4910bc0	nilfs2: use nilfs_btnode_create_block function This displaces nilfs_btnode_get() use to create new btree node block with nilfs_btnode_create_block. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:49 +09:00
Ryusuke Konishi	d501d73689	nilfs2: separate function for creating new btree node block Adds a separate routine for creating a btree node block. This is a preparation to reduce the depth of function calls during submitting btree node buffer. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:49 +09:00
Ryusuke Konishi	b34a65069c	nilfs2: avoid readahead on metadata file for create mode This turns off readhead action of metadata file if nilfs_mdt_get_block function was called with a create flag. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:49 +09:00
Ryusuke Konishi	ef7d4757a5	nilfs2: simplify nilfs_sufile_get_ncleansegs function Previously, this function took an status code to return possible error codes. The ("nilfs2: add local variable to cache the number of clean segments") patch removed the possibility to return errors. So, this simplifies the function definition to make it directly return the number of clean segments. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:49 +09:00
Ryusuke Konishi	aa474a2201	nilfs2: add local variable to cache the number of clean segments This makes it possible for sufile to get the number of clean segments faster. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:49 +09:00
Ryusuke Konishi	7b16c8a211	nilfs2: unfold nilfs_sufile_block_get_header function This unfolds the nilfs_sufile_block_get_header() function for simplicity. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:48 +09:00
Ryusuke Konishi	fd66c0d5c3	nilfs2: hide nilfs_mdt_clear calls in nilfs_mdt_destroy This will hide a function call of nilfs_mdt_clear() in nilfs_mdt_destroy(). This ensures nilfs_mdt_destroy() to do cleanup jobs included in nilfs_mdt_clear(). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:48 +09:00
Ryusuke Konishi	3961f0e277	nilfs2: eliminate inlines to directly read/write inode of metadata files Removes two inline functions: nilfs_mdt_read_inode_direct() and nilfs_mdt_write_inode_direct(). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:48 +09:00
Ryusuke Konishi	8707df3847	nilfs2: separate read method of meta data files on super root block Will displace nilfs_mdt_read_inode_direct function with an individual read method: nilfs_dat_read, nilfs_sufile_read, nilfs_cpfile_read. This provides the opportunity to initialize local variables of each metadata file after reading the inode. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:48 +09:00
Ryusuke Konishi	79739565e1	nilfs2: separate constructor of metadata files This will displace nilfs_mdt_new() constructor with individual metadata file constructors like nilfs_dat_new(), new_sufile_new(), nilfs_cpfile_new(), and nilfs_ifile_new(). This makes it possible for each metadata file to have own intialization code. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:48 +09:00
Ryusuke Konishi	5731e191f2	nilfs2: add size option of private object to metadata file allocator This adds an optional "object size" argument to nilfs_mdt_new_common() function; the argument specifies the size of private object attached to a newly allocated metadata file inode. This will afford space to keep local variables for meta data files. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:48 +09:00
Ryusuke Konishi	9cb4e0d2b9	nilfs2: move out mark_inode_dirty calls from bmap routines Previously, nilfs_bmap_add_blocks() and nilfs_bmap_sub_blocks() called mark_inode_dirty() after they changed the number of data blocks. This moves these calls outside bmap outermost functions like nilfs_bmap_insert() or nilfs_bmap_truncate(). This will mitigate overhead for truncate or delete operation since they repeatedly remove set of blocks. Nearly 10 percent improvement was observed for removal of a large file: # dd if=/dev/zero of=/test/aaa bs=1M count=512 # time rm /test/aaa real 2.968s -> 2.705s Further optimization may be possible by eliminating these mark_inode_dirty() uses though I avoid mixing separate changes here. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:47 +09:00
Ryusuke Konishi	09bf4aae0a	nilfs2: stop marking metadata inode dirty within btree operations Since metadata file routines mark the inode dirty after they successfully changed bmap objects, nilfs_mdt_mark_dirty() calls in nilfs_bmap_add_blocks() and nilfs_bmap_sub_blocks() are redundant. This removes these overlapping calls from the bmap routines. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:47 +09:00
Ryusuke Konishi	30db4e6c3d	nilfs2: remove buffer locking from btree code lock_buffer() and unlock_buffer() uses in btree.c are eliminable because btree functions gain buffer heads through nilfs_btnode_get(), which never returns an on-the-fly buffer. Although nilfs_clear_dirty_page() and nilfs_copy_back_pages() in nilfs_commit_gcdat_inode() juggle btree node buffers of DAT, this is safe because these operations are protected by a log writer lock or the metadata file semaphore of DAT. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:47 +09:00
Ryusuke Konishi	a49762fd11	nilfs2: remove buffer locking in nilfs_mark_inode_dirty This lock is eliminable because inodes on the buffer can be updated independently. Although a log writer also fills in bmap data on the on-disk inodes, this update is exclusively done by a log writer lock. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:47 +09:00
Jiro SEKIBA	e2073e7857	nilfs2: cleanup unused match_bool function match_bool function is not used anymore. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:47 +09:00
Jiro SEKIBA	91f1953bf3	nilfs2: Using nobarrier option instead of barrier=off Since most of fs using nofoobar style option, modified barrier=off option as nobarrier. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:47 +09:00
Jiro SEKIBA	6600b9dd8e	nilfs2: move definition of struct nilfs_btree_node This is a trivial patch to expose struct nilfs_fs_btree_node. The struct should be exposed outside of kernel, for it is disk format. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:46 +09:00
Ryusuke Konishi	9b945d537d	nilfs2: get rid of BUG_ON use in btree lookup routines The current btree lookup routines make a kernel oops when detected inconsistency in btree blocks. These routines should instead return a proper error code because the inconsistency usually comes from corruption of on-disk metadata. This fixes the issue by converting BUG_ON calls to proper error handlings. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-20 10:05:46 +09:00
Jiro SEKIBA	18dafac1a4	nilfs2: deleted inconsistent comment in nilfs_load_inode_block() The comment says, "Caller of this function MUST lock s_inode_lock", however just above the comment, it locks s_inode_lock in the function. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-15 17:17:46 +09:00
Ryusuke Konishi	c1ea985c71	nilfs2: fix lock order reversal in chcp operation Will fix the following lock order reversal lockdep detected: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-rc6 #7 ------------------------------------------------------- chcp/30157 is trying to acquire lock: (&nilfs->ns_mount_mutex){+.+.+.}, at: [<fed7cfcc>] nilfs_cpfile_change_cpmode+0x46/0x752 [nilfs2] but task is already holding lock: (&nilfs->ns_segctor_sem){++++.+}, at: [<fed7ca32>] nilfs_transaction_begin+0xba/0x110 [nilfs2] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&nilfs->ns_segctor_sem){++++.+}: [<c105799c>] __lock_acquire+0x109c/0x139d [<c1057d26>] lock_acquire+0x89/0xa0 [<c14151e2>] down_read+0x31/0x45 [<fed6d77b>] nilfs_attach_checkpoint+0x8f/0x16b [nilfs2] [<fed6e393>] nilfs_get_sb+0x3e7/0x653 [nilfs2] [<c10c0ccb>] vfs_kern_mount+0x8b/0x124 [<c10c0db2>] do_kern_mount+0x37/0xc3 [<c10d7517>] do_mount+0x64d/0x69d [<c10d75cd>] sys_mount+0x66/0x95 [<c1002a14>] sysenter_do_call+0x12/0x32 -> #1 (&type->s_umount_key#31/1){+.+.+.}: [<c105799c>] __lock_acquire+0x109c/0x139d [<c1057d26>] lock_acquire+0x89/0xa0 [<c104c0f3>] down_write_nested+0x34/0x52 [<c10c08fe>] sget+0x22e/0x389 [<fed6e133>] nilfs_get_sb+0x187/0x653 [nilfs2] [<c10c0ccb>] vfs_kern_mount+0x8b/0x124 [<c10c0db2>] do_kern_mount+0x37/0xc3 [<c10d7517>] do_mount+0x64d/0x69d [<c10d75cd>] sys_mount+0x66/0x95 [<c1002a14>] sysenter_do_call+0x12/0x32 -> #0 (&nilfs->ns_mount_mutex){+.+.+.}: [<c1057727>] __lock_acquire+0xe27/0x139d [<c1057d26>] lock_acquire+0x89/0xa0 [<c1414d63>] mutex_lock_nested+0x41/0x23e [<fed7cfcc>] nilfs_cpfile_change_cpmode+0x46/0x752 [nilfs2] [<fed801b2>] nilfs_ioctl+0x11a/0x7da [nilfs2] [<c10cca12>] vfs_ioctl+0x27/0x6e [<c10ccf93>] do_vfs_ioctl+0x491/0x4db [<c10cd022>] sys_ioctl+0x45/0x5f [<c1002a14>] sysenter_do_call+0x12/0x32 Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-13 10:33:24 +09:00
Ryusuke Konishi	c083234f15	nilfs2: fix missing cleanup of gc cache on error cases This fixes an -rc1 regression brought by the commit: `1cf58fa840` ("nilfs2: shorten freeze period due to GC in write operation v3"). Although the patch moved out a function call of nilfs_ioctl_move_blocks() to nilfs_ioctl_clean_segments() from nilfs_ioctl_prepare_clean_segments(), it didn't move corresponding cleanup job needed for the error case. This will move the missing cleanup job to the destination function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Jiro SEKIBA <jir@unicus.jp>	2009-11-08 19:04:25 +09:00
Ryusuke Konishi	5399dd1fc8	nilfs2: fix kernel oops in error case of nilfs_ioctl_move_blocks This fixes a kernel oops reported by Markus Trippelsdorf in the email titled "[NILFS users] kernel Oops while running nilfs_cleanerd". The oops was caused by a bug of error path in nilfs_ioctl_move_blocks() function, which was inlined in nilfs_ioctl_clean_segments(). nilfs_ioctl_move_blocks checks duplication of blocks which will be moved in garbage collection. But, the check should have be done within nilfs_ioctl_move_inode_block() to prevent list corruption among buffers storing the target blocks. To fix the kernel oops, this moves forward the duplication check before the list insertion. I also tested this for stable trees [2.6.30, 2.6.31]. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable <stable@kernel.org>	2009-11-08 19:01:35 +09:00
Ryusuke Konishi	05b4358ad5	nilfs2: add zero-fill for new btree node buffers Adds missing initialization of newly allocated b-tree node buffers. This avoids garbage data to be mixed in b-tree node blocks. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-03 12:32:03 +09:00
Ryusuke Konishi	aeda7f6343	nilfs2: fix irregular checkpoint creation due to data flush When nilfs flushes out dirty data to reduce memory pressure, creation of checkpoints is wrongly postponed. This bug causes irregular checkpoint creation especially in small footprint systems. To correct this issue, a timer for the checkpoint creation has to be continued if a log writer does not create a checkpoint. This will do the correction. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-11-03 12:32:03 +09:00
Ryusuke Konishi	b1e19e5601	nilfs2: fix dirty page accounting leak causing hang at write Bruno Prémont and Dunphy, Bill noticed me that NILFS will certainly hang on ARM-based targets. I found this was caused by an underflow of dirty pages counter. A b-tree cache routine was marking page dirty without adjusting page account information. This fixes the dirty page accounting leak and resolves the hang on arm-based targets. Reported-by: Bruno Prémont <bonbons@linux-vserver.org> Reported-by: Dunphy, Bill <WDunphy@tandbergdata.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Bruno Prémont <bonbons@linux-vserver.org> Cc: stable <stable@kernel.org>	2009-11-03 12:31:36 +09:00
Alexey Dobriyan	828c09509b	const: constify remaining file_operations [akpm@linux-foundation.org: fix KVM] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-01 16:11:11 -07:00
Ryusuke Konishi	3cc811bffd	nilfs2: fix missing initialization of i_dir_start_lookup member The i_dir_start_lookup field in nilfs_inode_info objects should be cleared when the objects are allocated, but the the initialization was missing in case of reading from disk. This adds the initialization. Since the variable just gives a start page on directory lookups, the bug was nonfatal until now. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-29 20:32:13 +09:00
Ryusuke Konishi	1f28fcd925	nilfs2: fix missing zero-fill initialization of btree node cache This will fix file system corruption which infrequently happens after mount. The problem was reported from users with the title "[NILFS users] Fail to mount NILFS." (Message-ID: <200908211918.34720.yuri@itinteg.net>), and so forth. I've also experienced the corruption multiple times on kernel 2.6.30 and 2.6.31. The problem turned out to be caused due to discordance between mapping->nrpages of a btree node cache and the actual number of pages hung on the cache; if the mapping->nrpages becomes zero even as it has pages, truncate_inode_pages() returns without doing anything. Usually this is harmless except it may cause page leak, but garbage collection fairly infrequently sees a stale page remained in the btree node cache of DAT (i.e. disk address translation file of nilfs), and induces the corruption. I identified a missing initialization in btree node caches was the root cause. This corrects the bug. I've tested this for kernel 2.6.30 and 2.6.31. Reported-by: Yuri Chislov <yuri@itinteg.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable <stable@kernel.org>	2009-09-29 20:12:56 +09:00
Alexey Dobriyan	f0f37e2f77	const: mark struct vm_struct_operations * mark struct vm_area_struct::vm_ops as const * mark vm_ops in AGP code But leave TTM code alone, something is fishy there with global vm_ops being used. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-27 11:39:25 -07:00
Alexey Dobriyan	6e1d5dcc2b	const: mark remaining inode_operations as const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:24 -07:00
Alexey Dobriyan	7f09410bbc	const: mark remaining address_space_operations const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:24 -07:00
Alexey Dobriyan	ac4cfdd6d1	const: mark remaining export_operations const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:24 -07:00
Alexey Dobriyan	b87221de6a	const: mark remaining super_operations const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:24 -07:00
Jens Axboe	2c96ce9f20	fs: remove bdev->bd_inode_backing_dev_info It has been unused since it was introduced in: commit 520808bf20e90fdbdb320264ba7dd5cf9d47dcac Author: Andrew Morton <akpm@osdl.org> Date: Fri May 21 00:46:17 2004 -0700 [PATCH] block device layer: separate backing_dev_info infrastructure So lets just kill it. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-16 15:16:18 +02:00
Ryusuke Konishi	41f4db0f48	fs/Kconfig: move nilfs2 outside misc filesystems Some people asked me questions like the following: On Wed, 15 Jul 2009 13:11:21 +0200, Leon Woestenberg wrote: > just wondering, any reasons why NILFS2 is one of the miscellaneous > filesystems and, for example, btrfs, is not in Kconfig? Actually, nilfs is NOT a filesystem came from other operating systems, but a filesystem created purely for Linux. Nor is it a flash filesystem but that for generic block devices. So, this moves nilfs outside the misc category as I responded in LKML "Re: Why does NILFS2 hide under Miscellaneous filesystems?" (Message-Id: <20090716.002526.93465395.ryusuke@osrg.net>). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:16 +09:00
Ryusuke Konishi	0f3fe33b39	nilfs2: convert nilfs_bmap_lookup to an inline function The nilfs_bmap_lookup() is now a wrapper function of nilfs_bmap_lookup_at_level(). This moves the nilfs_bmap_lookup() to a header file converting it to an inline function and gives an opportunity for optimization. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:16 +09:00
Ryusuke Konishi	2e0c2c7392	nilfs2: allow btree code to directly call dat operations The current btree code is written so that btree functions call dat operations via wrapper functions in bmap.c when they allocate, free, or modify virtual block addresses. This abstraction requires additional function calls and causes frequent call of nilfs_bmap_get_dat() function since it is used in the every wrapper function. This removes the wrapper functions and makes them available from btree.c and direct.c, which will increase the opportunity of compiler optimization. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:16 +09:00
Ryusuke Konishi	bd8169efae	nilfs2: add update functions of virtual block address to dat This is a preparation for the successive cleanup ("nilfs2: allow btree to directly call dat operations"). This adds functions bundling a few operations to change an entry of virtual block address on the dat file. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:15 +09:00
Ryusuke Konishi	7a102b0923	nilfs2: remove individual gfp constants for each metadata file This gets rid of NILFS_CPFILE_GFP, NILFS_SUFILE_GFP, NILFS_DAT_GFP, and NILFS_IFILE_GFP. All of these constants refer to NILFS_MDT_GFP, and can be removed. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:15 +09:00
Ryusuke Konishi	3218929dbd	nilfs2: stop zero-fill of btree path just before free it The btree path object is cleared just before it is freed. This will remove the code doing the unnecessary clear operation. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:15 +09:00
Ryusuke Konishi	6d28f7ea43	nilfs2: remove unused btree argument from btree functions Even though many btree functions take a btree object as their first argument, most of them are not used in their functions. This sticky use of the btree argument is hurting code readability and giving the possibility of inefficient code generation. So, this removes the unnecessary btree arguments. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:15 +09:00
Ryusuke Konishi	9ead986373	nilfs2: remove nilfs_dat_abort_start and nilfs_dat_abort_free These functions are not called from any functions. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:15 +09:00
Jiro SEKIBA	1cf58fa840	nilfs2: shorten freeze period due to GC in write operation v3 This is a re-revised patch to shorten freeze period. This version include a fix of the bug Konishi-san mentioned last time. When GC is runnning, GC moves live block to difference segments. Copying live blocks into memory is done in a transaction, however it is not necessarily to be in the transaction. This patch will get the nilfs_ioctl_move_blocks() out from transaction lock and put it before the transaction. I ran sysbench fileio test against nilfs partition. I copied some DVD/CD images and created snapshot to create live blocks before starting the benchmark. Followings are summary of rc8 and rc8 w/ the patch of per-request statistics, which is min/max and avg. I ran each test three times and bellow is average of those numers. According to this benchmark result, average time is slightly degrated. However, worstcase (max) result is significantly improved. This can address a few seconds write freeze. - random write per-request performance of rc8 min 0.843ms max 680.406ms avg 3.050ms - random write per-request performance of rc8 w/ this patch min 0.843ms -> 100.00% max 380.490ms -> 55.90% avg 3.233ms -> 106.00% - sequential write per-request performance of rc8 min 0.736ms max 774.343ms avg 2.883ms - sequential write per-request performance of rc8 w/ this patch min 0.720ms -> 97.80% max 644.280ms-> 83.20% avg 3.130ms -> 108.50% -----8<-----8<-----nilfs_cleanerd.conf-----8<-----8<----- protection_period 150 selection_policy timestamp # timestamp in ascend order nsegments_per_clean 2 cleaning_interval 2 retry_interval 60 use_mmap log_priority info -----8<-----8<-----nilfs_cleanerd.conf-----8<-----8<----- Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:15 +09:00
Zhu Yanhai	43be0ec038	nilfs2: add more check routines in mount process nilfs2: Add more safeguard routines and protections in mount process, which also makes nilfs2 report consistency error messages when checkpoint number is invalid. Signed-off-by: Zhu Yanhai <zhu.yanhai@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:14 +09:00
Zhang Qiang	a4f0b9c5b4	nilfs2: An unassigned variable is assigned to a never used structure member nilfs2: In procedure 'nilfs_get_sb()', when a nilfs filesysttem is mounted for the first time, local variable 'nilfs->ns_last_cno' is used before loading the latest checkpoint number from disk (in 'nilfs_fill_super'). 'nilfs->ns_last_cno' is assigned to 'sd.cno', but 'sd.cno' has never been used in the procedure. Signed-off-by: Zhang Qiang <zhangqiang.buaa@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:14 +09:00
Ryusuke Konishi	c1b353f04a	nilfs2: use GFP_NOIO for bio_alloc instead of GFP_NOWAIT Alberto Bertogli advised me about bio_alloc() use in nilfs: On Sat, 13 Jun 2009 22:52:40 -0300, Alberto Bertogli wrote: > By the way, those bio_alloc()s are using GFP_NOWAIT but it looks > like they could use at least GFP_NOIO or GFP_NOFS, since the caller > can (and sometimes do) sleep. The only caller is nilfs_submit_bh(), > which calls nilfs_submit_seg_bio() which can sleep calling > wait_for_completion(). This takes in the comment and replaces the use of GFP_NOWAIT flag with GFP_NOIO. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:14 +09:00
Jiro SEKIBA	1dfa27105a	nilfs2: stop using periodic write_super callback This removes nilfs_write_super and commit super block in nilfs internal thread, instead of periodic write_super callback. VFS layer calls ->write_super callback periodically. However, it looks like that calling back is ommited when disk I/O is busy. And when cleanerd (nilfs GC) is runnig, disk I/O tend to be busy thus nilfs superblock is not synchronized as nilfs designed. To avoid it, syncing superblock by nilfs thread instead of pdflush. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:14 +09:00
Jiro SEKIBA	79efdd9411	nilfs2: clean up nilfs_write_super Separate conditions that check if syncing super block and alternative super block are required as inline functions to reuse the conditions. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:14 +09:00
Jiro SEKIBA	6233caa9d5	nilfs2: fix disorder of nilfs_write_super in nilfs_sync_fs This fixes disorder of nilfs_write_super in nilfs_sync_fs. Commiting super block must be the end of the function so that every changes are reflected. ->sync_fs() is not called frequently so this makes nilfs_sync_fs call nilfs_commit_super instead of nilfs_write_super. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:14 +09:00
Jiro SEKIBA	ec5d66abdb	nilfs2: remove redundant super block commit This removes redundant super block commit. nilfs_write_super will call nilfs_commit_super to store super block into block device. However, nilfs_put_super will call nilfs_commit_super right after calling nilfs_write_super. So calling nilfs_write_super in nilfs_put_super would be redundant. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:13 +09:00
Jiro SEKIBA	b58a285ba4	nilfs2: implement nilfs_show_options to display mount options in /proc/mounts This is a patch to display mount options in procfs. Mount options will show up in the /proc/mounts as other fs does. ... /dev/sda6 /mnt nilfs2 ro,relatime,barrier=off,cp=3,order=strict 0 0 ... Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:13 +09:00
Ryusuke Konishi	1435110467	nilfs2: always lookup disk block address before reading metadata block The current metadata file code skips disk address lookup for its data block if the buffer has a mapped flag. This has a potential risk to cause read request to be performed against the stale block address that GC moved, and it may lead to meta data corruption. The mapped flag is safe if the buffer has an uptodate flag, otherwise it may prevent necessary update of disk address in the next read. This will avoid the potential problem by ensuring disk address lookup before reading metadata block even for buffers with the mapped flag. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:13 +09:00
Ryusuke Konishi	027d6404eb	nilfs2: use semaphore to protect pointer to a writable FS-instance will get rid of nilfs_get_writer() and nilfs_put_writer() pair used to retain a writable FS-instance for a period. The pair functions were making up some kind of recursive lock with a mutex, but they became overkill since the commit `201913ed74`. Furthermore, they caused the following lockdep warning because the mutex can be released by a task which didn't lock it: ===================================== [ BUG: bad unlock balance detected! ] ------------------------------------- kswapd0/422 is trying to release lock (&nilfs->ns_writer_mutex) at: [<c1359ff5>] mutex_unlock+0x8/0xa but there are no more locks to release! other info that might help us debug this: no locks held by kswapd0/422. stack backtrace: Pid: 422, comm: kswapd0 Not tainted 2.6.31-rc4-nilfs #51 Call Trace: [<c1358f97>] ? printk+0xf/0x18 [<c104fea7>] print_unlock_inbalance_bug+0xcc/0xd7 [<c11578de>] ? prop_put_global+0x3/0x35 [<c1050195>] lock_release+0xed/0x1dc [<c1359ff5>] ? mutex_unlock+0x8/0xa [<c1359f83>] __mutex_unlock_slowpath+0xaf/0x119 [<c1359ff5>] mutex_unlock+0x8/0xa [<d1284add>] nilfs_mdt_write_page+0xd8/0xe1 [nilfs2] [<c1092653>] shrink_page_list+0x379/0x68d [<c109171b>] ? isolate_pages_global+0xb4/0x18c [<c1092bd2>] shrink_list+0x26b/0x54b [<c10930be>] shrink_zone+0x20c/0x2a2 [<c10936b7>] kswapd+0x407/0x591 [<c1091667>] ? isolate_pages_global+0x0/0x18c [<c1040603>] ? autoremove_wake_function+0x0/0x33 [<c10932b0>] ? kswapd+0x0/0x591 [<c104033b>] kthread+0x69/0x6e [<c10402d2>] ? kthread+0x0/0x6e [<c1003e33>] kernel_thread_helper+0x7/0x1a This patch uses a reader/writer semaphore instead of the own lock and kills this warning. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:13 +09:00
Heiko Carstens	b5696e5e0d	nilfs2: fix format string compile warning (ino_t) Unlike on most other architectures ino_t is an unsigned int on s390. So add an explicit cast to avoid this compile warning: fs/nilfs2/recovery.c: In function 'recover_dsync_blocks': fs/nilfs2/recovery.c:555: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t' Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:13 +09:00
Ryusuke Konishi	1b2f5a641b	nilfs2: fix ignored error code in __nilfs_read_inode() The __nilfs_read_inode function is ignoring the error code returned from nilfs_read_inode_common(), and wrongly delivers a success code (zero) when it escapes from the function in erroneous cases. This adds the missing error handling. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-09-14 18:27:12 +09:00
Ryusuke Konishi	b1f1b8ce0a	nilfs2: fix preempt count underflow in nilfs_btnode_prepare_change_key This will fix the following preempt count underflow reported from users with the title "[NILFS users] segctord problem" (Message-ID: <949415.6494.qm@web58808.mail.re1.yahoo.com> and Message-ID: <debc30fc0908270825v747c1734xa59126623cfd5b05@mail.gmail.com>): WARNING: at kernel/sched.c:4890 sub_preempt_count+0x95/0xa0() Hardware name: HP Compaq 6530b (KR980UT#ABC) Modules linked in: bridge stp llc bnep rfcomm l2cap xfs exportfs nilfs2 cowloop loop vboxnetadp vboxnetflt vboxdrv btusb bluetooth uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 arc4 snd_hda_codec_analog ecb iwlagn iwlcore rfkill lib80211 mac80211 snd_hda_intel snd_hda_codec ehci_hcd uhci_hcd usbcore snd_hwdep snd_pcm tg3 cfg80211 psmouse snd_timer joydev libphy ohci1394 snd_page_alloc hp_accel lis3lv02d ieee1394 led_class i915 drm i2c_algo_bit video backlight output i2c_core dm_crypt dm_mod Pid: 4197, comm: segctord Not tainted 2.6.30-gentoo-r4-64 #7 Call Trace: [<ffffffff8023fa05>] ? sub_preempt_count+0x95/0xa0 [<ffffffff802470f8>] warn_slowpath_common+0x78/0xd0 [<ffffffff8024715f>] warn_slowpath_null+0xf/0x20 [<ffffffff8023fa05>] sub_preempt_count+0x95/0xa0 [<ffffffffa04ce4db>] nilfs_btnode_prepare_change_key+0x11b/0x190 [nilfs2] [<ffffffffa04d01ad>] nilfs_btree_assign_p+0x19d/0x1e0 [nilfs2] [<ffffffffa04d10ad>] nilfs_btree_assign+0xbd/0x130 [nilfs2] [<ffffffffa04cead7>] nilfs_bmap_assign+0x47/0x70 [nilfs2] [<ffffffffa04d9bc6>] nilfs_segctor_do_construct+0x956/0x20f0 [nilfs2] [<ffffffff805ac8e2>] ? _spin_unlock_irqrestore+0x12/0x40 [<ffffffff803c06e0>] ? __up_write+0xe0/0x150 [<ffffffff80262959>] ? up_write+0x9/0x10 [<ffffffffa04ce9f3>] ? nilfs_bmap_test_and_clear_dirty+0x43/0x60 [nilfs2] [<ffffffffa04cd627>] ? nilfs_mdt_fetch_dirty+0x27/0x60 [nilfs2] [<ffffffffa04db5fc>] nilfs_segctor_construct+0x8c/0xd0 [nilfs2] [<ffffffffa04dc3dc>] nilfs_segctor_thread+0x15c/0x3a0 [nilfs2] [<ffffffffa04dbe20>] ? nilfs_construction_timeout+0x0/0x10 [nilfs2] [<ffffffff80252633>] ? add_timer+0x13/0x20 [<ffffffff802370da>] ? __wake_up_common+0x5a/0x90 [<ffffffff8025e960>] ? autoremove_wake_function+0x0/0x40 [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2] [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2] [<ffffffff8025e556>] kthread+0x56/0x90 [<ffffffff8020cdea>] child_rip+0xa/0x20 [<ffffffff8025e500>] ? kthread+0x0/0x90 [<ffffffff8020cde0>] ? child_rip+0x0/0x20 This problem was caused due to a missing radix_tree_preload() call in the retry path of nilfs_btnode_prepare_change_key() function. Reported-by: Eric A <eric225125@yahoo.com> Reported-by: Jerome Poulin <jeromepoulin@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Jerome Poulin <jeromepoulin@gmail.com> Cc: stable@kernel.org	2009-08-31 12:03:06 +09:00
Ryusuke Konishi	a924586036	nilfs2: fix oopses with doubly mounted snapshots will fix kernel oopses like the following: # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test1 # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test2 # umount /test1 # umount /test2 BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1069 in_atomic(): 0, irqs_disabled(): 1, pid: 3886, name: umount.nilfs2 1 lock held by umount.nilfs2/3886: #0: (&type->s_umount_key#31){+.+...}, at: [<c10b398a>] deactivate_super+0x52/0x6c irq event stamp: 1219 hardirqs last enabled at (1219): [<c135c774>] __mutex_unlock_slowpath+0xf8/0x119 hardirqs last disabled at (1218): [<c135c6d5>] __mutex_unlock_slowpath+0x59/0x119 softirqs last enabled at (1214): [<c1033316>] __do_softirq+0x1a5/0x1ad softirqs last disabled at (1205): [<c1033354>] do_softirq+0x36/0x5a Pid: 3886, comm: umount.nilfs2 Not tainted 2.6.31-rc6 #55 Call Trace: [<c1023549>] __might_sleep+0x107/0x10e [<c13603c0>] do_page_fault+0x246/0x397 [<c136017a>] ? do_page_fault+0x0/0x397 [<c135e753>] error_code+0x6b/0x70 [<c136017a>] ? do_page_fault+0x0/0x397 [<c104f805>] ? __lock_acquire+0x91/0x12fd [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd [<c1050b2b>] lock_acquire+0xba/0xdd [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2] [<c135d4fe>] down_write+0x2a/0x46 [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2] [<d0d17d3f>] nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2] [<c104ea2c>] ? mark_held_locks+0x43/0x5b [<c104ecb1>] ? trace_hardirqs_on_caller+0x10b/0x133 [<c104ece4>] ? trace_hardirqs_on+0xb/0xd [<d0d09ac1>] nilfs_put_super+0x2f/0xca [nilfs2] [<c10b3352>] generic_shutdown_super+0x49/0xb8 [<c10b33de>] kill_block_super+0x1d/0x31 [<c10e6599>] ? vfs_quota_off+0x0/0x12 [<c10b398f>] deactivate_super+0x57/0x6c [<c10c4bc3>] mntput_no_expire+0x8c/0xb4 [<c10c5094>] sys_umount+0x27f/0x2a4 [<c10c50c6>] sys_oldumount+0xd/0xf [<c10031a4>] sysenter_do_call+0x12/0x38 ... This turns out to be a bug brought by an -rc1 patch ("nilfs2: simplify remaining sget() use"). In the patch, a new "put resource" function, nilfs_put_sbinfo() was introduced to delay freeing nilfs_sb_info struct. But the nilfs_put_sbinfo() mistakenly used atomic_dec_and_test() function to check the reference count, and it caused the nilfs_sb_info was freed when user mounted a snapshot twice. This bug also suggests there was unseen memory leak in usual mount /umount operations for nilfs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-08-19 02:10:13 +09:00
Zhang Qiang	1154ecbd2f	nilfs2: missing a read lock for segment writer in nilfs_attach_checkpoint() 'ns_cno' of structure 'the_nilfs' must be protected from segment writer, in other words, the caller of nilfs_get_checkpoint should hold read lock for nilfs->ns_segctor_sem. This patch adds the lock/unlock operations in nilfs_attach_checkpoint() when calling nilfs_cpfile_get_checkpoint(). Signed-off-by: Zhang Qiang <zhangqiang.buaa@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-08-18 17:32:27 +09:00
Ryusuke Konishi	01a261e09a	nilfs2: fix missing unlock in error path of nilfs_mdt_write_page This adds a missing unlock of nilfs->ns_writer_mutex in nilfs_mdt_write_page() function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-08-02 22:24:15 +09:00
Ryusuke Konishi	a97778457f	nilfs2: fix oops due to inconsistent state in page with discrete b-tree nodes Andrea Gelmini gave me a report that a kernel oops hit on a nilfs filesystem with a 1KB block size when doing rsync. This turned out to be caused by an inconsistency of dirty state between a page and its buffers storing b-tree node blocks. If the page had multiple buffers split over multiple logs, and if the logs were written at a time, a dirty flag remained in the page even every dirty flag in the buffers was cleared. This will fix the failure by dropping the dirty flag properly for pages with the discrete multiple b-tree nodes. Reported-by: Andrea Gelmini <andrea.gelmini@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Andrea Gelmini <andrea.gelmini@gmail.com> Cc: stable@kernel.org	2009-08-01 22:48:32 +09:00
Ryusuke Konishi	4fed598a49	fs/Kconfig: move nilfs2 out fs/Kconfig file was split into individual fs/*/Kconfig files before nilfs was merged. I've found the current config entry of nilfs is tainting the work. Sorry, I didn't notice. This fixes the violation. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Alexey Dobriyan <adobriyan@gmail.com>	2009-07-14 12:34:17 +09:00
Alexey Dobriyan	405f55712d	headers: smp_lock.h redux * Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-12 12:22:34 -07:00
Jiro SEKIBA	d9a0a345ab	nilfs2: fix disorder in cp count on error during deleting checkpoints This fixes a bug that checkpoint count gets wrong on errors when deleting a series of checkpoints. The count error is persistent since the checkpoint count is stored on disk. Some userland programs refer to the count via ioctl, and this bugfix is needed to prevent malfunction of such programs. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org	2009-07-05 10:44:20 +09:00
Ryusuke Konishi	ff54de363a	nilfs2: fix lockdep warning between regular file and inode file This will fix the following false positive of recursive locking which lockdep has detected: ============================================= [ INFO: possible recursive locking detected ] 2.6.30-nilfs #42 --------------------------------------------- nilfs_cleanerd/10607 is trying to acquire lock: (&bmap->b_sem){++++-.}, at: [<e0d025b7>] nilfs_bmap_lookup_at_level+0x1a/0x74 [nilfs2] but task is already holding lock: (&bmap->b_sem){++++-.}, at: [<e0d024e0>] nilfs_bmap_truncate+0x19/0x6a [nilfs2] other info that might help us debug this: 2 locks held by nilfs_cleanerd/10607: #0: (&nilfs->ns_segctor_sem){++++.+}, at: [<e0d0d75a>] nilfs_transaction_begin+0xb6/0x10c [nilfs2] #1: (&bmap->b_sem){++++-.}, at: [<e0d024e0>] nilfs_bmap_truncate+0x19/0x6a [nilfs2] Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-07-05 10:44:20 +09:00
Ryusuke Konishi	4a52df7797	nilfs2: fix incorrect KERN_CRIT messages in case of write failures In case of write-failure retries, the following KERN_CRIT level messages are mistakenly output by nilfs_dat_commit_start() function: nilfs_dat_commit_start: vbn = 408463, start = 12506, end = 18446744073709551615, pbn = 530210 nilfs_dat_commit_start: vbn = 408515, start = 12506, end = 18446744073709551615, pbn = 530211 nilfs_dat_commit_start: vbn = 408464, start = 12506, end = 18446744073709551615, pbn = 530212 ... This suppresses these messages. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org	2009-07-05 10:44:20 +09:00
Ryusuke Konishi	8227b29722	nilfs2: fix hang problem of log writer which occurs after write failures Leandro Lucarella gave me a report that nilfs gets stuck after its write function fails. The problem turned out to be caused by bugs which leave writeback flag on pages. This fixes the problem by ensuring to clear the writeback flag in error path. Reported-by: Leandro Lucarella <llucax@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org	2009-07-05 10:44:20 +09:00
Ryusuke Konishi	0cfae3d879	nilfs2: remove unlikely directive causing mis-conversion of error code The following error code handling in nilfs_segctor_write() function wrongly converted negative error codes to a truth value (i.e. 1): err = unlikely(err) ? : res; which originaly meant to be err = err ? : res; This mis-conversion caused that write or sync functions receive the unexpected error code. This fixes the bug by removing the unlikely directive. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@kernel.org	2009-07-05 10:44:19 +09:00
Al Viro	d441b1c293	switch nilfs2 to inode->i_acl Actually, get rid of private analog, since nothing in there is using ACLs at all so far. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:05 -04:00
Linus Torvalds	9c7cb99a82	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (22 commits) nilfs2: support contiguous lookup of blocks nilfs2: add sync_page method to page caches of meta data nilfs2: use device's backing_dev_info for btree node caches nilfs2: return EBUSY against delete request on snapshot nilfs2: modify list of unsupported features in caveats nilfs2: enable sync_page method nilfs2: set bio unplug flag for the last bio in segment nilfs2: allow future expansion of metadata read out via get info ioctl NILFS2: Pagecache usage optimization on NILFS2 nilfs2: remove nilfs_btree_operations from btree mapping nilfs2: remove nilfs_direct_operations from direct mapping nilfs2: remove bmap pointer operations nilfs2: remove useless b_low and b_high fields from nilfs_bmap struct nilfs2: remove pointless NULL check of bpop_commit_alloc_ptr function nilfs2: move get block functions in bmap.c into btree codes nilfs2: remove nilfs_bmap_delete_block nilfs2: remove nilfs_bmap_put_block nilfs2: remove header file for segment list operations nilfs2: eliminate removal list of segments nilfs2: add sufile function that can modify multiple segment usages ...	2009-06-15 09:13:49 -07:00
Ryusuke Konishi	aa7dfb8954	nilfs2: get rid of bd_mount_sem use from nilfs This will remove every bd_mount_sem use in nilfs. The intended exclusion control was replaced by the previous patch ("nilfs2: correct exclusion control in nilfs_remount function") for nilfs_remount(), and this patch will replace remains with a new mutex that this inserts in nilfs object. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:18 -04:00
Ryusuke Konishi	e59399d010	nilfs2: correct exclusion control in nilfs_remount function nilfs_remount() changes mount state of a superblock instance. Even though nilfs accesses other superblock instances during mount or remount, the mount state was not properly protected in nilfs_remount(). Moreover, nilfs_remount() has a lock order reversal problem; nilfs_get_sb() holds: 1. bdev->bd_mount_sem 2. sb->s_umount (sget acquires) and nilfs_remount() holds: 1. sb->s_umount (locked by the caller in vfs) 2. bdev->bd_mount_sem To avoid these problems, this patch divides a semaphore protecting super block instances from nilfs->ns_sem, and applies it to the mount state protection in nilfs_remount(). With this change, bd_mount_sem use is removed from nilfs_remount() and the lock order reversal will be resolved. And the new rw-semaphore, nilfs->ns_super_sem will properly protect the mount state except the modification from nilfs_error function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:18 -04:00
Ryusuke Konishi	6dd4740662	nilfs2: simplify remaining sget() use This simplifies the test function passed on the remaining sget() callsite in nilfs. Instead of checking mount type (i.e. ro-mount/rw-mount/snapshot mount) in the test function passed to sget(), this patch first looks up the nilfs_sb_info struct which the given mount type matches, and then acquires the super block instance holding the nilfs_sb_info. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:18 -04:00
Ryusuke Konishi	3f82ff5516	nilfs2: get rid of sget use for checking if current mount is present This stops using sget() for checking if an r/w-mount or an r/o-mount exists on the device. This elimination uses a back pointer to the current mount added to nilfs object. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:17 -04:00
Ryusuke Konishi	33c8e57c86	nilfs2: get rid of sget use for acquiring nilfs object This will change the way to obtain nilfs object in nilfs_get_sb() function. Previously, a preliminary sget() call was performed, and the nilfs object was acquired from a super block instance found by the sget() call. This patch, instead, instroduces a new dedicated function find_or_create_nilfs(); as the name implies, the function finds an existent nilfs object from a global list or creates a new one if no object is found on the device. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:17 -04:00
Ryusuke Konishi	81fc20bd0e	nilfs2: remove meaningless EBUSY case from nilfs_get_sb function The following EBUSY case in nilfs_get_sb() is meaningless. Indeed, this error code is never returned to the caller. if (!s->s_root) { ... } else if (!(s->s_flags & MS_RDONLY)) { err = -EBUSY; } This simply removes the else case. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:17 -04:00
Christoph Hellwig	d731e06323	nilfs2: call nilfs2_write_super from nilfs2_sync_fs The call to ->write_super from __sync_filesystem will go away, so make sure nilfs2 performs the same actions from inside ->sync_fs. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:17 -04:00
Alessio Igor Bogani	337eb00a2c	Push BKL down into ->remount_fs() [xfs, btrfs, capifs, shmem don't need BKL, exempt] Signed-off-by: Alessio Igor Bogani <abogani@texware.it> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:11 -04:00
Christoph Hellwig	6cfd014842	push BKL down into ->put_super Move BKL into ->put_super from the only caller. A couple of filesystems had trivial enough ->put_super (only kfree and NULLing of s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs, hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most of them probably don't need it, but I'd rather sort that out individually. Preferably after all the other BKL pushdowns in that area. [AV: original used to move lock_super() down as well; these changes are removed since we don't do lock_super() at all in generic_shutdown_super() now] [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:07 -04:00
Christoph Hellwig	8c85e12512	remove ->write_super call in generic_shutdown_super We just did a full fs writeout using sync_filesystem before, and if that's not enough for the filesystem it can perform it's own writeout in ->put_super, which many filesystems already do. Move a call to foofs_write_super into every foofs_put_super for now to guarantee identical behaviour until it's cleaned up by the individual filesystem maintainers. Exceptions: - affs already has identical copy & pasted code at the beginning of affs_put_super so no need to do it twice. - xfs does the right thing without it and I have changes pending for the xfs tree touching this are so I don't really need conflicts here.. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:06 -04:00
Linus Torvalds	c9059598ea	Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits) block: add request clone interface (v2) floppy: fix hibernation ramdisk: remove long-deprecated "ramdisk=" boot-time parameter fs/bio.c: add missing __user annotation block: prevent possible io_context->refcount overflow Add serial number support for virtio_blk, V4a block: Add missing bounce_pfn stacking and fix comments Revert "block: Fix bounce limit setting in DM" cciss: decode unit attention in SCSI error handling code cciss: Remove no longer needed sendcmd reject processing code cciss: change SCSI error handling routines to work with interrupts enabled. cciss: separate error processing and command retrying code in sendcmd_withirq_core() cciss: factor out fix target status processing code from sendcmd functions cciss: simplify interface of sendcmd() and sendcmd_withirq() cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code cciss: Use schedule_timeout_uninterruptible in SCSI error handling code block: needs to set the residual length of a bidi request Revert "block: implement blkdev_readpages" block: Fix bounce limit setting in DM Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt ... Manually fix conflicts with tracing updates in: block/blk-sysfs.c drivers/ide/ide-atapi.c drivers/ide/ide-cd.c drivers/ide/ide-floppy.c drivers/ide/ide-tape.c include/trace/events/block.h kernel/trace/blktrace.c	2009-06-11 11:10:35 -07:00
Ryusuke Konishi	c3a7abf06c	nilfs2: support contiguous lookup of blocks Although get_block() callback function can return extent of contiguous blocks with bh->b_size, nilfs_get_block() function did not support this feature. This adds contiguous lookup feature to the block mapping codes of nilfs, and allows the nilfs_get_blocks() function to return the extent information by applying the feature. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:12 +09:00
Ryusuke Konishi	fa032744ad	nilfs2: add sync_page method to page caches of meta data This applies block_sync_page() function to the sync_page method of page caches for meta data files, gc page caches, and btree node buffers. This is a companion patch of ("nilfs2: enable sync_page mothod") which applied the function for data pages. This allows lock_page() for those meta data to unplug pending bio requests. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:12 +09:00
Ryusuke Konishi	a53b4751ae	nilfs2: use device's backing_dev_info for btree node caches Previously, default_backing_dev_info was used for the mapping of btree node caches. This uses device dependent backing_dev_info to allow detailed control of the device for the btree node pages. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:12 +09:00
Ryusuke Konishi	30c25be71f	nilfs2: return EBUSY against delete request on snapshot This helps userland programs like the rmcp command to distinguish error codes returned against a checkpoint removal request. Previously -EPERM was returned, and not discriminable from real permission errors. This also allows removal of the latest checkpoint because the deletion leads to create a new checkpoint, and thus it's harmless for the filesystem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:12 +09:00
Ryusuke Konishi	e85dc1d529	nilfs2: enable sync_page method This adds a missing sync_page method which unplugs bio requests when waiting for page locks. This will improve read performance of nilfs. Here is a measurement result using dd command. Without this patch: # mount -t nilfs2 /dev/sde1 /test # dd if=/test/aaa of=/dev/null bs=512k 1024+0 records in 1024+0 records out 536870912 bytes (537 MB) copied, 6.00688 seconds, 89.4 MB/s With this patch: # mount -t nilfs2 /dev/sde1 /test # dd if=/test/aaa of=/dev/null bs=512k 1024+0 records in 1024+0 records out 536870912 bytes (537 MB) copied, 3.54998 seconds, 151 MB/s Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:11 +09:00
Ryusuke Konishi	30bda0b8ae	nilfs2: set bio unplug flag for the last bio in segment This sets BIO_RW_UNPLUG flag on the last bio of each segment during write. The last bio should be unplugged immediately because the caller waits for the completion after the submission. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:11 +09:00
Ryusuke Konishi	003ff182fd	nilfs2: allow future expansion of metadata read out via get info ioctl Nilfs has some ioctl commands to read out metadata from meta data files: - NILFS_IOCTL_GET_CPINFO for checkpoint file, - NILFS_IOCTL_GET_SUINFO for segment usage file, and - NILFS_IOCTL_GET_VINFO for Disk Address Transalation (DAT) file, respectively. Every routine on these metadata files is implemented so that it allows future expansion of on-disk format. But, the above ioctl commands do not support expansion even though nilfs_argv structure can handle arbitrary size for data exchanged via ioctl. This allows future expansion of the following structures which give basic format of the "get information" ioctls: - struct nilfs_cpinfo - struct nilfs_suinfo - struct nilfs_vinfo So, this introduces forward compatility of such ioctl commands. In this patch, a sanity check in nilfs_ioctl_get_info() function is changed to accept larger data structure [1], and metadata read routines are rewritten so that they become compatible for larger structures; the routines will just ignore the remaining fields which the current version of nilfs doesn't know. [1] The ioctl function already has another upper limit (PAGE_SIZE against a structure, which appears in nilfs_ioctl_wrap_copy function), and this will not cause security problem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:11 +09:00
Hisashi Hifumi	258ef67e24	NILFS2: Pagecache usage optimization on NILFS2 Hi, I introduced "is_partially_uptodate" aops for NILFS2. A page can have multiple buffers and even if a page is not uptodate, some buffers can be uptodate on pagesize != blocksize environment. This aops checks that all buffers which correspond to a part of a file that we want to read are uptodate. If so, we do not have to issue actual read IO to HDD even if a page is not uptodate because the portion we want to read are uptodate. "block_is_partially_uptodate" function is already used by ext2/3/4. With the following patch random read/write mixed workloads or random read after random write workloads can be optimized and we can get performance improvement. I did a performance test using the sysbench. 1 --file-block-size=8K --file-total-size=2G --file-test-mode=rndrw --file-fsync-freq=0 --fil e-rw-ratio=1 run -2.6.30-rc5 Test execution summary: total time: 151.2907s total number of events: 200000 total time taken by event execution: 2409.8387 per-request statistics: min: 0.0000s avg: 0.0120s max: 0.9306s approx. 95 percentile: 0.0439s Threads fairness: events (avg/stddev): 12500.0000/238.52 execution time (avg/stddev): 150.6149/0.01 -2.6.30-rc5-patched Test execution summary: total time: 140.8828s total number of events: 200000 total time taken by event execution: 2240.8577 per-request statistics: min: 0.0000s avg: 0.0112s max: 0.8750s approx. 95 percentile: 0.0418s Threads fairness: events (avg/stddev): 12500.0000/218.43 execution time (avg/stddev): 140.0536/0.01 arch: ia64 pagesize: 16k Thanks. Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:11 +09:00
Ryusuke Konishi	7cde31d7d6	nilfs2: remove nilfs_btree_operations from btree mapping will remove indirect function calls using nilfs_btree_operations table. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:11 +09:00
Ryusuke Konishi	355c6b6103	nilfs2: remove nilfs_direct_operations from direct mapping will remove indirect function calls using nilfs_direct_operations table. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:11 +09:00
Ryusuke Konishi	d4b961576d	nilfs2: remove bmap pointer operations Previously, the bmap codes of nilfs used three types of function tables. The abuse of indirect function calls decreased source readability and suffered many indirect jumps which would confuse branch prediction of processors. This eliminates one type of the function tables, nilfs_bmap_ptr_operations, which was used to dispatch low level pointer operations of the nilfs bmap. This adds a new integer variable "b_ptr_type" to nilfs_bmap struct, and uses the value to select the pointer operations. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:10 +09:00
Ryusuke Konishi	3033342a0b	nilfs2: remove useless b_low and b_high fields from nilfs_bmap struct This will cut off 16 bytes from the nilfs_bmap struct which is embedded in the on-memory inode of nilfs. The b_high field was never used, and the b_low field stores a constant value which can be determined by whether the inode uses btree for block mapping or not. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:10 +09:00
Ryusuke Konishi	e473c1f265	nilfs2: remove pointless NULL check of bpop_commit_alloc_ptr function This indirect function is set to NULL only for gc cache inodes, but the gc cache inodes never call this function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:10 +09:00
Ryusuke Konishi	f198dbb9cf	nilfs2: move get block functions in bmap.c into btree codes Two get block function for btree nodes, nilfs_bmap_get_block() and nilfs_bmap_get_new_block(), are called only from the btree codes. This relocation will increase opportunities of compiler optimization. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:10 +09:00
Ryusuke Konishi	9f098900ad	nilfs2: remove nilfs_bmap_delete_block nilfs_bmap_delete_block() is a wrapper function calling nilfs_btnode_delete(). This removes it for simplicity. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:10 +09:00
Ryusuke Konishi	087d01b425	nilfs2: remove nilfs_bmap_put_block nilfs_bmap_put_block() is a wrapper function calling brelse(). This eliminates the wrapper for simplicity. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:10 +09:00
Ryusuke Konishi	654137dd46	nilfs2: remove header file for segment list operations This will eliminate obsolete list operations of nilfs_segment_entry structure which has been used to handle mutiple segment numbers. The patch ("nilfs2: remove list of freeing segments") removed use of the structure from the segment constructor code, and this patch simplifies the remaining code by integrating it into recovery.c. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:09 +09:00
Ryusuke Konishi	071cb4b819	nilfs2: eliminate removal list of segments This will clean up the removal list of segments and the related functions from segment.c and ioctl.c, which have hurt code readability. This elimination is applied by using nilfs_sufile_updatev() previously introduced in the patch ("nilfs2: add sufile function that can modify multiple segment usages"). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:09 +09:00
Ryusuke Konishi	dda54f4b87	nilfs2: add sufile function that can modify multiple segment usages This is a preparation for the later cleanup patch ("nilfs2: remove list of freeing segments"). This adds nilfs_sufile_updatev() to sufile, which can modify multiple segment usages at a time. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:09 +09:00
Ryusuke Konishi	d97a51a7e3	nilfs2: unify bmap operations starting use of indirect block address This simplifies some low level functions of bmap. Three bmap pointer operations, nilfs_bmap_start_v(), nilfs_bmap_commit_v(), and nilfs_bmap_abort_v(), are unified into one nilfs_bmap_start_v() function. And the related indirect function calls are replaced with it. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:09 +09:00
Ryusuke Konishi	6582207064	nilfs2: remove nilfs_dat_prepare_free function This function is unused. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-06-10 23:41:09 +09:00
Ryusuke Konishi	62013ab5d5	nilfs2: fix bh leak in nilfs_cpfile_delete_checkpoints function The nilfs_cpfile_delete_checkpoints() wrongly skips brelse() for the header block of checkpoint file in case of errors. This fixes the leak bug. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-05-30 22:07:50 +09:00
Martin K. Petersen	e1defc4ff0	block: Do away with the notion of hardsect_size Until now we have had a 1:1 mapping between storage device physical block size and the logical block sized used when addressing the device. With SATA 4KB drives coming out that will no longer be the case. The sector size will be 4KB but the logical block size will remain 512-bytes. Hence we need to distinguish between the physical block size and the logical ditto. This patch renames hardsect_size to logical_block_size. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-22 23:22:54 +02:00
Ryusuke Konishi	d504685363	nilfs2: fix memory leak in nilfs_ioctl_clean_segments This fixes a new memory leak problem in garbage collection. The problem was brought by the bugfix patch ("nilfs2: fix lock order reversal in nilfs_clean_segments ioctl"). Thanks to Kentaro Suzuki for finding this problem. Reported-by: Kentaro Suzuki <k_suzuki@ms.sylc.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-05-22 20:49:04 +09:00
Ryusuke Konishi	83aca8f480	nilfs2: check size of array structured data exchanged via ioctls Although some ioctls of nilfs2 exchange data in the form of indirectly referenced array, some of them lack size check on the array elements. This inserts the missing checks and rejects requests if data of ioctl does not have a valid format. We usually don't have to check size of structures that we associated with ioctl commands because the size is tested implicitly for identifying ioctl command; the checks this patch adds are for the cases where the implicit check is not applied. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-05-12 01:48:54 +09:00
Ryusuke Konishi	4f6b828837	nilfs2: fix lock order reversal in nilfs_clean_segments ioctl This is a companion patch to ("nilfs2: fix possible circular locking for get information ioctls"). This corrects lock order reversal between mm->mmap_sem and nilfs->ns_segctor_sem in nilfs_clean_segments() which was detected by lockdep check: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.30-rc3-nilfs-00003-g360bdc1 #7 ------------------------------------------------------- mmap/5294 is trying to acquire lock: (&nilfs->ns_segctor_sem){++++.+}, at: [<d0d0e846>] nilfs_transaction_begin+0xb6/0x10c [nilfs2] but task is already holding lock: (&mm->mmap_sem){++++++}, at: [<c043700a>] do_page_fault+0x1d8/0x30a which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&mm->mmap_sem){++++++}: [<c01470a5>] __lock_acquire+0x1066/0x13b0 [<c01474a9>] lock_acquire+0xba/0xdd [<c01836bc>] might_fault+0x68/0x88 [<c023c61d>] copy_from_user+0x2a/0x111 [<d0d120d0>] nilfs_ioctl_prepare_clean_segments+0x1d/0xf1 [nilfs2] [<d0d0e2aa>] nilfs_clean_segments+0x6d/0x1b9 [nilfs2] [<d0d11f68>] nilfs_ioctl+0x2ad/0x318 [nilfs2] [<c01a3be7>] vfs_ioctl+0x22/0x69 [<c01a408e>] do_vfs_ioctl+0x460/0x499 [<c01a4107>] sys_ioctl+0x40/0x5a [<c01031a4>] sysenter_do_call+0x12/0x38 [<ffffffff>] 0xffffffff -> #0 (&nilfs->ns_segctor_sem){++++.+}: [<c0146e0b>] __lock_acquire+0xdcc/0x13b0 [<c01474a9>] lock_acquire+0xba/0xdd [<c0433f1d>] down_read+0x2a/0x3e [<d0d0e846>] nilfs_transaction_begin+0xb6/0x10c [nilfs2] [<d0cfe0e5>] nilfs_page_mkwrite+0xe7/0x154 [nilfs2] [<c0183b0b>] __do_fault+0x165/0x376 [<c01855cd>] handle_mm_fault+0x287/0x5d1 [<c043712d>] do_page_fault+0x2fb/0x30a [<c0435462>] error_code+0x72/0x78 [<ffffffff>] 0xffffffff where nilfs_clean_segments() holds: nilfs->ns_segctor_sem -> copy_from_user() --> page fault -> mm->mmap_sem And, page fault path may hold: page fault -> mm->mmap_sem --> nilfs_page_mkwrite() -> nilfs->ns_segctor_sem Even though nilfs_clean_segments() does not perform write access on given user pages, it may cause deadlock because nilfs->ns_segctor_sem is shared per device and mm->mmap_sem can be shared with other tasks. To avoid this problem, this patch moves all calls of copy_from_user() outside the nilfs->ns_segctor_sem lock in the ioctl. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-05-11 14:54:41 +09:00
Ryusuke Konishi	47eb6b9c8f	nilfs2: fix possible circular locking for get information ioctls This is one of two patches which are to correct possible circular locking between mm->mmap_sem and nilfs->ns_segctor_sem. The problem was detected by lockdep check as follows: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.30-rc3-nilfs-00002-g3552613 #6 ------------------------------------------------------- mmap/5418 is trying to acquire lock: (&nilfs->ns_segctor_sem){++++.+}, at: [<d0d0e852>] nilfs_transaction_begin+0xb6/0x10c [nilfs2] but task is already holding lock: (&mm->mmap_sem){++++++}, at: [<c043700a>] do_page_fault+0x1d8/0x30a which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&mm->mmap_sem){++++++}: [<c01470a5>] __lock_acquire+0x1066/0x13b0 [<c01474a9>] lock_acquire+0xba/0xdd [<c01836bc>] might_fault+0x68/0x88 [<c023c730>] copy_to_user+0x2c/0xfc [<d0d11b4f>] nilfs_ioctl_wrap_copy+0x103/0x160 [nilfs2] [<d0d11fa9>] nilfs_ioctl+0x30a/0x3b0 [nilfs2] [<c01a3be7>] vfs_ioctl+0x22/0x69 [<c01a408e>] do_vfs_ioctl+0x460/0x499 [<c01a4107>] sys_ioctl+0x40/0x5a [<c01031a4>] sysenter_do_call+0x12/0x38 [<ffffffff>] 0xffffffff -> #0 (&nilfs->ns_segctor_sem){++++.+}: [<c0146e0b>] __lock_acquire+0xdcc/0x13b0 [<c01474a9>] lock_acquire+0xba/0xdd [<c0433f1d>] down_read+0x2a/0x3e [<d0d0e852>] nilfs_transaction_begin+0xb6/0x10c [nilfs2] [<d0cfe0e5>] nilfs_page_mkwrite+0xe7/0x154 [nilfs2] [<c0183b0b>] __do_fault+0x165/0x376 [<c01855cd>] handle_mm_fault+0x287/0x5d1 [<c043712d>] do_page_fault+0x2fb/0x30a [<c0435462>] error_code+0x72/0x78 [<ffffffff>] 0xffffffff other info that might help us debug this: 1 lock held by mmap/5418: #0: (&mm->mmap_sem){++++++}, at: [<c043700a>] do_page_fault+0x1d8/0x30a stack backtrace: Pid: 5418, comm: mmap Not tainted 2.6.30-rc3-nilfs-00002-g3552613 #6 Call Trace: [<c0432145>] ? printk+0xf/0x12 [<c0145c48>] print_circular_bug_tail+0xaa/0xb5 [<c0146e0b>] __lock_acquire+0xdcc/0x13b0 [<d0d10149>] ? nilfs_sufile_get_stat+0x1e/0x105 [nilfs2] [<c013b59a>] ? up_read+0x16/0x2c [<d0d10225>] ? nilfs_sufile_get_stat+0xfa/0x105 [nilfs2] [<c01474a9>] lock_acquire+0xba/0xdd [<d0d0e852>] ? nilfs_transaction_begin+0xb6/0x10c [nilfs2] [<c0433f1d>] down_read+0x2a/0x3e [<d0d0e852>] ? nilfs_transaction_begin+0xb6/0x10c [nilfs2] [<d0d0e852>] nilfs_transaction_begin+0xb6/0x10c [nilfs2] [<d0cfe0e5>] nilfs_page_mkwrite+0xe7/0x154 [nilfs2] [<c0183b0b>] __do_fault+0x165/0x376 [<c01855cd>] handle_mm_fault+0x287/0x5d1 [<c043700a>] ? do_page_fault+0x1d8/0x30a [<c013b54f>] ? down_read_trylock+0x39/0x43 [<c043712d>] do_page_fault+0x2fb/0x30a [<c0436e32>] ? do_page_fault+0x0/0x30a [<c0435462>] error_code+0x72/0x78 [<c0436e32>] ? do_page_fault+0x0/0x30a This makes the lock granularity of nilfs->ns_segctor_sem finer than that of the mmap semaphore for ioctl commands except nilfs_clean_segments(). The successive patch ("nilfs2: fix lock order reversal in nilfs_clean_segments ioctl") is required to fully resolve the problem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-05-11 12:57:46 +09:00
Ryusuke Konishi	843382370e	nilfs2: ensure to clear dirty state when deleting metadata file block This would fix the following failure during GC: nilfs_cpfile_delete_checkpoints: cannot delete block NILFS: GC failed during preparation: cannot delete checkpoints: err=-2 The problem was caused by a break in state consistency between page cache and btree; the above block was removed from the btree but the page buffering the block was remaining in the page cache in dirty state. This resolves the inconsistency by ensuring to clear dirty state of the page buffering the deleted block. Reported-by: David Arendt <admin@prnet.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-05-10 17:04:42 +09:00
Ryusuke Konishi	201913ed74	nilfs2: fix circular locking dependency of writer mutex This fixes the following circular locking dependency problem: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.30-rc3 #5 ------------------------------------------------------- segctord/3895 is trying to acquire lock: (&nilfs->ns_writer_mutex){+.+...}, at: [<d0d02172>] nilfs_mdt_get_block+0x89/0x20f [nilfs2] but task is already holding lock: (&bmap->b_sem){++++..}, at: [<d0d02d99>] nilfs_bmap_propagate+0x14/0x2e [nilfs2] which lock already depends on the new lock. The bugfix is done by replacing call sites of nilfs_get_writer() which are never called from read-only context with direct dereferencing of pointer to a writable FS-instance. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-05-09 13:36:57 +09:00
Ryusuke Konishi	85c2a74fab	nilfs2: fix possible recovery failure due to block creation without writer Some function calls in nilfs_prepare_segment_for_recovery() may fail because they can create blocks on meta data files without configuring a writable FS-instance. Concretely, nilfs_mdt_create_block() routine of meta data files will fail in that case. This fixes the problem by temporarily attaching a writable FS-instace during the function is called. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-05-09 13:36:56 +09:00
Ryusuke Konishi	c85399c2da	nilfs2: fix possible mismatch of sufile counters on recovery On-disk counters ndirtysegs and ncleansegs of sufile, can go wrong after roll-forward recovery because nilfs_prepare_segment_for_recovery() function marks segments dirty without adjusting value of these counters. This fixes the problem by adding a function to sufile which does the operation adjusting the counters, and by letting the recovery function use it. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:52 +09:00
Ryusuke Konishi	a703018f7b	nilfs2: segment usage file cleanups This will simplify sufile.c by sharing common code which repeatedly appears in routines updating a segment usage entry; a wrapper function nilfs_sufile_update() is introduced for the purpose, and counter modifications are integrated to a new function nilfs_sufile_mod_counter(). This is a preparation for the successive bugfix patch ("nilfs2: fix possible mismatch of sufile counters on recovery"). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:51 +09:00
Ryusuke Konishi	88072faf9a	nilfs2: fix wrong accounting and duplicate brelse in nilfs_sufile_set_error The nilfs_sufile_set_error() function wrongly adjusts the number of dirty segments instead of the number of clean segments. In addition, the function calls brelse() twice for the same buffer head. This fixes these bugs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:51 +09:00
Ryusuke Konishi	3efb55b496	nilfs2: simplify handling of active state of segments fix This fixes a bug of ("nilfs2: simplify handling of active state of segments") patch. The patch did not take account that a base index is increased in nilfs_sufile_get_suinfo() function if requested entries go across block boundary on sufile. Due to this bug, the active flag sometimes appears on wrong segments and has induced malfunction of garbage collection. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:51 +09:00
Ryusuke Konishi	e7a7402c0d	nilfs2: remove module version A MODULE_VERSION() macro has been used in out-of-tree nilfs modules, but it's needless and not updated in tree. So, this removes it along with the version declaration. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:50 +09:00
Ryusuke Konishi	c2698e50e3	nilfs2: fix lockdep recursive locking warning on meta data files This fixes the following false detection of lockdep against nilfs meta data files: ============================================= [ INFO: possible recursive locking detected ] 2.6.29 #26 --------------------------------------------- mount.nilfs2/4185 is trying to acquire lock: (&mi->mi_sem){----}, at: [<d0c7925b>] nilfs_sufile_get_stat+0x1e/0x105 [nilfs2] but task is already holding lock: (&mi->mi_sem){----}, at: [<d0c72026>] nilfs_count_free_blocks+0x48/0x84 [nilfs2] Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:50 +09:00
Ryusuke Konishi	bcb48891b0	nilfs2: fix lockdep recursive locking warning on bmap The bmap semaphore of DAT file can be held while a bmap of other files is locked. This has caused the following false detection of lockdep check: mount.nilfs2/4667 is trying to acquire lock: (&bmap->b_sem){..--}, at: [<d0c6c4b4>] nilfs_bmap_lookup_at_level+0x1a/0x74 [nilfs2] but task is already holding lock: (&bmap->b_sem){..--}, at: [<d0c6c4b4>] nilfs_bmap_lookup_at_level+0x1a/0x74 [nilfs2] This will fix the false detection by distinguishing semaphores of the DAT and other files. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:49 +09:00
Ryusuke Konishi	c306af23e1	nilfs2: return f_fsid for statfs2 This follows the change of Coly Li's series ("fs: return f_fsid for statfs(2)"), and make nilfs2 return f_fsid info for statfs(2). Acked-by: Coly Li <coly.li@suse.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2009-04-13 09:53:49 +09:00
Ryusuke Konishi	612392307c	nilfs2: support nanosecond timestamp After a review of user's feedback for finding out other compatibility issues, I found nilfs improperly initializes timestamps in inode; CURRENT_TIME was used there instead of CURRENT_TIME_SEC even though nilfs didn't have nanosecond timestamps on disk. A few users gave us the report that the tar program sometimes failed to expand symbolic links on nilfs, and it turned out to be the cause. Instead of applying the above displacement, I've decided to support nanosecond timestamps on this occation. Fortunetaly, a needless 64-bit field was in the nilfs_inode struct, and I found it's available for this purpose without impact for the users. So, this will do the enhancement and resolve the tar problem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:20 -07:00
Ryusuke Konishi	e339ad31f5	nilfs2: introduce secondary super block The former versions didn't have extra super blocks. This improves the weak point by introducing another super block at unused region in tail of the partition. This doesn't break disk format compatibility; older versions just ingore the secondary super block, and new versions just recover it if it doesn't exist. The partition created by an old mkfs may not have unused region, but in that case, the secondary super block will not be added. This doesn't make more redundant copies of the super block; it is a future work. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:20 -07:00
Ryusuke Konishi	cece552074	nilfs2: simplify handling of active state of segments will reduce some lines of segment constructor. Previously, the state was complexly controlled through a list of segments in order to keep consistency in meta data of usage state of segments. Instead, this presents ``calculated'' active flags to userland cleaner program and stop maintaining its real flag on disk. Only by this fake flag, the cleaner cannot exactly know if each segment is reclaimable or not. However, the recent extension of nilfs_sustat ioctl struct (nilfs2-extend-nilfs_sustat-ioctl-struct.patch) can prevent the cleaner from reclaiming in-use segment wrongly. So, now I can apply this for simplification. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:20 -07:00
Ryusuke Konishi	c96fa464a5	nilfs2: mark minor flag for checkpoint created by internal operation Nilfs creates checkpoints even for garbage collection or metadata updates such as checkpoint mode change. So, user often sees checkpoints created only by such internal operations. This is inconvenient in some situations. For example, application that monitors checkpoints and changes them to snapshots, will fall into an infinite loop because it cannot distinguish internally created checkpoints. This patch solves this sort of problem by adding a flag to checkpoint for identification. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:19 -07:00
Ryusuke Konishi	458c5b0822	nilfs2: clean up sketch file The sketch file is a file to mark checkpoints with user data. It was experimentally introduced in the original implementation, and now obsolete. The file was handled differently with regular files; the file size got truncated when a checkpoint was created. This stops the special treatment and will treat it as a regular file. Most users are not affected because mkfs.nilfs2 no longer makes this file. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:19 -07:00
Ryusuke Konishi	e626874685	nilfs2: super block operations fix endian bug This adds a missing endian conversion of checksum field in the super block. This fixes compatibility issue on big endian machines which will come to surface after supporting recovery of super block. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:19 -07:00
Ryusuke Konishi	1f5abe7e7d	nilfs2: replace BUG_ON and BUG calls triggerable from ioctl Pekka Enberg advised me: > It would be nice if BUG(), BUG_ON(), and panic() calls would be > converted to proper error handling using WARN_ON() calls. The BUG() > call in nilfs_cpfile_delete_checkpoints(), for example, looks to be > triggerable from user-space via the ioctl() system call. This will follow the comment and keep them to a minimum. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:19 -07:00
Ryusuke Konishi	2c2e52fc4f	nilfs2: extend nilfs_sustat ioctl struct This adds a new argument to the nilfs_sustat structure. The extended field allows to delete volatile active state of segments, which was needed to protect freshly-created segments from garbage collection but has confused code dealing with segments. This extension alleviates the mess and gives room for further simplifications. The volatile active flag is not persistent, so it's eliminable on this occasion without affecting compatibility other than the ioctl change. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:19 -07:00
Ryusuke Konishi	7a9461939a	nilfs2: use unlocked_ioctl Pekka Enberg suggested converting ->ioctl operations to use ->unlocked_ioctl to avoid BKL. The conversion was verified to be safe, so I will take it on this occasion. Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:19 -07:00
Ryusuke Konishi	8082d36aed	nilfs2: remove compat ioctl code This removes compat code from the nilfs ioctls and applies the same function for both .ioctl and .compat_ioctl file operations. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:18 -07:00
Ryusuke Konishi	dc498d09be	nilfs2: use fixed sized types for ioctl structures Nilfs ioctl had structures not having fixed sized types such as: struct nilfs_argv { void *v_base; size_t v_nmembs; size_t v_size; int v_index; int v_flags; }; Further, some of them are wrongly aligned: e.g. struct nilfs_cpmode { __u64 cm_cno; int cm_mode; }; The size of wrongly aligned structures varies depending on architectures, and it breaks the identity of ioctl commands, which leads to arch dependent errors. Previously, these are compensated by using compat_ioctl. This fixes these problems and allows removal of compat ioctl. Since this will change sizes of those structures, binary compatibility for the past utilities will once break; new utilities have to be used instead. However, it would be helpful to avoid platform dependent problems in the long term. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:18 -07:00
Ryusuke Konishi	1088dcf4c3	nilfs2: remove timedwait ioctl command This removes NILFS_IOCTL_TIMEDWAIT command from ioctl interface along with the related flags and wait queue. The command is terrible because it just sleeps in the ioctl. I prefer to avoid this by devising means of event polling in userland program. By reconsidering the userland GC daemon, I found this is possible without changing behaviour of the daemon and sacrificing efficiency. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:18 -07:00
Ryusuke Konishi	76068c4ff1	nilfs2: fix buggy behavior seen in enumerating checkpoints This will fix the weird behavior of lscp command in listing continuously created checkpoints; the output of lscp is rewinded regularly for the recent nilfs. As a result of debugging, a defect was found in nilfs_cpfile_do_get_cpinfo() function. Though the function can be repeatedly called to enumerate checkpoints and it can skip invalid checkpoint entries, the index value was not carried between successive calls. The bug has long been present, and came to surface after applying a bugfix nilfs2-fix-problems-of-memory-allocation-in-ioctl.patch, which increased frequency of calling the function. The similar bugfix was already applied for ``snapshots'' by nilfs2-fix-gc-failure-on-volumes-keeping-numerous-snapshots.patch. This fixes the problem by making the index argument bidirectional on the function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:31:18 -07:00

... 5 6 7 8 9 ...

683 Commits