linux

Commit Graph

Author	SHA1	Message	Date
Frederic Weisbecker	175359f89d	reiserfs: Fix softlockup while waiting on an inode When we wait for an inode through reiserfs_iget(), we hold the reiserfs lock. And waiting for an inode may imply waiting for its writeback. But the inode writeback path may also require the reiserfs lock, which leads to a deadlock. We just need to release the reiserfs lock from reiserfs_iget() to fix this. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Christian Kujau <lists@nerdbynature.de> Cc: Chris Mason <chris.mason@oracle.com>	2010-02-14 19:07:56 +01:00
Linus Torvalds	82062e7b50	Merge branch 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing * 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: reiserfs: Relax reiserfs_xattr_set_handle() while acquiring xattr locks reiserfs: Fix unreachable statement reiserfs: Don't call reiserfs_get_acl() with the reiserfs lock reiserfs: Relax lock on xattr removing reiserfs: Relax the lock before truncating pages reiserfs: Fix recursive lock on lchown reiserfs: Fix mistake in down_write() conversion	2010-01-08 14:03:55 -08:00
Frederic Weisbecker	108d3943c0	reiserfs: Relax the lock before truncating pages While truncating a file, reiserfs_setattr() calls inode_setattr() that will truncate the mapping for the given inode, but for that it needs the pages locks. In order to release these, the owners need the reiserfs lock to complete their jobs. But they can't, as we don't release it before calling inode_setattr(). We need to do that to fix the following softlockups: INFO: task flush-8:0:2149 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. flush-8:0 D f51af998 0 2149 2 0x00000000 f51af9ac 00000092 00000002 f51af998 c2803304 00000000 c1894ad0 010f3000 f51af9cc c1462604 c189ef80 f51af974 c1710304 f715b450 f715b5ec c2807c40 00000000 0005bb00 c2803320 c102c55b c1710304 c2807c50 c2803304 00000246 Call Trace: [<c1462604>] ? schedule+0x434/0xb20 [<c102c55b>] ? resched_task+0x4b/0x70 [<c106fa22>] ? mark_held_locks+0x62/0x80 [<c146414d>] ? mutex_lock_nested+0x1fd/0x350 [<c14640b9>] mutex_lock_nested+0x169/0x350 [<c1178cde>] ? reiserfs_write_lock+0x2e/0x40 [<c1178cde>] reiserfs_write_lock+0x2e/0x40 [<c11719a2>] do_journal_end+0xc2/0xe70 [<c1172912>] journal_end+0xb2/0x120 [<c11686b3>] ? pathrelse+0x33/0xb0 [<c11729e4>] reiserfs_end_persistent_transaction+0x64/0x70 [<c1153caa>] reiserfs_get_block+0x12ba/0x15f0 [<c106fa22>] ? mark_held_locks+0x62/0x80 [<c1154b24>] reiserfs_writepage+0xa74/0xe80 [<c1465a27>] ? _raw_spin_unlock_irq+0x27/0x50 [<c11f3d25>] ? radix_tree_gang_lookup_tag_slot+0x95/0xc0 [<c10b5377>] ? find_get_pages_tag+0x127/0x1a0 [<c106fa22>] ? mark_held_locks+0x62/0x80 [<c106fcd4>] ? trace_hardirqs_on_caller+0x124/0x170 [<c10bc1e0>] __writepage+0x10/0x40 [<c10bc9ab>] write_cache_pages+0x16b/0x320 [<c10bc1d0>] ? __writepage+0x0/0x40 [<c10bcb88>] generic_writepages+0x28/0x40 [<c10bcbd5>] do_writepages+0x35/0x40 [<c11059f7>] writeback_single_inode+0xc7/0x330 [<c11067b2>] writeback_inodes_wb+0x2c2/0x490 [<c1106a86>] wb_writeback+0x106/0x1b0 [<c1106cf6>] wb_do_writeback+0x106/0x1e0 [<c1106c18>] ? wb_do_writeback+0x28/0x1e0 [<c1106e0a>] bdi_writeback_task+0x3a/0xb0 [<c10cbb13>] bdi_start_fn+0x63/0xc0 [<c10cbab0>] ? bdi_start_fn+0x0/0xc0 [<c105d1f4>] kthread+0x74/0x80 [<c105d180>] ? kthread+0x0/0x80 [<c100327a>] kernel_thread_helper+0x6/0x10 3 locks held by flush-8:0/2149: #0: (&type->s_umount_key#30){+++++.}, at: [<c110676f>] writeback_inodes_wb+0x27f/0x490 #1: (&journal->j_mutex){+.+...}, at: [<c117199a>] do_journal_end+0xba/0xe70 #2: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1178cde>] reiserfs_write_lock+0x2e/0x40 INFO: task fstest:3813 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. fstest D 00000002 0 3813 3812 0x00000000 f5103c94 00000082 f5103c40 00000002 f5ad5450 00000007 f5103c28 011f3000 00000006 f5ad5450 c10bb005 00000480 c1710304 f5ad5450 f5ad55ec c2907c40 00000001 f5ad5450 f5103c74 00000046 00000002 f5ad5450 00000007 f5103c6c Call Trace: [<c10bb005>] ? free_hot_cold_page+0x1d5/0x280 [<c1462d64>] io_schedule+0x74/0xc0 [<c10b5a45>] sync_page+0x35/0x60 [<c146325a>] __wait_on_bit_lock+0x4a/0x90 [<c10b5a10>] ? sync_page+0x0/0x60 [<c10b59e5>] __lock_page+0x85/0x90 [<c105d660>] ? wake_bit_function+0x0/0x60 [<c10bf654>] truncate_inode_pages_range+0x1e4/0x2d0 [<c10bf75f>] truncate_inode_pages+0x1f/0x30 [<c10bf7cf>] truncate_pagecache+0x5f/0xa0 [<c10bf86a>] vmtruncate+0x5a/0x70 [<c10fdb7d>] inode_setattr+0x5d/0x190 [<c1150117>] reiserfs_setattr+0x1f7/0x2f0 [<c1464569>] ? down_write+0x49/0x70 [<c10fde01>] notify_change+0x151/0x330 [<c10e6f3d>] do_truncate+0x6d/0xa0 [<c10f4ce2>] do_filp_open+0x9a2/0xcf0 [<c1465aec>] ? _raw_spin_unlock+0x2c/0x50 [<c10fec50>] ? alloc_fd+0xe0/0x100 [<c10e602d>] do_sys_open+0x6d/0x130 [<c1002cfb>] ? sysenter_exit+0xf/0x16 [<c10e615e>] sys_open+0x2e/0x40 [<c1002ccc>] sysenter_do_call+0x12/0x32 3 locks held by fstest/3813: #0: (&sb->s_type->i_mutex_key#4){+.+.+.}, at: [<c10e6f33>] do_truncate+0x63/0xa0 #1: (&sb->s_type->i_alloc_sem_key#3){+.+.+.}, at: [<c10fdf07>] notify_change+0x257/0x330 #2: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1178c8e>] reiserfs_write_lock_once+0x2e/0x50 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>	2010-01-05 08:00:29 +01:00
Frederic Weisbecker	5fe1533fda	reiserfs: Fix recursive lock on lchown On chown, reiserfs will call reiserfs_setattr() to change the owner of the given inode, but it may also recursively call reiserfs_setattr() to propagate the owner change to the private xattr files for this inode. Hence, the reiserfs lock may be acquired twice which is not wanted as reiserfs_setattr() calls journal_begin() that is going to try to relax the lock in order to safely acquire the journal mutex. Using reiserfs_write_lock_once() from reiserfs_setattr() solves the problem. This fixes the following warning, that precedes a lockdep report. WARNING: at fs/reiserfs/lock.c:95 reiserfs_lock_check_recursive+0x3f/0x50() Hardware name: MS-7418 Unwanted recursive reiserfs lock! Pid: 4189, comm: fsstress Not tainted 2.6.33-rc2-tip-atom+ #195 Call Trace: [<c1178bff>] ? reiserfs_lock_check_recursive+0x3f/0x50 [<c1178bff>] ? reiserfs_lock_check_recursive+0x3f/0x50 [<c103f7ac>] warn_slowpath_common+0x6c/0xc0 [<c1178bff>] ? reiserfs_lock_check_recursive+0x3f/0x50 [<c103f84b>] warn_slowpath_fmt+0x2b/0x30 [<c1178bff>] reiserfs_lock_check_recursive+0x3f/0x50 [<c1172ae3>] do_journal_begin_r+0x83/0x350 [<c1172f2d>] journal_begin+0x7d/0x140 [<c106509a>] ? in_group_p+0x2a/0x30 [<c10fda71>] ? inode_change_ok+0x91/0x140 [<c115007d>] reiserfs_setattr+0x15d/0x2e0 [<c10f9bf3>] ? dput+0xe3/0x140 [<c1465adc>] ? _raw_spin_unlock+0x2c/0x50 [<c117831d>] chown_one_xattr+0xd/0x10 [<c11780a3>] reiserfs_for_each_xattr+0x113/0x2c0 [<c1178310>] ? chown_one_xattr+0x0/0x10 [<c14641e9>] ? mutex_lock_nested+0x2a9/0x350 [<c117826f>] reiserfs_chown_xattrs+0x1f/0x60 [<c106509a>] ? in_group_p+0x2a/0x30 [<c10fda71>] ? inode_change_ok+0x91/0x140 [<c1150046>] reiserfs_setattr+0x126/0x2e0 [<c1177c20>] ? reiserfs_getxattr+0x0/0x90 [<c11b0d57>] ? cap_inode_need_killpriv+0x37/0x50 [<c10fde01>] notify_change+0x151/0x330 [<c10e659f>] chown_common+0x6f/0x90 [<c10e67bd>] sys_lchown+0x6d/0x80 [<c1002ccc>] sysenter_do_call+0x12/0x32 ---[ end trace 7c2b77224c1442fc ]--- Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>	2010-01-05 07:59:38 +01:00
Linus Torvalds	45d28b0972	Merge branch 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing * 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: reiserfs: Safely acquire i_mutex from xattr_rmdir reiserfs: Safely acquire i_mutex from reiserfs_for_each_xattr reiserfs: Fix journal mutex <-> inode mutex lock inversion reiserfs: Fix unwanted recursive reiserfs lock in reiserfs_unlink() reiserfs: Relax lock before open xattr dir in reiserfs_xattr_set_handle() reiserfs: Relax reiserfs lock while freeing the journal reiserfs: Fix reiserfs lock <-> i_mutex dependency inversion on xattr reiserfs: Warn on lock relax if taken recursively reiserfs: Fix reiserfs lock <-> i_xattr_sem dependency inversion reiserfs: Fix remaining in-reclaim-fs <-> reclaim-fs-on locking inversion reiserfs: Fix reiserfs lock <-> inode mutex dependency inversion reiserfs: Fix reiserfs lock and journal lock inversion dependency reiserfs: Fix possible recursive lock	2010-01-02 11:17:05 -08:00
Jan Kara	ec8e2f7466	reiserfs: truncate blocks not used by a write It can happen that write does not use all the blocks allocated in write_begin either because of some filesystem error (like ENOSPC) or because page with data to write has been removed from memory. We truncate these blocks so that we don't have dangling blocks beyond i_size. Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-17 15:45:30 -08:00
Frederic Weisbecker	cb1c2e51c5	reiserfs: Fix reiserfs lock and journal lock inversion dependency When we were using the bkl, we didn't care about dependencies against other locks, but the mutex conversion created new ones, which is why we have reiserfs_mutex_lock_safe(), which unlocks the reiserfs lock before acquiring another mutex. But this trick actually fails if we have acquired the reiserfs lock recursively, as we try to unlock it to acquire the new mutex without inverted dependency, but we eventually only decrease its depth. This happens in the case of a nested inode creation/deletion. Say we have no space left on the device, we create an inode and tak the lock but fail to create its entry, then we release the inode using iput(), which calls reiserfs_delete_inode() that takes the reiserfs lock recursively. The path eventually ends up in journal_begin() where we try to take the journal safely but we fail because of the reiserfs lock recursion: [ INFO: possible circular locking dependency detected ] 2.6.32-06486-g053fe57 #2 ------------------------------------------------------- vi/23454 is trying to acquire lock: (&journal->j_mutex){+.+...}, at: [<c110dac4>] do_journal_begin_r+0x64/0x2f0 but task is already holding lock: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c11106a8>] reiserfs_write_lock+0x28/0x40 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&REISERFS_SB(s)->lock){+.+.+.}: [<c104f8f3>] validate_chain+0xa23/0xf70 [<c1050325>] __lock_acquire+0x4e5/0xa70 [<c105092a>] lock_acquire+0x7a/0xa0 [<c134c78f>] mutex_lock_nested+0x5f/0x2b0 [<c11106a8>] reiserfs_write_lock+0x28/0x40 [<c110dacb>] do_journal_begin_r+0x6b/0x2f0 [<c110ddcf>] journal_begin+0x7f/0x120 [<c10f76c2>] reiserfs_remount+0x212/0x4d0 [<c1093997>] do_remount_sb+0x67/0x140 [<c10a9ca6>] do_mount+0x436/0x6b0 [<c10a9f86>] sys_mount+0x66/0xa0 [<c1002c50>] sysenter_do_call+0x12/0x36 -> #0 (&journal->j_mutex){+.+...}: [<c104fe38>] validate_chain+0xf68/0xf70 [<c1050325>] __lock_acquire+0x4e5/0xa70 [<c105092a>] lock_acquire+0x7a/0xa0 [<c134c78f>] mutex_lock_nested+0x5f/0x2b0 [<c110dac4>] do_journal_begin_r+0x64/0x2f0 [<c110ddcf>] journal_begin+0x7f/0x120 [<c10ef52f>] reiserfs_delete_inode+0x9f/0x140 [<c10a55fc>] generic_delete_inode+0x9c/0x150 [<c10a56ed>] generic_drop_inode+0x3d/0x60 [<c10a4607>] iput+0x47/0x50 [<c10e915c>] reiserfs_create+0x16c/0x1c0 [<c109a9c1>] vfs_create+0xc1/0x130 [<c109dbec>] do_filp_open+0x81c/0x920 [<c109004f>] do_sys_open+0x4f/0x110 [<c1090179>] sys_open+0x29/0x40 [<c1002c50>] sysenter_do_call+0x12/0x36 other info that might help us debug this: 2 locks held by vi/23454: #0: (&sb->s_type->i_mutex_key#5){+.+.+.}, at: [<c109d64e>] do_filp_open+0x27e/0x920 #1: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c11106a8>] reiserfs_write_lock+0x28/0x40 stack backtrace: Pid: 23454, comm: vi Not tainted 2.6.32-06486-g053fe57 #2 Call Trace: [<c134b202>] ? printk+0x18/0x1e [<c104e960>] print_circular_bug+0xc0/0xd0 [<c104fe38>] validate_chain+0xf68/0xf70 [<c104ca9b>] ? trace_hardirqs_off+0xb/0x10 [<c1050325>] __lock_acquire+0x4e5/0xa70 [<c105092a>] lock_acquire+0x7a/0xa0 [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0 [<c134c78f>] mutex_lock_nested+0x5f/0x2b0 [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0 [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0 [<c110ff80>] ? delete_one_xattr+0x0/0x1c0 [<c110dac4>] do_journal_begin_r+0x64/0x2f0 [<c110ddcf>] journal_begin+0x7f/0x120 [<c11105b5>] ? reiserfs_delete_xattrs+0x15/0x50 [<c10ef52f>] reiserfs_delete_inode+0x9f/0x140 [<c10a55bf>] ? generic_delete_inode+0x5f/0x150 [<c10ef490>] ? reiserfs_delete_inode+0x0/0x140 [<c10a55fc>] generic_delete_inode+0x9c/0x150 [<c10a56ed>] generic_drop_inode+0x3d/0x60 [<c10a4607>] iput+0x47/0x50 [<c10e915c>] reiserfs_create+0x16c/0x1c0 [<c1099a5d>] ? inode_permission+0x7d/0xa0 [<c109a9c1>] vfs_create+0xc1/0x130 [<c10e8ff0>] ? reiserfs_create+0x0/0x1c0 [<c109dbec>] do_filp_open+0x81c/0x920 [<c104ca9b>] ? trace_hardirqs_off+0xb/0x10 [<c134dc0d>] ? _spin_unlock+0x1d/0x20 [<c10a6eea>] ? alloc_fd+0xba/0xf0 [<c109004f>] do_sys_open+0x4f/0x110 [<c1090179>] sys_open+0x29/0x40 [<c1002c50>] sysenter_do_call+0x12/0x36 To fix this, use reiserfs_lock_once() from reiserfs_delete_inode() which prevents from adding reiserfs lock recursion. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de>	2009-12-14 11:47:11 +01:00
Frederic Weisbecker	1d2c6cfd40	kill-the-bkl/reiserfs: turn GFP_ATOMIC flag to GFP_NOFS in reiserfs_get_block() GFP_ATOMIC was used in reiserfs_get_block to not lose the Bkl so that nobody can modify the tree in the middle of its work. Now that we kicked out the bkl, we can use a more friendly flag. We use GFP_NOFS here because we already hold the reiserfs lock. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Laurent Riffard <laurent.riffard@free.fr> Cc: Thomas Gleixner <tglx@linutronix.de>	2009-11-20 18:25:02 +01:00
Frederic Weisbecker	27b3a5c51b	kill-the-bkl/reiserfs: drop the fs race watchdog from _get_block_create_0() We had a watchdog in _get_block_create_0() that jumped to a fixup retry path in case the bkl got relaxed while calling kmap(). This is not necessary anymore since we now have a reiserfs lock that is not implicitly relaxed while sleeping. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Laurent Riffard <laurent.riffard@free.fr> Cc: Thomas Gleixner <tglx@linutronix.de>	2009-10-14 23:34:31 +02:00
Frederic Weisbecker	7e94277050	kill-the-bkl/reiserfs: fix recursive reiserfs write lock in reiserfs_commit_write() reiserfs_commit_write() is always called with the write lock held. Thus the current calls to reiserfs_write_lock() in this function are acquiring the lock recursively. We can safely drop them. This also solves further assumptions for this lock to be really released while calling reiserfs_write_unlock(). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Laurent Riffard <laurent.riffard@free.fr>	2009-09-14 07:18:29 +02:00
Frederic Weisbecker	d6f5b0aa08	kill-the-bkl/reiserfs: factorize the locking in reiserfs_write_end() reiserfs_write_end() is a hot path in reiserfs. We have two wasteful write lock lock/release inside that can be gathered without changing the code logic. This patch factorizes them out in a single protected section, reducing the number of contentions inside. [ Impact: reduce lock contention in a reiserfs hotpath ] Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:20 +02:00
Frederic Weisbecker	26931309a4	kill-the-bkl/reiserfs: lock only once on reiserfs_get_block() reiserfs_get_block() is one of these sites where the write lock might be acquired recursively. It's a particular problem because this function is called very often. It's a hot spot which needs to reschedule() periodically while converting direct items to indirect ones because it can take some time. Then if we are applying the write lock release/reacquire pattern on schedule() here, it may not produce the desired effect since we may have locked in more than one depth. The solution is to use reiserfs_write_lock_once() which won't try to reacquire the lock recursively. Then the lock will be really released before schedule(). Also, we only release the lock if TIF_NEED_RESCHED is set to not create wasteful numerous contentions. [ Impact: fix a too long holded lock case in reiserfs_get_block() ] Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:16 +02:00
Frederic Weisbecker	22c963addc	kill-the-BKL/reiserfs: lock only once in reiserfs_truncate_file Impact: fix a deadlock reiserfs_truncate_file() can be called from multiple context where the write lock can be already hold or not. This function also acquire (possibly recursively) the write lock. Subsequent releases before sleeping will not actually release the lock because we may be in more than one lock depth degree. A typical case is: reiserfs_file_release { acquire_the_lock() reiserfs_truncate_file() reacquire_the_lock() journal_begin() { do_journal_begin_r() { reiserfs_wait_on_write_block() { /* * Not released because still one * depth owned */ release_lock() wait_for_event() At this stage the event never happen because the one which provides it needs the write lock. We use reiserfs_write_lock_once() here to ensure that we don't acquire the write lock recursively. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@texware.it> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> LKML-Reference: <1239680065-25013-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-14 07:18:03 +02:00
Frederic Weisbecker	8ebc423238	reiserfs: kill-the-BKL This patch is an attempt to remove the Bkl based locking scheme from reiserfs and is intended. It is a bit inspired from an old attempt by Peter Zijlstra: http://lkml.indiana.edu/hypermail/linux/kernel/0704.2/2174.html The bkl is heavily used in this filesystem to prevent from concurrent write accesses on the filesystem. Reiserfs makes a deep use of the specific properties of the Bkl: - It can be acqquired recursively by a same task - It is released on the schedule() calls and reacquired when schedule() returns The two properties above are a roadmap for the reiserfs write locking so it's very hard to simply replace it with a common mutex. - We need a recursive-able locking unless we want to restructure several blocks of the code. - We need to identify the sites where the bkl was implictly relaxed (schedule, wait, sync, etc...) so that we can in turn release and reacquire our new lock explicitly. Such implicit releases of the lock are often required to let other resources producer/consumer do their job or we can suffer unexpected starvations or deadlocks. So the new lock that replaces the bkl here is a per superblock mutex with a specific property: it can be acquired recursively by a same task, like the bkl. For such purpose, we integrate a lock owner and a lock depth field on the superblock information structure. The first axis on this patch is to turn reiserfs_write_(un)lock() function into a wrapper to manage this mutex. Also some explicit calls to lock_kernel() have been converted to reiserfs_write_lock() helpers. The second axis is to find the important blocking sites (schedule...(), wait_on_buffer(), sync_dirty_buffer(), etc...) and then apply an explicit release of the write lock on these locations before blocking. Then we can safely wait for those who can give us resources or those who need some. Typically this is a fight between the current writer, the reiserfs workqueue (aka the async commiter) and the pdflush threads. The third axis is a consequence of the second. The write lock is usually on top of a lock dependency chain which can include the journal lock, the flush lock or the commit lock. So it's dangerous to release and trying to reacquire the write lock while we still hold other locks. This is fine with the bkl: T1 T2 lock_kernel() mutex_lock(A) unlock_kernel() // do something lock_kernel() mutex_lock(A) -> already locked by T1 schedule() (and then unlock_kernel()) lock_kernel() mutex_unlock(A) .... This is not fine with a mutex: T1 T2 mutex_lock(write) mutex_lock(A) mutex_unlock(write) // do something mutex_lock(write) mutex_lock(A) -> already locked by T1 schedule() mutex_lock(write) -> already locked by T2 deadlock The solution in this patch is to provide a helper which releases the write lock and sleep a bit if we can't lock a mutex that depend on it. It's another simulation of the bkl behaviour. The last axis is to locate the fs callbacks that are called with the bkl held, according to Documentation/filesystem/Locking. Those are: - reiserfs_remount - reiserfs_fill_super - reiserfs_put_super Reiserfs didn't need to explicitly lock because of the context of these callbacks. But now we must take care of that with the new locking. After this patch, reiserfs suffers from a slight performance regression (for now). On UP, a high volume write with dd reports an average of 27 MB/s instead of 30 MB/s without the patch applied. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Bron Gondwana <brong@fastmail.fm> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> LKML-Reference: <1239070789-13354-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-14 07:17:59 +02:00
Al Viro	281eede032	switch reiserfs to inode->i_acl Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:06 -04:00
Linus Torvalds	e1c5024828	Merge branch 'reiserfs-updates' from Jeff Mahoney * reiserfs-updates: (35 commits) reiserfs: rename [cn]_* variables reiserfs: rename p_._ variables reiserfs: rename p_s_tb to tb reiserfs: rename p_s_inode to inode reiserfs: rename p_s_bh to bh reiserfs: rename p_s_sb to sb reiserfs: strip trailing whitespace reiserfs: cleanup path functions reiserfs: factor out buffer_info initialization reiserfs: add atomic addition of selinux attributes during inode creation reiserfs: use generic readdir for operations across all xattrs reiserfs: journaled xattrs reiserfs: use generic xattr handlers reiserfs: remove i_has_xattr_dir reiserfs: make per-inode xattr locking more fine grained reiserfs: eliminate per-super xattr lock reiserfs: simplify xattr internal file lookups/opens reiserfs: Clean up xattrs when REISERFS_FS_XATTR is unset reiserfs: remove IS_PRIVATE helpers reiserfs: remove link detection code ... Fixed up conflicts manually due to: - quota name cleanups vs variable naming changes: fs/reiserfs/inode.c fs/reiserfs/namei.c fs/reiserfs/stree.c fs/reiserfs/xattr.c - exported include header cleanups include/linux/reiserfs_fs.h	2009-03-30 12:33:01 -07:00
Jeff Mahoney	995c762ea4	reiserfs: rename p_s_inode to inode This patch is a simple s/p_s_inode/inode/g to the reiserfs code. This is the third in a series of patches to rip out some of the awful variable naming in reiserfs. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:39 -07:00
Jeff Mahoney	0222e6571c	reiserfs: strip trailing whitespace This patch strips trailing whitespace from the reiserfs code. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:39 -07:00
Jeff Mahoney	57fe60df62	reiserfs: add atomic addition of selinux attributes during inode creation Some time ago, some changes were made to make security inode attributes be atomically written during inode creation. ReiserFS fell behind in this area, but with the reworking of the xattr code, it's now fairly easy to add. The following patch adds the ability for security attributes to be added automatically during inode creation. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:39 -07:00
Jeff Mahoney	0ab2621ebd	reiserfs: journaled xattrs Deadlocks are possible in the xattr code between the journal lock and the xattr sems. This patch implements journalling for xattr operations. The benefit is twofold: * It gets rid of the deadlock possibility by always ensuring that xattr write operations are initiated inside a transaction. * It corrects the problem where xattr backing files aren't considered any differently than normal files, despite the fact they are metadata. I discussed the added journal load with Chris Mason, and we decided that since xattrs (versus other journal activity) is fairly rare, the introduction of larger transactions to support journaled xattrs wouldn't be too big a deal. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:38 -07:00
Jeff Mahoney	d984561b32	reiserfs: eliminate per-super xattr lock With the switch to using inode->i_mutex locking during lookups/creation in the xattr root, the per-super xattr lock is no longer needed. This patch removes it. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:38 -07:00
Jeff Mahoney	6dfede6963	reiserfs: remove IS_PRIVATE helpers There are a number of helper functions for marking a reiserfs inode private that were leftover from reiserfs did its own thing wrt to private inodes. S_PRIVATE has been in the kernel for some time, so this patch removes the helpers and uses IS_PRIVATE instead. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:37 -07:00
Jeff Mahoney	0030b64570	reiserfs: use reiserfs_error() This patch makes many paths that are currently using warnings to handle the error. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:37 -07:00
Jeff Mahoney	c3a9c2109f	reiserfs: rework reiserfs_panic ReiserFS panics can be somewhat inconsistent. In some cases: * a unique identifier may be associated with it * the function name may be included * the device may be printed separately This patch aims to make warnings more consistent. reiserfs_warning() prints the device name, so printing it a second time is not required. The function name for a warning is always helpful in debugging, so it is now automatically inserted into the output. Hans has stated that every warning should have a unique identifier. Some cases lack them, others really shouldn't have them. reiserfs_warning() now expects an id associated with each message. In the rare case where one isn't needed, "" will suffice. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:36 -07:00
Jeff Mahoney	45b03d5e8e	reiserfs: rework reiserfs_warning ReiserFS warnings can be somewhat inconsistent. In some cases: * a unique identifier may be associated with it * the function name may be included * the device may be printed separately This patch aims to make warnings more consistent. reiserfs_warning() prints the device name, so printing it a second time is not required. The function name for a warning is always helpful in debugging, so it is now automatically inserted into the output. Hans has stated that every warning should have a unique identifier. Some cases lack them, others really shouldn't have them. reiserfs_warning() now expects an id associated with each message. In the rare case where one isn't needed, "" will suffice. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:36 -07:00
Jan Kara	77db4f25bc	reiserfs: Use lowercase names of quota functions Use lowercase names of quota functions instead of old uppercase ones. Signed-off-by: Jan Kara <jack@suse.cz> CC: reiserfs-devel@vger.kernel.org	2009-03-26 02:18:36 +01:00
Linus Torvalds	520c853466	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: inotify: fix type errors in interfaces fix breakage in reiserfs_new_inode() fix the treatment of jfs special inodes vfs: remove duplicate code in get_fs_type() add a vfs_fsync helper sys_execve and sys_uselib do not call into fsnotify zero i_uid/i_gid on inode allocation inode->i_op is never NULL ntfs: don't NULL i_op isofs check for NULL ->i_op in root directory is dead code affs: do not zero ->i_op kill suid bit only for regular files vfs: lseek(fd, 0, SEEK_CUR) race condition	2009-01-05 18:32:06 -08:00
Al Viro	2f1169e2dc	fix breakage in reiserfs_new_inode() now that we use ih.key earlier, we need to do all its setup early enough Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-05 11:54:29 -05:00
Nick Piggin	54566b2c15	fs: symlink write_begin allocation context fix With the write_begin/write_end aops, page_symlink was broken because it could no longer pass a GFP_NOFS type mask into the point where the allocations happened. They are done in write_begin, which would always assume that the filesystem can be entered from reclaim. This bug could cause filesystem deadlocks. The funny thing with having a gfp_t mask there is that it doesn't really allow the caller to arbitrarily tinker with the context in which it can be called. It couldn't ever be GFP_ATOMIC, for example, because it needs to take the page lock. The only thing any callers care about is __GFP_FS anyway, so turn that into a single flag. Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on this flag in their write_begin function. Change __grab_cache_page to accept a nofs argument as well, to honour that flag (while we're there, change the name to grab_cache_page_write_begin which is more instructive and does away with random leading underscores). This is really a more flexible way to go in the end anyway -- if a filesystem happens to want any extra allocations aside from the pagecache ones in ints write_begin function, it may now use GFP_KERNEL (rather than GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a random example). [kosaki.motohiro@jp.fujitsu.com: fix ubifs] [kosaki.motohiro@jp.fujitsu.com: fix fuse] Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@kernel.org> [2.6.28.x] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Cleaned up the calling convention: just pass in the AOP flags untouched to the grab_cache_page_write_begin() function. That just simplifies everybody, and may even allow future expansion of the logic. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-04 13:33:20 -08:00
Al Viro	c1eaa26b67	nfsd race fixes: reiserfs ... and the same for reiserfs. The difference here is that we need insert_inode_locked4() to match iget5_locked(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-12-31 18:07:44 -05:00
Christoph Hellwig	440037287c	[PATCH] switch all filesystems over to d_obtain_alias Switch all users of d_alloc_anon to d_obtain_alias. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-23 05:13:01 -04:00
Nick Piggin	ca5de404ff	fs: rename buffer trylock Like the page lock change, this also requires name change, so convert the raw test_and_set bitop to a trylock. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-04 21:56:09 -07:00
Jeff Mahoney	eb35c218d8	reiserfs: discard prealloc in reiserfs_delete_inode With the removal of struct file from the xattr code, reiserfs_file_release() isn't used anymore, so the prealloc isn't discarded. This causes hangs later down the line. This patch adds it to reiserfs_delete_inode. In most cases it will be a no-op due to it already having been called, but will avoid hangs with xattrs. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-08 12:39:31 -07:00
David Howells	e231c2ee64	Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p) Convert instances of ERR_PTR(PTR_ERR(p)) to ERR_CAST(p) using: perl -spi -e 's/ERR_PTR[(]PTR_ERR[(](.)[)][)]/ERR_CAST(\1)/' `grep -rl 'ERR_PTR[(]PTR_ERR' fs crypto net security` Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-07 08:42:26 -08:00
Christoph Lameter	eebd2aa355	Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user Simplify page cache zeroing of segments of pages through 3 functions zero_user_segments(page, start1, end1, start2, end2) Zeros two segments of the page. It takes the position where to start and end the zeroing which avoids length calculations and makes code clearer. zero_user_segment(page, start, end) Same for a single segment. zero_user(page, start, length) Length variant for the case where we know the length. We remove the zero_user_page macro. Issues: 1. Its a macro. Inline functions are preferable. 2. The KM_USER0 macro is only defined for HIGHMEM. Having to treat this special case everywhere makes the code needlessly complex. The parameter for zeroing is always KM_USER0 except in one single case that we open code. Avoiding KM_USER0 makes a lot of code not having to be dealing with the special casing for HIGHMEM anymore. Dealing with kmap is only necessary for HIGHMEM configurations. In those configurations we use KM_USER0 like we do for a series of other functions defined in highmem.h. Since KM_USER0 is depends on HIGHMEM the existing zero_user_page function could not be a macro. zero_user_* functions introduced here can be be inline because that constant is not used when these functions are called. Also extract the flushing of the caches to be outside of the kmap. [akpm@linux-foundation.org: fix nfs and ntfs build] [akpm@linux-foundation.org: fix ntfs build some more] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:13 -08:00
Christoph Hellwig	be55caf177	reiserfs: new export ops Another nice little cleanup by using the new methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:20 -07:00
Jeff Mahoney	3ee1667042	reiserfs: fix usage of signed ints for block numbers Do a quick signedness check for block numbers. There are a number of places where signed integers are used for block numbers, which limits the usable file system size to 8 TiB. The disk format, excepting a problem which will be fixed in the following patch, supports file systems up to 16 TiB in size. This patch cleans up those sites so that we can enable the full usable size. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:35 -07:00
Jeff Layton	cdd6fe6e2f	reiserfs: turn of ATTR_KILL_S*ID at beginning of reiserfs_setattr reiserfs_setattr can call notify_change recursively using the same iattr struct. This could cause it to trip the BUG() in notify_change. Fix reiserfs to clear those bits near the beginning of the function. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:22 -07:00
Adrian Bunk	deba0f49b9	fs/reiserfs/: cleanups - remove the following no longer used functions: - bitmap.c: reiserfs_claim_blocks_to_be_allocated() - bitmap.c: reiserfs_release_claimed_blocks() - bitmap.c: reiserfs_can_fit_pages() - make the following functions static: - inode.c: restart_transaction() - journal.c: reiserfs_async_progress_wait() Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Vladimir V. Saveliev <vs@namesys.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:46 -07:00
Vladimir Saveliev	f7557e8f7f	reiserfs: use generic_cont_expand_simple This patch makes reiserfs to use AOP_FLAG_CONT_EXPAND in order to get rid of the special generic_cont_expand routine Signed-off-by: Vladimir Saveliev <vs@namesys.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:56 -07:00
Vladimir Saveliev	ba9d8cec6c	reiserfs: convert to new aops Convert reiserfs to new aops Signed-off-by: Vladimir Saveliev <vs@namesys.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:56 -07:00
Christoph Hellwig	a569425512	knfsd: exportfs: add exportfs.h header currently the export_operation structure and helpers related to it are in fs.h. fs.h is already far too large and there are very few places needing the export bits, so split them off into a separate header. [akpm@linux-foundation.org: fix cifs build] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:06 -07:00
Nate Diller	f2fff59695	reiserfs: use zero_user_page Use zero_user_page() instead of open-coding it. Signed-off-by: Nate Diller <nate.diller@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:56 -07:00
Vladimir Saveliev	de14569f94	[PATCH] resierfs: avoid tail packing if an inode was ever mmapped This patch fixes a confusion reiserfs has for a long time. On release file operation reiserfs used to try to pack file data stored in last incomplete page of some files into metadata blocks. After packing the page got cleared with clear_page_dirty. It did not take into account that the page may be mmaped into other process's address space. Recent replacement for clear_page_dirty cancel_dirty_page found the confusion with sanity check that page has to be not mapped. The patch fixes the confusion by making reiserfs avoid tail packing if an inode was ever mmapped. reiserfs_mmap and reiserfs_file_release are serialized with mutex in reiserfs specific inode. reiserfs_mmap locks the mutex and sets a bit in reiserfs specific inode flags. reiserfs_file_release checks the bit having the mutex locked. If bit is set - tail packing is avoided. This eliminates a possibility that mmapped page gets cancel_page_dirty-ed. Signed-off-by: Vladimir Saveliev <vs@namesys.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <mason@suse.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-23 07:52:06 -08:00
Josef "Jeff" Sipek	fec6d055da	[PATCH] struct path: rename Reiserfs's struct path Rename Reiserfs's struct path to struct treepath to prevent name collision between it and struct path from fs/namei.c. Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu> Cc: <reiserfs-dev@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:28:40 -08:00
Yan Burman	01afb2134e	[PATCH] reiser: replace kmalloc+memset with kzalloc Replace kmalloc+memset with kzalloc Signed-off-by: Yan Burman <burman.yan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:41 -08:00
Suzuki K P	87b4126f10	[PATCH] fix reiserfs bad path release panic One of our test team hit a reiserfs_panic while running fsstress tests on 2.6.19-rc1. The message looks like : REISERFS: panic(device Null superblock): reiserfs[5676]: assertion !(p->path_length != 1 ) failed at fs/reiserfs/stree.c:397:reiserfs_check_path: path not properly relsed. The backtrace looked : kernel BUG in reiserfs_panic at fs/reiserfs/prints.c:361! .reiserfs_check_path+0x58/0x74 .reiserfs_get_block+0x1444/0x1508 .__block_prepare_write+0x1c8/0x558 .block_prepare_write+0x34/0x64 .reiserfs_prepare_write+0x118/0x1d0 .generic_file_buffered_write+0x314/0x82c .__generic_file_aio_write_nolock+0x350/0x3e0 .__generic_file_write_nolock+0x78/0xb0 .generic_file_write+0x60/0xf0 .reiserfs_file_write+0x198/0x2038 .vfs_write+0xd0/0x1b4 .sys_write+0x4c/0x8c syscall_exit+0x0/0x4 Upon debugging I found that the restart_transaction was not releasing the path if the th->refcount was > 1. /static/ int restart_transaction(struct reiserfs_transaction_handle th, struct inode inode, struct path path) { [...] / we cannot restart while nested / if (th->t_refcount > 1) { <<- Path is not released in this case! return 0; } pathrelse(path); <<- Path released here. [...] This could happen in such a situation : In reiserfs/inode.c: reiserfs_get_block() :: if (repeat == NO_DISK_SPACE \|\| repeat == QUOTA_EXCEEDED) { / restart the transaction to give the journal a chance to free some blocks. releases the path, so we have to go back to research if we succeed on the second try */ SB_JOURNAL(inode->i_sb)->j_next_async_flush = 1; -->> retval = restart_transaction(th, inode, &path); <<-- We are supposed to release the path, no matter we succeed or fail. But if the th->refcount is > 1, the path is still valid. And, if (retval) goto failure; repeat = _allocate_block(th, block, inode, &allocated_block_nr, NULL, create); If the above allocate_block fails with NO_DISK_SPACE or QUOTA_EXCEEDED, we would have path which is not released. if (repeat != NO_DISK_SPACE && repeat != QUOTA_EXCEEDED) { goto research; } if (repeat == QUOTA_EXCEEDED) retval = -EDQUOT; else retval = -ENOSPC; goto failure; [...] failure: [...] reiserfs_check_path(&path); << Panics here ! Attached here is a patch which could fix the issue. fix reiserfs/inode.c : restart_transaction() to release the path in all cases. The restart_transaction() doesn't release the path when the the journal handle has a refcount > 1. This would trigger a reiserfs_panic() if we encounter an -ENOSPC / -EDQUOT in reiserfs_get_block(). Signed-off-by: Suzuki K P <suzuki@in.ibm.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: <reiserfs-dev@namesys.com> Cc: Jeff Mahoney <jeffm@suse.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:32 -08:00
Eric Sesterhenn	585b7747d6	[PATCH] Remove unnecessary check in fs/reiserfs/inode.c Since all callers dereference dir, we dont need this check. Coverity id #337. Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Cc: Jeff Mahoney <jeffm@suse.com> Cc: <reiserfs-dev@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:14 -07:00
Alexey Dobriyan	cfe14677f2	[PATCH] reiserfs: ifdef ACL stuff from inode Shrink reiserfs inode more (by 8 bytes) for ACL non-users: -reiser_inode_cache 344 11 +reiser_inode_cache 336 11 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: <reiserfs-dev@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:11 -07:00
Alexey Dobriyan	068fbb315d	[PATCH] reiserfs: ifdef xattr_sem Shrink reiserfs inode by 12 bytes for xattr non-users (me). -reiser_inode_cache 356 11 +reiser_inode_cache 344 11 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: <reiserfs-dev@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:11 -07:00

1 2

76 Commits