linux

Commit Graph

Author	SHA1	Message	Date
Frederic Weisbecker	bbec919150	reiserfs: Fix vmalloc call under reiserfs lock Vmalloc is called to allocate journal->j_cnode_free_list but we hold the reiserfs lock at this time, which raises a {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} lock inversion. Just drop the reiserfs lock at this time, as it's not even needed but kept for paranoid reasons. This fixes: [ INFO: inconsistent lock state ] 2.6.33-rc5 #1 --------------------------------- inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. kswapd0/313 [HC0[0]:SC0[0]:HE1:SE1] takes: (&REISERFS_SB(s)->lock){+.+.?.}, at: [<c11118c8>] reiserfs_write_lock_once+0x28/0x50 {RECLAIM_FS-ON-W} state was registered at: [<c104ee32>] mark_held_locks+0x62/0x90 [<c104eefa>] lockdep_trace_alloc+0x9a/0xc0 [<c108f7b6>] kmem_cache_alloc+0x26/0xf0 [<c108621c>] __get_vm_area_node+0x6c/0xf0 [<c108690e>] __vmalloc_node+0x7e/0xa0 [<c1086aab>] vmalloc+0x2b/0x30 [<c110e1fb>] journal_init+0x6cb/0xa10 [<c10f90a2>] reiserfs_fill_super+0x342/0xb80 [<c1095665>] get_sb_bdev+0x145/0x180 [<c10f68e1>] get_super_block+0x21/0x30 [<c1094520>] vfs_kern_mount+0x40/0xd0 [<c1094609>] do_kern_mount+0x39/0xd0 [<c10aaa97>] do_mount+0x2c7/0x6d0 [<c10aaf06>] sys_mount+0x66/0xa0 [<c16198a7>] mount_block_root+0xc4/0x245 [<c1619a81>] mount_root+0x59/0x5f [<c1619b98>] prepare_namespace+0x111/0x14b [<c1619269>] kernel_init+0xcf/0xdb [<c100303a>] kernel_thread_helper+0x6/0x1c irq event stamp: 63236801 hardirqs last enabled at (63236801): [<c134e7fa>] __mutex_unlock_slowpath+0x9a/0x120 hardirqs last disabled at (63236800): [<c134e799>] __mutex_unlock_slowpath+0x39/0x120 softirqs last enabled at (63218800): [<c102f451>] __do_softirq+0xc1/0x110 softirqs last disabled at (63218789): [<c102f4ed>] do_softirq+0x4d/0x60 other info that might help us debug this: 2 locks held by kswapd0/313: #0: (shrinker_rwsem){++++..}, at: [<c1074bb4>] shrink_slab+0x24/0x170 #1: (&type->s_umount_key#19){++++..}, at: [<c10a2edd>] shrink_dcache_memory+0xfd/0x1a0 stack backtrace: Pid: 313, 
comm: kswapd0 Not tainted 2.6.33-rc5 #1 Call Trace: [<c134db2c>] ? printk+0x18/0x1c [<c104e7ef>] print_usage_bug+0x15f/0x1a0 [<c104ebcf>] mark_lock+0x39f/0x5a0 [<c104d66b>] ? trace_hardirqs_off+0xb/0x10 [<c1052c50>] ? check_usage_forwards+0x0/0xf0 [<c1050c24>] __lock_acquire+0x214/0xa70 [<c10438c5>] ? sched_clock_cpu+0x95/0x110 [<c10514fa>] lock_acquire+0x7a/0xa0 [<c11118c8>] ? reiserfs_write_lock_once+0x28/0x50 [<c134f03f>] mutex_lock_nested+0x5f/0x2b0 [<c11118c8>] ? reiserfs_write_lock_once+0x28/0x50 [<c11118c8>] ? reiserfs_write_lock_once+0x28/0x50 [<c11118c8>] reiserfs_write_lock_once+0x28/0x50 [<c10f05b0>] reiserfs_delete_inode+0x50/0x140 [<c10a653f>] ? generic_delete_inode+0x5f/0x150 [<c10f0560>] ? reiserfs_delete_inode+0x0/0x140 [<c10a657c>] generic_delete_inode+0x9c/0x150 [<c10a666d>] generic_drop_inode+0x3d/0x60 [<c10a5597>] iput+0x47/0x50 [<c10a2a4f>] dentry_iput+0x6f/0xf0 [<c10a2af4>] d_kill+0x24/0x50 [<c10a2d3d>] __shrink_dcache_sb+0x21d/0x2b0 [<c10a2f0f>] shrink_dcache_memory+0x12f/0x1a0 [<c1074c9e>] shrink_slab+0x10e/0x170 [<c1075177>] kswapd+0x477/0x6a0 [<c1072d10>] ? isolate_pages_global+0x0/0x1b0 [<c103e160>] ? autoremove_wake_function+0x0/0x40 [<c1074d00>] ? kswapd+0x0/0x6a0 [<c103de6c>] kthread+0x6c/0x80 [<c103de00>] ? kthread+0x0/0x80 [<c100303a>] kernel_thread_helper+0x6/0x1c Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Chris Mason <chris.mason@oracle.com>	2010-01-28 13:43:50 +01:00
Frederic Weisbecker	0523676d3f	reiserfs: Relax reiserfs lock while freeing the journal Keeping the reiserfs lock while freeing the journal on umount path triggers a lock inversion between bdev->bd_mutex and the reiserfs lock. We don't need the reiserfs lock at this stage. The filesystem is not usable anymore, and there are no more pending commits, everything got flushed (even this operation was done in parallel and didn't required the reiserfs lock from the current process). This fixes the following lockdep report: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-atom #172 ------------------------------------------------------- umount/3904 is trying to acquire lock: (&bdev->bd_mutex){+.+.+.}, at: [<c10de2c2>] __blkdev_put+0x22/0x160 but task is already holding lock: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1143279>] reiserfs_write_lock+0x29/0x40 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&REISERFS_SB(s)->lock){+.+.+.}: [<c105ea7f>] __lock_acquire+0x11ff/0x19e0 [<c105f2c8>] lock_acquire+0x68/0x90 [<c140199b>] mutex_lock_nested+0x5b/0x340 [<c1143229>] reiserfs_write_lock_once+0x29/0x50 [<c111c485>] reiserfs_get_block+0x85/0x1620 [<c10e1040>] do_mpage_readpage+0x1f0/0x6d0 [<c10e1640>] mpage_readpages+0xc0/0x100 [<c1119b89>] reiserfs_readpages+0x19/0x20 [<c108f1ec>] __do_page_cache_readahead+0x1bc/0x260 [<c108f2b8>] ra_submit+0x28/0x40 [<c1087e3e>] filemap_fault+0x40e/0x420 [<c109b5fd>] __do_fault+0x3d/0x430 [<c109d47e>] handle_mm_fault+0x12e/0x790 [<c1022a65>] do_page_fault+0x135/0x330 [<c1403663>] error_code+0x6b/0x70 [<c10ef9ca>] load_elf_binary+0x82a/0x1a10 [<c10ba130>] search_binary_handler+0x90/0x1d0 [<c10bb70f>] do_execve+0x1df/0x250 [<c1001746>] sys_execve+0x46/0x70 [<c1002fa5>] syscall_call+0x7/0xb -> #2 (&mm->mmap_sem){++++++}: [<c105ea7f>] __lock_acquire+0x11ff/0x19e0 [<c105f2c8>] lock_acquire+0x68/0x90 [<c109b1ab>] 
might_fault+0x8b/0xb0 [<c11b8f52>] copy_to_user+0x32/0x70 [<c10c3b94>] filldir64+0xa4/0xf0 [<c1109116>] sysfs_readdir+0x116/0x210 [<c10c3e1d>] vfs_readdir+0x8d/0xb0 [<c10c3ea9>] sys_getdents64+0x69/0xb0 [<c1002ec4>] sysenter_do_call+0x12/0x32 -> #1 (sysfs_mutex){+.+.+.}: [<c105ea7f>] __lock_acquire+0x11ff/0x19e0 [<c105f2c8>] lock_acquire+0x68/0x90 [<c140199b>] mutex_lock_nested+0x5b/0x340 [<c110951c>] sysfs_addrm_start+0x2c/0xb0 [<c1109aa0>] create_dir+0x40/0x90 [<c1109b1b>] sysfs_create_dir+0x2b/0x50 [<c11b2352>] kobject_add_internal+0xc2/0x1b0 [<c11b2531>] kobject_add_varg+0x31/0x50 [<c11b25ac>] kobject_add+0x2c/0x60 [<c1258294>] device_add+0x94/0x560 [<c11036ea>] add_partition+0x18a/0x2a0 [<c110418a>] rescan_partitions+0x33a/0x450 [<c10de5bf>] __blkdev_get+0x12f/0x2d0 [<c10de76a>] blkdev_get+0xa/0x10 [<c11034b8>] register_disk+0x108/0x130 [<c11a87a9>] add_disk+0xd9/0x130 [<c12998e5>] sd_probe_async+0x105/0x1d0 [<c10528af>] async_thread+0xcf/0x230 [<c104bfd4>] kthread+0x74/0x80 [<c1003aab>] kernel_thread_helper+0x7/0x3c -> #0 (&bdev->bd_mutex){+.+.+.}: [<c105f176>] __lock_acquire+0x18f6/0x19e0 [<c105f2c8>] lock_acquire+0x68/0x90 [<c140199b>] mutex_lock_nested+0x5b/0x340 [<c10de2c2>] __blkdev_put+0x22/0x160 [<c10de40a>] blkdev_put+0xa/0x10 [<c113ce22>] free_journal_ram+0xd2/0x130 [<c113ea18>] do_journal_release+0x98/0x190 [<c113eb2a>] journal_release+0xa/0x10 [<c1128eb6>] reiserfs_put_super+0x36/0x130 [<c10b776f>] generic_shutdown_super+0x4f/0xe0 [<c10b7825>] kill_block_super+0x25/0x40 [<c11255df>] reiserfs_kill_sb+0x7f/0x90 [<c10b7f4a>] deactivate_super+0x7a/0x90 [<c10cccd8>] mntput_no_expire+0x98/0xd0 [<c10ccfcc>] sys_umount+0x4c/0x310 [<c10cd2a9>] sys_oldumount+0x19/0x20 [<c1002ec4>] sysenter_do_call+0x12/0x32 other info that might help us debug this: 2 locks held by umount/3904: #0: (&type->s_umount_key#30){+++++.}, at: [<c10b7f45>] deactivate_super+0x75/0x90 #1: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1143279>] reiserfs_write_lock+0x29/0x40 stack backtrace: 
Pid: 3904, comm: umount Not tainted 2.6.32-atom #172 Call Trace: [<c13ff903>] ? printk+0x18/0x1a [<c105d33a>] print_circular_bug+0xca/0xd0 [<c105f176>] __lock_acquire+0x18f6/0x19e0 [<c108b66f>] ? free_pcppages_bulk+0x1f/0x250 [<c105f2c8>] lock_acquire+0x68/0x90 [<c10de2c2>] ? __blkdev_put+0x22/0x160 [<c10de2c2>] ? __blkdev_put+0x22/0x160 [<c140199b>] mutex_lock_nested+0x5b/0x340 [<c10de2c2>] ? __blkdev_put+0x22/0x160 [<c105c932>] ? mark_held_locks+0x62/0x80 [<c10afe12>] ? kfree+0x92/0xd0 [<c10de2c2>] __blkdev_put+0x22/0x160 [<c105cc3b>] ? trace_hardirqs_on+0xb/0x10 [<c10de40a>] blkdev_put+0xa/0x10 [<c113ce22>] free_journal_ram+0xd2/0x130 [<c113ea18>] do_journal_release+0x98/0x190 [<c113eb2a>] journal_release+0xa/0x10 [<c1128eb6>] reiserfs_put_super+0x36/0x130 [<c1050596>] ? up_write+0x16/0x30 [<c10b776f>] generic_shutdown_super+0x4f/0xe0 [<c10b7825>] kill_block_super+0x25/0x40 [<c10f41e0>] ? vfs_quota_off+0x0/0x20 [<c11255df>] reiserfs_kill_sb+0x7f/0x90 [<c10b7f4a>] deactivate_super+0x7a/0x90 [<c10cccd8>] mntput_no_expire+0x98/0xd0 [<c10ccfcc>] sys_umount+0x4c/0x310 [<c10cd2a9>] sys_oldumount+0x19/0x20 [<c1002ec4>] sysenter_do_call+0x12/0x32 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>	2010-01-02 01:56:54 +01:00
Frederic Weisbecker	98ea3f50bc	reiserfs: Fix remaining in-reclaim-fs <-> reclaim-fs-on locking inversion Commit `500f5a0bf5` (reiserfs: Fix possible recursive lock) fixed a vmalloc under reiserfs lock that triggered a lockdep warning because of an IN-FS-RECLAIM <-> RECLAIM-FS-ON locking dependency inversion. But that patch omitted another vmalloc call in the same path that allocates the journal. Relax the lock for this one too. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>	2009-12-29 22:34:59 +01:00
Frederic Weisbecker	48f6ba5e69	kill-the-bkl/reiserfs: fix reiserfs lock to cpu_add_remove_lock dependency While creating the reiserfs workqueue during the journal initialization, we are holding the reiserfs lock, but create_workqueue() also holds the cpu_add_remove_lock, which creates the following dependency: - reiserfs lock -> cpu_add_remove_lock But we also have the following existing dependencies: - mm->mmap_sem -> reiserfs lock - cpu_add_remove_lock -> cpu_hotplug.lock -> slub_lock -> sysfs_mutex The merged dependency chain then becomes: - mm->mmap_sem -> reiserfs lock -> cpu_add_remove_lock -> cpu_hotplug.lock -> slub_lock -> sysfs_mutex But when we fill a dir entry in sysfs_readdir(), we are holding the sysfs_mutex and we also might fault while copying the directory entry to the user, leading to the following dependency: - sysfs_mutex -> mm->mmap_sem The end result is then a lock inversion between sysfs_mutex and mm->mmap_sem, as reported in the following lockdep warning: [ INFO: possible circular locking dependency detected ] 2.6.31-07095-g25a3912 #4 ------------------------------------------------------- udevadm/790 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<c1098942>] might_fault+0x72/0xc0 but task is already holding lock: (sysfs_mutex){+.+.+.}, at: [<c110813c>] sysfs_readdir+0x7c/0x260 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #5 (sysfs_mutex){+.+.+.}: [...] -> #4 (slub_lock){+++++.}: [...] -> #3 (cpu_hotplug.lock){+.+.+.}: [...] -> #2 (cpu_add_remove_lock){+.+.+.}: [...] -> #1 (&REISERFS_SB(s)->lock){+.+.+.}: [...] -> #0 (&mm->mmap_sem){++++++}: [...] This can be fixed by relaxing the reiserfs lock while creating the workqueue. It is fine to relax the lock here: we just keep it around to pass through reiserfs lock checks and for paranoid reasons. 
Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Tested-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Laurent Riffard <laurent.riffard@free.fr>	2009-10-05 16:31:37 +02:00
Frederic Weisbecker	193be0ee17	kill-the-bkl/reiserfs: Fix induced mm->mmap_sem to sysfs_mutex dependency Alexander Beregalov reported the following warning: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.31-03149-gdcc030a #1 ------------------------------------------------------- udevadm/716 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<c107249a>] might_fault+0x4a/0xa0 but task is already holding lock: (sysfs_mutex){+.+.+.}, at: [<c10cb9aa>] sysfs_readdir+0x5a/0x200 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (sysfs_mutex){+.+.+.}: [...] -> #2 (&bdev->bd_mutex){+.+.+.}: [...] -> #1 (&REISERFS_SB(s)->lock){+.+.+.}: [...] -> #0 (&mm->mmap_sem){++++++}: [...] On the reiserfs mount path, we take the reiserfs lock and while initializing the journal, we open the device, taking the bdev->bd_mutex. Then rescan_partitions() may signal the change to sysfs. We have then the following dependency: reiserfs_lock -> bd_mutex -> sysfs_mutex Later, while entering reiserfs_readpage() after a pagefault in an mmapped reiserfs file, we are holding the mm->mmap_sem, and we are going to take the reiserfs lock too. We have then the following dependency: mm->mmap_sem -> reiserfs_lock which, expanded with the previous dependency gives us: mm->mmap_sem -> reiserfs_lock -> bd_mutex -> sysfs_mutex Now while entering the sysfs readdir path, we are holding the sysfs_mutex. And when we copy a directory entry to the user buffer, we might fault and then take the mm->mmap_sem lock. Which leads to the circular locking dependency reported. We can fix that by relaxing the reiserfs lock during the call to journal_init_dev(), which is the place where we open the mounted device. It is fine to relax the lock here because we are at the beginning of the reiserfs mount path and there is nothing to protect at this time, the journal is not initialized. 
We just keep this lock around for paranoid reasons. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Tested-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Cc: Laurent Riffard <laurent.riffard@free.fr>	2009-09-17 05:31:37 +02:00
Frederic Weisbecker	c72e05756b	kill-the-bkl/reiserfs: acquire the inode mutex safely While searching a pathname, an inode mutex can be acquired in do_lookup() which calls reiserfs_lookup() which in turn acquires the write lock. On the other side reiserfs_fill_super() can acquire the write_lock and then call reiserfs_lookup_privroot() which can acquire an inode mutex (the root of the mount point). So we theoretically risk an AB - BA lock inversion that could lead to a deadlock. As for other lock dependencies found since the bkl to mutex conversion, the fix is to use reiserfs_mutex_lock_safe() which drops the lock dependency to the write lock. [ Impact: fix a possible deadlock with reiserfs ] Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:24 +02:00
Frederic Weisbecker	c63e3c0b24	kill-the-bkl/reiserfs: use mutex_lock in reiserfs_mutex_lock_safe reiserfs_mutex_lock_safe() is a hack to avoid any dependency between an internal reiserfs mutex and the write lock, it has been proposed to follow the old bkl logic. The code does the following: while (!mutex_trylock(m)) { reiserfs_write_unlock(s); schedule(); reiserfs_write_lock(s); } It then imitates the implicit behaviour of the lock when it was a Bkl and had no such dependency: mutex_lock(m) { if (fastpath) let's go else { wait_for_mutex() { schedule() { unlock_kernel() reacquire_lock_kernel() } } } } The problem is that by using such an explicit schedule(), we don't benefit from the adaptive mutex spinning on owner. The logic in use now is: reiserfs_write_unlock(s); mutex_lock(m); // -> possible adaptive spinning reiserfs_write_lock(s); [ Impact: restore the use of adaptive spinning mutexes in reiserfs ] Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:21 +02:00
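The two strategies contrasted in the commit above can be sketched in userspace with POSIX threads. This is only an illustration, not the kernel code: `struct sb_sketch`, `lock_safe_trylock_loop()` and `lock_safe_blocking()` are hypothetical names, a plain (non-recursive) pthread mutex stands in for the reiserfs write lock, and `sched_yield()` stands in for `schedule()`.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical userspace stand-in for the per-superblock state. */
struct sb_sketch {
    pthread_mutex_t write_lock; /* plays the role of the reiserfs write lock */
};

/*
 * Older strategy: poll the inner mutex and drop the write lock between
 * attempts, so the caller never sleeps on m while holding the write lock.
 * The caller must hold s->write_lock on entry and holds both on return.
 */
void lock_safe_trylock_loop(pthread_mutex_t *m, struct sb_sketch *s)
{
    while (pthread_mutex_trylock(m) != 0) {
        pthread_mutex_unlock(&s->write_lock);
        sched_yield(); /* stand-in for schedule() */
        pthread_mutex_lock(&s->write_lock);
    }
}

/*
 * Newer strategy: drop the write lock, block on the inner mutex, then
 * retake the write lock. In the kernel this lets mutex_lock() use its
 * adaptive spinning; a pthread mutex makes no such promise.
 * The caller must hold s->write_lock on entry and holds both on return.
 */
void lock_safe_blocking(pthread_mutex_t *m, struct sb_sketch *s)
{
    pthread_mutex_unlock(&s->write_lock);
    pthread_mutex_lock(m);
    pthread_mutex_lock(&s->write_lock);
}
```

In both variants the inner mutex is never waited on while the write lock is held, which is the property the helper exists to guarantee. The second variant simply leaves the waiting to the mutex implementation.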
Frederic Weisbecker	6e3647acb4	kill-the-BKL/reiserfs: release the write lock on flush_commit_list() flush_commit_list() uses ll_rw_block() to commit the pending log blocks. ll_rw_block() might sleep, and the bkl was released at this point. Then we can also relax the write lock at this point. [ Impact: release the reiserfs write lock when it is not needed ] Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:13 +02:00
Frederic Weisbecker	e6950a4da3	kill-the-BKL/reiserfs: release the write lock before rescheduling on do_journal_end() When do_journal_end() copies data to the journal blocks buffers in memory, it reschedules if needed between each block copied and dirtied. We can also release the write lock at this rescheduling stage, as the bkl did implicitly. [ Impact: release the reiserfs write lock when it is not needed ] Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-14 07:18:08 +02:00
Frederic Weisbecker	a412f9efdd	reiserfs, kill-the-BKL: fix unsafe j_flush_mutex lock Impact: fix a deadlock The j_flush_mutex is acquired safely in journal.c: if we can't take it, we release the reiserfs per-superblock lock and wait a bit. But we have a remaining place in kupdate_transactions() where j_flush_mutex is still acquired traditionally. Thus the following scenario (warned by lockdep) can happen: A B mutex_lock(&write_lock) mutex_lock(&write_lock) mutex_lock(&j_flush_mutex) mutex_lock(&j_flush_mutex) //block mutex_unlock(&write_lock) sleep... mutex_lock(&write_lock) //deadlock Fix this by using reiserfs_mutex_lock_safe() in kupdate_transactions(). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@texware.it> Cc: Jeff Mahoney <jeffm@suse.com> LKML-Reference: <1239660635-12940-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-14 07:18:01 +02:00
Frederic Weisbecker	8ebc423238	reiserfs: kill-the-BKL This patch is an attempt to remove the Bkl based locking scheme from reiserfs and is intended. It is a bit inspired from an old attempt by Peter Zijlstra: http://lkml.indiana.edu/hypermail/linux/kernel/0704.2/2174.html The bkl is heavily used in this filesystem to prevent concurrent write accesses on the filesystem. Reiserfs makes a deep use of the specific properties of the Bkl: - It can be acquired recursively by the same task - It is released on the schedule() calls and reacquired when schedule() returns The two properties above are a roadmap for the reiserfs write locking so it's very hard to simply replace it with a common mutex. - We need a recursive-able locking unless we want to restructure several blocks of the code. - We need to identify the sites where the bkl was implicitly relaxed (schedule, wait, sync, etc...) so that we can in turn release and reacquire our new lock explicitly. Such implicit releases of the lock are often required to let other resources producer/consumer do their job or we can suffer unexpected starvations or deadlocks. So the new lock that replaces the bkl here is a per superblock mutex with a specific property: it can be acquired recursively by the same task, like the bkl. For such purpose, we integrate a lock owner and a lock depth field on the superblock information structure. The first axis on this patch is to turn reiserfs_write_(un)lock() function into a wrapper to manage this mutex. Also some explicit calls to lock_kernel() have been converted to reiserfs_write_lock() helpers. The second axis is to find the important blocking sites (schedule...(), wait_on_buffer(), sync_dirty_buffer(), etc...) and then apply an explicit release of the write lock on these locations before blocking. Then we can safely wait for those who can give us resources or those who need some. 
Typically this is a fight between the current writer, the reiserfs workqueue (aka the async committer) and the pdflush threads. The third axis is a consequence of the second. The write lock is usually on top of a lock dependency chain which can include the journal lock, the flush lock or the commit lock. So it's dangerous to release and try to reacquire the write lock while we still hold other locks. This is fine with the bkl: T1 T2 lock_kernel() mutex_lock(A) unlock_kernel() // do something lock_kernel() mutex_lock(A) -> already locked by T1 schedule() (and then unlock_kernel()) lock_kernel() mutex_unlock(A) .... This is not fine with a mutex: T1 T2 mutex_lock(write) mutex_lock(A) mutex_unlock(write) // do something mutex_lock(write) mutex_lock(A) -> already locked by T1 schedule() mutex_lock(write) -> already locked by T2 deadlock The solution in this patch is to provide a helper which releases the write lock and sleeps a bit if we can't lock a mutex that depends on it. It's another simulation of the bkl behaviour. The last axis is to locate the fs callbacks that are called with the bkl held, according to Documentation/filesystem/Locking. Those are: - reiserfs_remount - reiserfs_fill_super - reiserfs_put_super Reiserfs didn't need to explicitly lock because of the context of these callbacks. But now we must take care of that with the new locking. After this patch, reiserfs suffers from a slight performance regression (for now). On UP, a high volume write with dd reports an average of 27 MB/s instead of 30 MB/s without the patch applied. 
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Bron Gondwana <brong@fastmail.fm> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> LKML-Reference: <1239070789-13354-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-14 07:17:59 +02:00
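The kill-the-BKL commit above describes a per-superblock mutex that the same task can acquire recursively, tracked with a lock owner field and a lock depth field. A rough userspace sketch of that idea follows. It is not the kernel implementation: the names `write_lock_sketch`, `write_lock_init()`, `write_lock()` and `write_unlock()` are hypothetical, a pthread mutex replaces the kernel mutex, and the address of a thread-local variable is an assumed stand-in for the kernel's notion of the current task.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* The address of this per-thread variable identifies the calling thread. */
static _Thread_local char thread_token;

/* Hypothetical analogue of the owner/depth fields added to the superblock info. */
struct write_lock_sketch {
    pthread_mutex_t mutex;
    atomic_uintptr_t owner; /* 0 when nobody owns the lock */
    int depth;              /* nesting level, touched only by the owner */
};

void write_lock_init(struct write_lock_sketch *l)
{
    pthread_mutex_init(&l->mutex, NULL);
    atomic_init(&l->owner, 0);
    l->depth = 0;
}

void write_lock(struct write_lock_sketch *l)
{
    uintptr_t me = (uintptr_t)&thread_token;

    /* Take the mutex only on the outermost acquisition by this thread. */
    if (atomic_load(&l->owner) != me) {
        pthread_mutex_lock(&l->mutex);
        atomic_store(&l->owner, me);
    }
    l->depth++;
}

void write_unlock(struct write_lock_sketch *l)
{
    /* Release the mutex only when the outermost acquisition is undone. */
    if (--l->depth == 0) {
        atomic_store(&l->owner, 0);
        pthread_mutex_unlock(&l->mutex);
    }
}
```

A thread that calls `write_lock()` twice must call `write_unlock()` twice before another thread can get in. The sketch does not model the other half of the BKL's behaviour, the implicit release around `schedule()`, which the commit handles by explicit unlock and relock at each blocking site.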
Jens Axboe	8aa7e847d8	Fix congestion_wait() sync/async vs read/write confusion Commit `1faa16d228` accidentally broke the bdi congestion wait queue logic, causing us to wait on congestion for WRITE (== 1) when we really wanted BLK_RW_ASYNC (== 0) instead. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-07-10 20:31:53 +02:00
Jeff Mahoney	a9dd364358	reiserfs: rename p_s_sb to sb This patch is a simple s/p_s_sb/sb/g to the reiserfs code. This is the first in a series of patches to rip out some of the awful variable naming in reiserfs. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:39 -07:00
Jeff Mahoney	0222e6571c	reiserfs: strip trailing whitespace This patch strips trailing whitespace from the reiserfs code. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:39 -07:00
Jeff Mahoney	32e8b10629	reiserfs: rearrange journal abort This patch kills off reiserfs_journal_abort as it is never called, and combines __reiserfs_journal_abort_{soft,hard} into one function called reiserfs_abort_journal, which performs the same work. It is silent as opposed to the old version, since the message was always issued after a regular 'abort' message. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:36 -07:00
Jeff Mahoney	c3a9c2109f	reiserfs: rework reiserfs_panic ReiserFS panics can be somewhat inconsistent. In some cases: * a unique identifier may be associated with it * the function name may be included * the device may be printed separately This patch aims to make warnings more consistent. reiserfs_warning() prints the device name, so printing it a second time is not required. The function name for a warning is always helpful in debugging, so it is now automatically inserted into the output. Hans has stated that every warning should have a unique identifier. Some cases lack them, others really shouldn't have them. reiserfs_warning() now expects an id associated with each message. In the rare case where one isn't needed, "" will suffice. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:36 -07:00
Jeff Mahoney	45b03d5e8e	reiserfs: rework reiserfs_warning ReiserFS warnings can be somewhat inconsistent. In some cases: * a unique identifier may be associated with it * the function name may be included * the device may be printed separately This patch aims to make warnings more consistent. reiserfs_warning() prints the device name, so printing it a second time is not required. The function name for a warning is always helpful in debugging, so it is now automatically inserted into the output. Hans has stated that every warning should have a unique identifier. Some cases lack them, others really shouldn't have them. reiserfs_warning() now expects an id associated with each message. In the rare case where one isn't needed, "" will suffice. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:36 -07:00
Jeff Mahoney	600ed41675	reiserfs: audit transaction ids to always be unsigned ints This patch fixes up the reiserfs code such that transaction ids are always unsigned ints. In places they can currently be signed ints or unsigned longs. The former just causes an annoying clm-2200 warning and may join a transaction when it should wait. The latter is just for correctness since the disk format uses a 32-bit transaction id. There aren't any runtime problems that result from it not wrapping at the correct location since the value is truncated correctly even on big endian systems. The 0 value might make it to disk, but the mount-time checks will bump it to 10 itself. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-30 12:16:35 -07:00
Al Viro	e5eb8caa83	[PATCH] remember mode of reiserfs journal Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:49:04 -04:00
Al Viro	30c40d2c01	[PATCH] propagate mode through open_bdev_excl/close_bdev_excl replace open_bdev_excl/close_bdev_excl with variants taking fmode_t. superblock gets the value used to mount it stored in sb->s_mode Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:49:00 -04:00
Al Viro	9a1c354276	[PATCH] pass fmode_t to blkdev_put() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:48:58 -04:00
Al Viro	aeb5d72706	[PATCH] introduce fmode_t, do annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:06 -04:00
Nick Piggin	ca5de404ff	fs: rename buffer trylock Like the page lock change, this also requires name change, so convert the raw test_and_set bitop to a trylock. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-04 21:56:09 -07:00
Nick Piggin	529ae9aaa0	mm: rename page trylock Converting page lock to new locking bitops requires a change of page flag operation naming, so we might as well convert it to something nicer (!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked). This also facilitates lockdeping of page lock. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-04 21:31:34 -07:00
Jeff Mahoney	90415deac7	reiserfs: convert j_commit_lock to mutex j_commit_lock is a semaphore but uses it as if it were a mutex. This patch converts it to a mutex. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Chris Mason <chris.mason@oracle.com> Cc: Edward Shishkin <edward.shishkin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Jeff Mahoney	afe7025907	reiserfs: convert j_flush_sem to mutex j_flush_sem is a semaphore but uses it as if it were a mutex. This patch converts it to a mutex. [akpm@linux-foundation.org: fix mutex_trylock retval treatment] Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Chris Mason <chris.mason@oracle.com> Cc: Edward Shishkin <edward.shishkin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Jeff Mahoney	f68215c464	reiserfs: convert j_lock to mutex j_lock is a semaphore but uses it as if it were a mutex. This patch converts it to a mutex. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Chris Mason <chris.mason@oracle.com> Cc: Edward Shishkin <edward.shishkin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:33 -07:00
Christoph Hellwig	86098fa011	reiserfs: use open_bdev_excl Use the proper helper to open a blockdevice by name for filesystem use, this makes sure it's properly claimed (also added for open-by-number) and gets rid of the struct file abuse. Tested by mounting a reiserfs filesystem with external journal. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jeff Mahoney <jeffm@suse.com> Acked-by: Edward Shishkin <edward.shishkin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-30 08:29:51 -07:00
Harvey Harrison	fbe5498b3d	reiserfs: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:46 -07:00
Harvey Harrison	e13601bc6a	reiserfs: fix sparse warning in journal.c fs/reiserfs/journal.c:4319:2: warning: returning void-valued expression Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:46 -07:00
Matthew Wilcox	6188e10d38	Convert asm/semaphore.h users to linux/semaphore.h Signed-off-by: Matthew Wilcox <willy@linux.intel.com>	2008-04-18 22:22:54 -04:00
Jeff Mahoney	cb680c1be6	reiserfs: ignore on disk s_bmap_nr value Implement support for file systems larger than 8 TiB. The reiserfs superblock contains a 16 bit value for counting the number of bitmap blocks. The rest of the disk format supports file systems up to 2^32 blocks, but the bitmap block limitation artificially limits this to 8 TiB with a 4KiB block size. Rather than trust the superblock's 16-bit bitmap block count, we calculate it dynamically based on the number of blocks in the file system. When an incorrect value is observed in the superblock, it is zeroed out, ensuring that older kernels will not be able to mount the file system. Userspace support has already been implemented and shipped in reiserfsprogs 3.6.20. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:35 -07:00
Jeff Mahoney	3ee1667042	reiserfs: fix usage of signed ints for block numbers Do a quick signedness check for block numbers. There are a number of places where signed integers are used for block numbers, which limits the usable file system size to 8 TiB. The disk format, excepting a problem which will be fixed in the following patch, supports file systems up to 16 TiB in size. This patch cleans up those sites so that we can enable the full usable size. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:35 -07:00
Edward Shishkin	cf3d0b8182	reiserfs: do not repair wrong journal params When mounting a file system with wrong journal params do not try to repair them, suggest fsck instead. Signed-off-by: Edward Shishkin <edward@namesys.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:01 -07:00
Chris Mason	398c95bdf2	try to reap reiserfs pages left around by invalidatepage reiserfs_invalidatepage will refuse to free pages if they have been logged in data=journal mode, or were pinned down by a data=ordered operation. For data=journal, this is fairly easy to trigger just with fsx-linux, and it results in a large number of pages hanging around on the LRUs with page->mapping == NULL. Calling try_to_free_buffers when reiserfs decides it is done with the page allows it to be freed earlier, and with much less VM thrashing. Lock ordering rules mean that reiserfs can't call lock_page when it is releasing the buffers, so TestSetPageLocked is used instead. Contention on these pages should be rare, so it should be sufficient most of the time. Signed-off-by: Chris Mason <chris.mason@oracle.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:57 -07:00
Adrian Bunk	deba0f49b9	fs/reiserfs/: cleanups - remove the following no longer used functions: - bitmap.c: reiserfs_claim_blocks_to_be_allocated() - bitmap.c: reiserfs_release_claimed_blocks() - bitmap.c: reiserfs_can_fit_pages() - make the following functions static: - inode.c: restart_transaction() - journal.c: reiserfs_async_progress_wait() Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Vladimir V. Saveliev <vs@namesys.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:46 -07:00
Robert P. J. Day	beb7dd86a1	Fix misspellings collected by members of KJ list. Fix the misspellings of "propogate", "writting" and (oh, the shame :-) "kenrel" in the source tree. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-05-09 07:14:03 +02:00
Milind Arun Choudhary	5ab2f7e0fd	reiserfs: use __set_current_state() use __set_current_state(TASK_) instead of current->state = TASK_, in fs/reiserfs Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com> Cc: <reiserfs-dev@namesys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:13 -07:00
David Howells	4c1ac1b491	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/infiniband/core/iwcm.c drivers/net/chelsio/cxgb2.c drivers/net/wireless/bcm43xx/bcm43xx_main.c drivers/net/wireless/prism54/islpci_eth.c drivers/usb/core/hub.h drivers/usb/input/hid-core.c net/core/netpoll.c Fix up merge failures with Linus's head and fix new compilation failures. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-05 14:37:56 +00:00
Matt LaPlante	0779bf2d2e	Fix misc .c/.h comment typos Fix various .c/.h typos in comments (no code changes). Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-11-30 05:24:39 +01:00
David Howells	c4028958b6	WorkStruct: make allyesconfig Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:57:56 +00:00
Andrew Morton	3fcfab16c5	[PATCH] separate bdi congestion functions from queue congestion functions Separate out the concept of "queue congestion" from "backing-dev congestion". Congestion is a backing-dev concept, not a queue concept. The blk_* congestion functions are retained, as wrappers around the core backing-dev congestion functions. This proper layering is needed so that NFS can cleanly use the congestion functions, and so that CONFIG_BLOCK=n actually links. Cc: "Thomas Maier" <balagi@justmail.de> Cc: "Jens Axboe" <jens.axboe@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: David Howells <dhowells@redhat.com> Cc: Peter Osterlund <petero2@telia.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-20 10:26:35 -07:00
Eric Sesterhenn	14a61442c2	BUG_ON conversion for fs/reiserfs This patch converts several if () BUG(); constructs to BUG_ON(), which occupies less space, uses unlikely and is safer when BUG() is disabled. S_ISREG() has no side effects, so the conversion is safe. Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-10-03 23:36:38 +02:00
Chris Mason	a317202714	[PATCH] Fix reiserfs latencies caused by data=ordered ReiserFS does periodic cleanup of old transactions in order to limit the length of time a journal replay may take after a crash. Sometimes, writing metadata from an old (already committed) transaction may require committing a newer transaction, which also requires writing all data=ordered buffers. This can cause very long stalls on journal_begin. This patch makes sure new transactions will not need to be committed before trying a periodic reclaim of an old transaction. It is low risk because if a bad decision is made, it just means a slightly longer journal replay after a crash. Signed-off-by: Chris Mason <mason@suse.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:11 -07:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
Akinobu Mita	f116629d03	[PATCH] fs: use list_move() This patch converts the combination of list_del(A) and list_add(A, B) to list_move(A, B) under fs/. Cc: Ian Kent <raven@themaw.net> Acked-by: Joel Becker <joel.becker@oracle.com> Cc: Neil Brown <neilb@cse.unsw.edu.au> Cc: Hans Reiser <reiserfs-dev@namesys.com> Cc: Urban Widmark <urban@teststation.com> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:18 -07:00
Alexander Zarochentsev	a44c94a7b8	[PATCH] reiserfs: handle trans_id overflow Reiserfs does not handle transaction ID overflow correctly. Transaction ID == 0 causes reiserfs to crash. The patch fixes all places where the transaction ID is incremented. Signed-off-by: Alexander Zarochentsev <zam@namesys.com> Signed-off-by: Hans Reiser <reiser@namesys.com> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-25 08:22:51 -08:00
Vladimir V. Saveliev	c499ec24c3	[PATCH] reiserfs: do not check if unsigned < 0 This patch fixes bugs in reiserfs where unsigned integers were checked for being less than 0. Signed-off-by: Vladimir V. Saveliev <vs@namesys.com> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Hans Reiser <reiser@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-02 08:33:08 -08:00
Chris Mason	6ae1ea447d	[PATCH] reiserfs: reiserfs fix journal accounting in journal_transaction_should_end reiserfs: journal_transaction_should_end should increase the count of blocks allocated so the transaction subsystem can keep new writers from creating a transaction that is too large. Signed-off-by: Chris Mason <mason@suse.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-02-01 08:53:26 -08:00
Chris Mason	3d4492f81d	[PATCH] reiserfs: reiserfs write_ordered_buffers should not oops on dirty non-uptodate bh write_ordered_buffers should handle dirty non-uptodate buffers without a BUG() Signed-off-by: Chris Mason <mason@suse.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-02-01 08:53:26 -08:00
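
The size limits mentioned in the cb680c1be6 and 3ee1667042 entries above can be checked with a little arithmetic. The sketch below is not the kernel's code: the function names are hypothetical, and it assumes each bitmap block tracks one bit per file system block, so a 4 KiB bitmap block covers 32768 blocks. Under that assumption, a 16-bit bitmap block counter caps the file system just under 8 TiB, while 2^32 blocks of 4 KiB need 131072 bitmap blocks, which no longer fits in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Blocks covered by one bitmap block, assuming one bit per block. */
static uint64_t blocks_per_bitmap(uint32_t block_size)
{
    assert(block_size > 0);
    return (uint64_t)block_size * 8;
}

/*
 * Hypothetical helper: bitmap blocks needed to cover block_count blocks,
 * rounded up. This mirrors the idea of computing the count from the
 * file system size instead of trusting a 16-bit on-disk field.
 */
static uint64_t bitmap_blocks_needed(uint64_t block_count, uint32_t block_size)
{
    uint64_t per = blocks_per_bitmap(block_size);

    return block_count / per + (block_count % per != 0);
}

/* Largest size in bytes addressable when the bitmap count is 16 bits wide. */
static uint64_t max_bytes_with_16bit_count(uint32_t block_size)
{
    return (uint64_t)UINT16_MAX * blocks_per_bitmap(block_size) * block_size;
}
```

With a 4 KiB block size, `max_bytes_with_16bit_count(4096)` is 65535 * 2^27 bytes, slightly below 2^43 bytes (8 TiB), and `bitmap_blocks_needed(1ULL << 32, 4096)` is 131072.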

66 Commits