wake_up_interruptible/poll_wait provide sufficient barriers;
just use ACCESS_ONCE() to fetch ns->event and that's it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If the underlying dentry doesn't have ->d_revalidate(), there's no need to
force dropping out of RCU mode. All we need for that is to make freeing
ecryptfs_dentry_info RCU-delayed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull btrfs fixes from Chris Mason:
"This is a small collection of fixes, including a regression fix from
Liu Bo that solves rare crashes with compression on.
I've merged my for-linus up to 3.12-rc3 because the top commit is only
meant for 3.12. The rest of the fixes are also available in my master
branch on top of my last 3.11 based pull"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: Fix crash due to not allocating integrity data for a bioset
Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing
Btrfs: eliminate races in worker stopping code
Btrfs: fix crash of compressed writes
Btrfs: fix transid verify errors when recovering log tree
Pull CIFS fixes from Steve French:
"Small set of cifs fixes. Most important is Jeff's fix that works
around disconnection problems which can be caused by simultaneous use
of user space tools (starting a long running smbclient backup then
doing a cifs kernel mount) or multiple cifs mounts through a NAT, and
Jim's fix to deal with reexport of cifs share.
I expect to send two more cifs fixes next week (being tested now) -
fixes to address an SMB2 unmount hang when server dies and a fix for
cifs symlink handling of Windows "NFS" symlinks"
* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
[CIFS] update cifs.ko version
[CIFS] Remove ext2 flags that have been moved to fs.h
[CIFS] Provide sane values for nlink
cifs: stop trying to use virtual circuits
CIFS: FS-Cache: Uncache unread pages in cifs_readpages() before freeing them
- lockdep fix for project quotas
- fix for dirent dtype support on v4 filesystems
- fix for a memory leak in recovery
- fix for build failure due to the recovery fix
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJSTw/vAAoJENaLyazVq6ZOo2EP/RjhwaDqDZHB5bm/axZrtxP6
g31TGvJ+nCUT6JjYX2wnoFuJDT2SDcs5+2gtjk1DRLb3JRQI2uJ+MtHLjDIZJSvE
sMAADOgWvTuzx3TsnR4U0MM1/XVnv99k1vinedD6mGq16QtT0OWYsA9AKkMKWd1o
OiTGyX4AMCNtfAZkiH9+OR8+BqH1xEEzv28H/Bf7yLSsQHM+v9uKPC5+f7I8bWvB
YK8fAxeGmiAfDGR4tQ+tQVoIj3qrJmPyj45ElwAvGCKbOh0LG4/N+dwaCQme0teW
xFfXMF+C/94qDom3z0gYAWzSOixgTFmy6gxt+3Mqw7uZ/dNzO+KeKE5Fm8cG11yD
y3vxqwav/fLHv1fRUvl5abrAzl5VU8nRAbeQqZBM0xjzgfilMp5Jk2Jvix8OHcO5
edmb7+CkkGdiYD15cSUl2242qKaukB3K1vrHoOlFte42vxELmcHWBRBxuZe8rgV1
czf2xCHkWWjdwUrFeZoxVSEFydfoGIW0clAz8tHPQpVyvnSjRTuugJ8wuN92NyNF
xGS5er0lyCqlBCBVCOZX/xTcwSQZ4UNG8qgdzDT26VN1VpTFeaaJlMRwD2GhYMYk
8eYX3Ie/XdECLn5ZaG4xWEJHLarXLcqUI6eMobjkVs+qt/FQl/PzH76qOcZWKKbf
kEOhPA1Gh97SZ66+vqaw
=eNZa
-----END PGP SIGNATURE-----
Merge tag 'xfs-for-linus-v3.12-rc4' of git://oss.sgi.com/xfs/xfs
Pull xfs bugfixes from Ben Myers:
"There are lockdep annotations for project quotas, a fix for dirent
dtype support on v4 filesystems, a fix for a memory leak in recovery,
and a fix for the build error that resulted from it. D'oh"
* tag 'xfs-for-linus-v3.12-rc4' of git://oss.sgi.com/xfs/xfs:
xfs: Use kmem_free() instead of free()
xfs: fix memory leak in xlog_recover_add_to_trans
xfs: dirent dtype presence is dependent on directory magic numbers
xfs: lockdep needs to know about 3 dquot-deep nesting
free_device rcu callback, scheduled from btrfs_rm_dev_replace_srcdev,
can be processed before btrfs_scratch_superblock is called, which would
result in a use-after-free on btrfs_device contents. Fix this by
zeroing the superblock before the rcu callback is registered.
Cc: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
The current implementation of worker threads in Btrfs has races in
worker stopping code, which cause all kinds of panics and lockups when
running btrfs/011 xfstest in a loop. The problem is that
btrfs_stop_workers is unsynchronized with respect to check_idle_worker,
check_busy_worker and __btrfs_start_workers.
E.g., check_idle_worker race flow:
btrfs_stop_workers(): check_idle_worker(aworker):
- grabs the lock
- splices the idle list into the
working list
- removes the first worker from the
working list
- releases the lock to wait for
its kthread's completion
- grabs the lock
- if aworker is on the working list,
moves aworker from the working list
to the idle list
- releases the lock
- grabs the lock
- puts the worker
- removes the second worker from the
working list
......
btrfs_stop_workers returns, aworker is on the idle list
FS is umounted, memory is freed
......
aworker is waken up, fireworks ensue
With this applied, I wasn't able to trigger the problem in 48 hours,
whereas previously I could reliably reproduce at least one of these
races within an hour.
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
The crash[1] is found by xfstests/generic/208 with "-o compress",
it's not reproduced everytime, but it does panic.
The bug is quite interesting, it's actually introduced by a recent commit
(573aecafca,
Btrfs: actually limit the size of delalloc range).
Btrfs implements delay allocation, so during writeback, we
(1) get a page A and lock it
(2) search the state tree for delalloc bytes and lock all pages within the range
(3) process the delalloc range, including find disk space and create
ordered extent and so on.
(4) submit the page A.
It runs well in normal cases, but if we're in a racy case, eg.
buffered compressed writes and aio-dio writes,
sometimes we may fail to lock all pages in the 'delalloc' range,
in which case, we need to fall back to search the state tree again with
a smaller range limit(max_bytes = PAGE_CACHE_SIZE - offset).
The mentioned commit has a side effect, that is, in the fallback case,
we can find delalloc bytes before the index of the page we already have locked,
so we're in the case of (delalloc_end <= *start) and return with (found > 0).
This ends with not locking delalloc pages but making ->writepage still
process them, and the crash happens.
This fixes it by just thinking that we find nothing and returning to caller
as the caller knows how to deal with it properly.
[1]:
------------[ cut here ]------------
kernel BUG at mm/page-writeback.c:2170!
[...]
CPU: 2 PID: 11755 Comm: btrfs-delalloc- Tainted: G O 3.11.0+ #8
[...]
RIP: 0010:[<ffffffff810f5093>] [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
[...]
[ 4934.248731] Stack:
[ 4934.248731] ffff8801477e5dc8 ffffea00049b9f00 ffff8801869f9ce8 ffffffffa02b841a
[ 4934.248731] 0000000000000000 0000000000000000 0000000000000fff 0000000000000620
[ 4934.248731] ffff88018db59c78 ffffea0005da8d40 ffffffffa02ff860 00000001810016c0
[ 4934.248731] Call Trace:
[ 4934.248731] [<ffffffffa02b841a>] extent_range_clear_dirty_for_io+0xcf/0xf5 [btrfs]
[ 4934.248731] [<ffffffffa02a8889>] compress_file_range+0x1dc/0x4cb [btrfs]
[ 4934.248731] [<ffffffff8104f7af>] ? detach_if_pending+0x22/0x4b
[ 4934.248731] [<ffffffffa02a8bad>] async_cow_start+0x35/0x53 [btrfs]
[ 4934.248731] [<ffffffffa02c694b>] worker_loop+0x14b/0x48c [btrfs]
[ 4934.248731] [<ffffffffa02c6800>] ? btrfs_queue_worker+0x25c/0x25c [btrfs]
[ 4934.248731] [<ffffffff810608f5>] kthread+0x8d/0x95
[ 4934.248731] [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
[ 4934.248731] [<ffffffff814fe09c>] ret_from_fork+0x7c/0xb0
[ 4934.248731] [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
[ 4934.248731] Code: ff 85 c0 0f 94 c0 0f b6 c0 59 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 2c de 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 52 49 8b 84 24 80 00 00 00 f6 40 20 01 75 44
[ 4934.248731] RIP [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
[ 4934.248731] RSP <ffff8801869f9c48>
[ 4934.280307] ---[ end trace 36f06d3f8750236a ]---
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
If we crash with a log, remount and recover that log, and then crash before we
can commit another transaction we will get transid verify errors on the next
mount. This is because we were not zero'ing out the log when we committed the
transaction after recovery. This is ok as long as we commit another transaction
at some point in the future, but if you abort or something else goes wrong you
can end up in this weird state because the recovery stuff says that the tree log
should have a generation+1 of the super generation, which won't be the case of
the transaction that was started for recovery. Fix this by removing the check
and _always_ zero out the log portion of the super when we commit a transaction.
This fixes the transid verify issues I was seeing with my force errors tests.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
This fixes a build failure caused by calling the free() function which
does not exist in the Linux kernel.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit aaaae98022)
Free the memory in error path of xlog_recover_add_to_trans().
Normally this memory is freed in recovery pass2, but is leaked
in the error path.
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 519ccb81ac)
The determination of whether a directory entry contains a dtype
field originally was dependent on the filesystem having CRCs
enabled. This meant that the format for dtype beign enabled could be
determined by checking the directory block magic number rather than
doing a feature bit check. This was useful in that it meant that we
didn't need to pass a struct xfs_mount around to functions that
were already supplied with a directory block header.
Unfortunately, the introduction of dtype fields into the v4
structure via a feature bit meant this "use the directory block
magic number" method of discriminating the dirent entry sizes is
broken. Hence we need to convert the places that use magic number
checks to use feature bit checks so that they work correctly and not
by chance.
The current code works on v4 filesystems only because the dirent
size roundup covers the extra byte needed by the dtype field in the
places where this problem occurs.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 367993e7c6)
Michael Semon reported that xfs/299 generated this lockdep warning:
=============================================
[ INFO: possible recursive locking detected ]
3.12.0-rc2+ #2 Not tainted
---------------------------------------------
touch/21072 is trying to acquire lock:
(&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
but task is already holding lock:
(&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&xfs_dquot_other_class);
lock(&xfs_dquot_other_class);
*** DEADLOCK ***
May be due to missing lock nesting notation
7 locks held by touch/21072:
#0: (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e
#1: (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40
#2: (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35
#3: (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1
#4: (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f
#5: (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
#6: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
The lockdep annotation for dquot lock nesting only understands
locking for user and "other" dquots, not user, group and quota
dquots. Fix the annotations to match the locking heirarchy we now
have.
Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit f112a04971)
Pull fuse bugfixes from Miklos Szeredi:
"This contains two more fixes by Maxim for writeback/truncate races and
fixes for RCU walk in fuse_dentry_revalidate()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: no RCU mode in fuse_access()
fuse: readdirplus: fix RCU walk
fuse: don't check_submounts_and_drop() in RCU walk
fuse: fix fallocate vs. ftruncate race
fuse: wait for writeback in fuse_file_fallocate()
Pull vfs lru leak fix from Al Viro:
"The fix in "super: fix for destroy lrus" didn't - they need to be
destroyed, all right, but that's the wrong place..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/super.c: fix lru_list leak for real
Freeing ->s_{inode,dentry}_lru in deactivate_locked_super() is wrong;
the right place is destroy_super(). As it is, we leak them if sget()
decides that new superblock it has allocated (and never shown to
anybody) isn't needed and should be freed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Doing dput(parent) is not valid in RCU walk mode. In RCU mode it would
probably be okay to update the parent flags, but it's actually not
necessary most of the time...
So only set the FUSE_I_ADVISE_RDPLUS flag on the parent when the entry was
recently initialized by READDIRPLUS.
This is achieved by setting FUSE_I_INIT_RDPLUS on entries added by
READDIRPLUS and only dropping out of RCU mode if this flag is set.
FUSE_I_INIT_RDPLUS is cleared once the FUSE_I_ADVISE_RDPLUS flag is set in
the parent.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org
If revalidate finds an invalid dentry in RCU walk mode, let the VFS deal
with it instead of calling check_submounts_and_drop() which is not prepared
for being called from RCU walk.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org
- Stable fix for Oopses in the pNFS files layout driver
- Fix a regression when doing a non-exclusive file create on NFSv4.x
- NFSv4.1 security negotiation fixes when looking up the root filesystem
- Fix a memory ordering issue in the pNFS files layout driver
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJSSfNNAAoJEGcL54qWCgDybGYQAJGm4/vd7/rWZ49KIjGFGkFo
sCt0UOK6Y6ALhUOIlIreXsQ+Iwn9aAoIIRgx8UwnB+hO6PGnSyFuJZZx1KE8V2kj
6JlE5FbsWV+3uFQzNJQsNcoj7NZMzIRZT7x+7QansBOdSQjgQc3ig2sAMWREZjn8
GxMOl8FNRrnP8gRom30ZScgMp1YDM8J1ql80S/nbxh2NOLBsvgg9VapzJhhqkMyl
b7WKX4Qbg4AeSaxIAIrIwcZ7L2YS09JGC40VSybQARs0/7J8fjOZPs7CmrUCoB5F
DmT5vfEC4+dqDf8PMyoFVfxK5ua5Sb/FGQmagYYa8bSgY7Uq03akYI++co+4PZU1
f3SN6CSvVffzGMdXAhUupOZQbkKvKFxR2MTGy8s7dxdkQudd4RioYPDmLfCHlbmb
VY5kFh/Duqso1FCrcfvZoC88ElrWUz5yoVzZyECOEwCs1wjI6bjmGdSqCSbU75Lm
Z0XOAn1cStwFvGwCbGZPUzlvueji3coDdCFPBXAOFHzisLYoo/Lxenw7l5D1qM5b
02iZllcIo340vw8wxHZxVebecFo33P90X1gjv0HQQkV/6EeNgq4D47SWTPxRq3Ai
Dl9MFjTPl51oseDLrH6I/hBvcqjksB1M1+WjifT0bCIi3Y0HAea2U0wgweHS3vAd
QHqIpIJxNHDjPBMDWEZW
=ScfI
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Stable fix for Oopses in the pNFS files layout driver
- Fix a regression when doing a non-exclusive file create on NFSv4.x
- NFSv4.1 security negotiation fixes when looking up the root
filesystem
- Fix a memory ordering issue in the pNFS files layout driver
* tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Give "flavor" an initial value to fix a compile warning
NFSv4.1: try SECINFO_NO_NAME flavs until one works
NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds
NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method
Merge misc fixes from Andrew Morton.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
pidns: fix free_pid() to handle the first fork failure
ipc,msg: prevent race with rmid in msgsnd,msgrcv
ipc/sem.c: update sem_otime for all operations
mm/hwpoison: fix the lack of one reference count against poisoned page
mm/hwpoison: fix false report on 2nd attempt at page recovery
mm/hwpoison: fix test for a transparent huge page
mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood
block: change config option name for cmdline partition parsing
mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration
mm: avoid reinserting isolated balloon pages into LRU lists
arch/parisc/mm/fault.c: fix uninitialized variable usage
include/asm-generic/vtime.h: avoid zero-length file
nilfs2: fix issue with race condition of competition between segments for dirty blocks
Documentation/kernel-parameters.txt: replace kernelcore with Movable
mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored
kernel/kmod.c: check for NULL in call_usermodehelper_exec()
ipc/sem.c: synchronize the proc interface
ipc/sem.c: optimize sem_lock()
ipc/sem.c: fix race in sem_lock()
mm/compaction.c: periodically schedule when freeing pages
...
Many NILFS2 users were reported about strange file system corruption
(for example):
NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)
But such error messages are consequence of file system's issue that takes
place more earlier. Fortunately, Jerome Poulin <jeromepoulin@gmail.com>
and Anton Eliasson <devel@antoneliasson.se> were reported about another
issue not so recently. These reports describe the issue with segctor
thread's crash:
BUG: unable to handle kernel paging request at 0000000000004c83
IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]
Call Trace:
nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
nilfs_segctor_construct+0x17b/0x290 [nilfs2]
nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
kthread+0xc0/0xd0
ret_from_fork+0x7c/0xb0
These two issues have one reason. This reason can raise third issue
too. Third issue results in hanging of segctor thread with eating of
100% CPU.
REPRODUCING PATH:
One of the possible way or the issue reproducing was described by
Jermoe me Poulin <jeromepoulin@gmail.com>:
1. init S to get to single user mode.
2. sysrq+E to make sure only my shell is running
3. start network-manager to get my wifi connection up
4. login as root and launch "screen"
5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
6. lscp | xz -9e > lscp.txt.xz
7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
9. start a screen and launch strace -f -o find-cat.log -t find
/mnt/nilfs -type f -exec cat {} > /dev/null \;
10. start a screen and launch strace -f -o apt-get.log -t apt-get update
11. launch the last command again as it did not crash the first time
12. apt-get crashes
13. ps aux > ps-aux-crashed.log
13. sysrq+W
14. sysrq+E wait for everything to terminate
15. sysrq+SUSB
Simplified way of the issue reproducing is starting kernel compilation
task and "apt-get update" in parallel.
REPRODUCIBILITY:
The issue is reproduced not stable [60% - 80%]. It is very important to
have proper environment for the issue reproducing. The critical
conditions for successful reproducing:
(1) It should have big modified file by mmap() way.
(2) This file should have the count of dirty blocks are greater that
several segments in size (for example, two or three) from time to time
during processing.
(3) It should be intensive background activity of files modification
in another thread.
INVESTIGATION:
First of all, it is possible to see that the reason of crash is not valid
page address:
NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783
Moreover, value of b_page (0x1a82) is 6786. This value looks like segment
number. And b_blocknr with b_size values look like block numbers. So,
buffer_head's pointer points on not proper address value.
Detailed investigation of the issue is discovered such picture:
[-----------------------------SEGMENT 6783-------------------------------]
NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783
[-----------------------------SEGMENT 6784-------------------------------]
NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8
NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0
[----------] ditto
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15
[-----------------------------SEGMENT 6785-------------------------------]
NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88
NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0
[----------] ditto
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12
NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785
NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
BUG: unable to handle kernel paging request at 0000000000001a82
IP: [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]
Usually, for every segment we collect dirty files in list. Then, dirty
blocks are gathered for every dirty file, prepared for write and
submitted by means of nilfs_segbuf_submit_bh() call. Finally, it takes
place complete write phase after calling nilfs_end_bio_write() on the
block layer. Buffers/pages are marked as not dirty on final phase and
processed files removed from the list of dirty files.
It is possible to see that we had three prepare_write and submit_bio
phases before segbuf_wait and complete_write phase. Moreover, segments
compete between each other for dirty blocks because on every iteration
of segments processing dirty buffer_heads are added in several lists of
payload_buffers:
[SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
[SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
The next pointer is the same but prev pointer has changed. It means
that buffer_head has next pointer from one list but prev pointer from
another. Such modification can be made several times. And, finally, it
can be resulted in various issues: (1) segctor hanging, (2) segctor
crashing, (3) file system metadata corruption.
FIX:
This patch adds:
(1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write()
for every proccessed dirty block;
(2) checking of BH_Async_Write flag in
nilfs_lookup_dirty_data_buffers() and
nilfs_lookup_dirty_node_buffers();
(3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(),
nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().
Reported-by: Jerome Poulin <jeromepoulin@gmail.com>
Reported-by: Anton Eliasson <devel@antoneliasson.se>
Cc: Paul Fertser <fercerpav@gmail.com>
Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp>
Cc: Piotr Szymaniak <szarpaj@grubelek.pl>
Cc: Juan Barry Manuel Canham <Linux@riotingpacifist.net>
Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com>
Cc: Elmer Zhang <freeboy6716@gmail.com>
Cc: Kenneth Langga <klangga@gmail.com>
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A high setting of max_map_count, and a process core-dumping with a large
enough vm_map_count could result in an NT_FILE note not being written,
and the kernel crashing immediately later because it has assumed
otherwise.
Reproduction of the oops-causing bug described here:
https://lkml.org/lkml/2013/8/30/50
Rge ussue originated in commit 2aa362c49c ("coredump: extend core dump
note section to contain file names of mapped file") from Oct 4, 2012.
This patch make that section optional in that case. fill_files_note()
should signify the error, and also let the info struct in
elf_core_dump() be zero-initialized so that we can check for the
optionally written note.
[akpm@linux-foundation.org: avoid abusing E2BIG, remove a couple of not-really-needed local variables]
[akpm@linux-foundation.org: fix sparse warning]
Signed-off-by: Dan Aloni <alonid@stratoscale.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Reported-by: Martin MOKREJS <mmokrejs@gmail.com>
Tested-by: Martin MOKREJS <mmokrejs@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>