(This is being reposted. The first one had a problem because it
erroneously added a similar change elsewhere; that change has been
dropped.)
The next patch in this series points out that the calculation for
the number of pages in an osd request is getting done twice. It
is not obvious, but the result of both calculations is identical.
This patch simplifies one of them--as a separate step--to make
it clear that the transformation in the next patch is valid.
In ceph_sync_write() there is some magic that computes page_align
for an osd request. But a little analysis shows it can be
simplified.
First, we have:
io_align = pos & ~PAGE_MASK;
which is used here:
page_align = (pos - io_align + buf_align) & ~PAGE_MASK;
Note (pos - io_align) simply rounds "pos" down to the nearest multiple
of the page size.
We also have:
buf_align = (unsigned long)data & ~PAGE_MASK;
Adding buf_align to that rounded-down "pos" value will stay within
the same page; the result will just be offset by the page offset for
the "data" pointer. The final mask therefore leaves just the value
of "buf_align".
One more simplification. Note that the result of calc_pages_for()
is invariant of which page the offset starts in--the only thing that
matters is the offset within the starting page. We will have
put the proper page offset to use into "page_align", so just use
that in calculating num_pages.
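The argument above can be checked in user space. What follows is a minimal sketch, assuming a 4 KiB page size and a local stand-in for calc_pages_for(); the helper names old_page_align() and new_page_align() are invented for illustration and are not part of the patch.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Userspace stand-in: number of pages spanned by [off, off + len). */
static int calc_pages_for(uint64_t off, uint64_t len)
{
	return (int)(((off + len + PAGE_SIZE - 1) >> PAGE_SHIFT) -
		     (off >> PAGE_SHIFT));
}

/* The original computation described for ceph_sync_write(). */
static uint64_t old_page_align(uint64_t pos, uint64_t data)
{
	uint64_t io_align = pos & ~PAGE_MASK;
	uint64_t buf_align = data & ~PAGE_MASK;

	/* (pos - io_align) is page-aligned, so only buf_align survives. */
	return (pos - io_align + buf_align) & ~PAGE_MASK;
}

/* The simplified form: just the page offset of the data pointer. */
static uint64_t new_page_align(uint64_t data)
{
	return data & ~PAGE_MASK;
}
```

Both forms should agree for any "pos", and calc_pages_for() should return the same count whether it is given the in-page offset alone or that offset plus any whole number of pages.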
This resolves:
http://tracker.ceph.com/issues/4166
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
There's a spot that computes the number of pages to allocate for a
page-aligned length by just shifting it. Use calc_pages_for()
instead, to be consistent with usage everywhere else. The result
is the same.
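As a rough illustration of why the result is the same, the sketch below compares the shift with a local stand-in for calc_pages_for(). It assumes a 4 KiB page and a starting offset of zero; pages_by_shift() is a name made up for this example.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)

/* Userspace stand-in: number of pages spanned by [off, off + len). */
static int calc_pages_for(uint64_t off, uint64_t len)
{
	return (int)(((off + len + PAGE_SIZE - 1) >> PAGE_SHIFT) -
		     (off >> PAGE_SHIFT));
}

/* The open-coded form: only correct when len is page-aligned. */
static int pages_by_shift(uint64_t len)
{
	return (int)(len >> PAGE_SHIFT);
}
```

The two forms diverge as soon as the length is not a multiple of the page size, so the equivalence depends on the alignment precondition stated above.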
The reason for this is to make it clearer in an upcoming patch that
this calculation is duplicated.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Currently, incoming mds messages never use page data, which means
there is no need to set the page_alignment field in the message.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
The only user of the ceph messenger that doesn't define an alloc_msg
method is the mds client. Define one, such that it works just like
it did before, and simplify ceph_con_in_msg_alloc() by assuming the
alloc_msg method is always present.
This and the next patch resolve:
http://tracker.ceph.com/issues/4322
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
The purpose of ceph_calc_object_layout() is to fill in the pool
number and seed for a ceph_pg structure provided, based on a given
osd map and target object id.
Currently that function takes a file layout parameter, but the only
thing used out of that is its pool number.
Change the function so it takes a pool number rather than the full
file layout structure. Only update the ceph_pg if the pool is found
in the osd map. Get rid of a few useless lines of code from the
function while there.
Since the function now very clearly just fills in the ceph_pg
structure it's provided, rename it ceph_calc_ceph_pg().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The pagelist_count field is never actually used, so get rid of it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Make __ceph_do_pending_vmtruncate() acquire the i_mutex if the caller
does not hold it, so ceph_aio_read() can call it safely.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
ceph_aio_write() has an optimization that marks CEPH_CAP_FILE_WR
cap dirty before data is copied to page cache and inode size is
updated. The optimization avoids slow cap revocation caused by
balance_dirty_pages(), but introduces an inode size update race. If
ceph_check_caps() flushes the dirty cap before the inode size is
updated, MDS can miss the new inode size. So just remove the
optimization.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Commit 22cddde104 breaks the atomicity of write operations; it also
introduces a deadlock between write and truncate.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Conflicts:
fs/ceph/addr.c
commit c6ffe10015 moved the flag that tracks if the dcache contents
for a directory are complete to dentry. The problem is there are
lots of places that use ceph_dir_{set,clear,test}_complete() while
holding i_ceph_lock, but ceph_dir_{set,clear,test}_complete() may
sleep because they call dput().
This patch basically reverts that commit. ceph_d_prune() is
called with both the dentry to prune and the parent dentry
locked, so it's safe to access the parent dentry's d_inode and
clear the I_COMPLETE flag.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
The MDS ignores a cap update message if the migrate_seq does not match,
so when receiving a cap import message with a higher migrate_seq, set
mds_want according to the cap import message.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Use distinct fields for tracking the number of pages in a message's
page array and in a message's page list. Currently only one or the
other is used at a time, but that will be changing soon.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This reverts commit 3a366e614d.
Wanlong Gao reports that it causes a kernel panic on his machine several
minutes after boot. Reverting it removes the panic.
Jens says:
"It's not quite clear why that is yet, so I think we should just revert
the commit for 3.9 final (which I'm assuming is pretty close).
The wifi is crap at the LSF hotel, so sending this email instead of
queueing up a revert and pull request."
Reported-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Requested-by: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Documentation/filesystems/proc.txt says about coredump_filter bitmask,
Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only
effected by bit 5-6.
However current code can go into the subsequent flag checks of bit 0-4
for vma(VM_HUGETLB). So this patch inserts 'return' and makes it work
as written in the document.
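The intended behavior can be modeled in a few lines. This is only a sketch: the bit positions follow the coredump_filter description in Documentation/filesystems/proc.txt, while struct vma_model and should_dump() are hypothetical simplifications and not the kernel's actual code.

```c
#include <stdbool.h>

/* coredump_filter bits as documented in proc.txt. */
enum {
	FILTER_ANON_PRIVATE    = 1u << 0,
	FILTER_ANON_SHARED     = 1u << 1,
	FILTER_MAPPED_PRIVATE  = 1u << 2,
	FILTER_MAPPED_SHARED   = 1u << 3,
	FILTER_ELF_HEADERS     = 1u << 4,
	FILTER_HUGETLB_PRIVATE = 1u << 5,
	FILTER_HUGETLB_SHARED  = 1u << 6,
};

/* Hypothetical, heavily simplified view of a mapping. */
struct vma_model {
	bool hugetlb;
	bool shared;
	bool anonymous;
};

static bool should_dump(const struct vma_model *vma, unsigned int filter)
{
	if (vma->hugetlb) {
		/* Decide on bits 5-6 only and return: never consult bits 0-4. */
		if (vma->shared)
			return (filter & FILTER_HUGETLB_SHARED) != 0;
		return (filter & FILTER_HUGETLB_PRIVATE) != 0;
	}
	if (vma->anonymous)
		return (filter & (vma->shared ? FILTER_ANON_SHARED
					      : FILTER_ANON_PRIVATE)) != 0;
	return (filter & (vma->shared ? FILTER_MAPPED_SHARED
				      : FILTER_MAPPED_PRIVATE)) != 0;
}
```

With the early return in place, a private hugetlb mapping is not dumped just because bit 0 (anonymous private) happens to be set.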
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [3.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we fail to include any data on hugepages into coredump,
because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was
recently introduced by commit 314e51b985 ("mm: kill vma flag
VM_RESERVED and mm->reserved_vm counter").
This looks to me like a serious regression, so let's fix it.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [3.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull {timer,irq,core} fixes from Thomas Gleixner:
- timer: bug fix for a cpu hotplug race.
- irq: single bugfix for a wrong return value, which prevents the
calling function from invoking the software fallback.
- core: bugfix which plugs two race conditions which can cause hotplug
per cpu threads to end up on the wrong cpu.
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimer: Don't reinitialize a cpu_base lock on CPU_UP
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: gic: fix irq_trigger return
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kthread: Prevent unpark race which puts threads on the wrong cpu
Pull one more btrfs fix from Chris Mason:
"This has a recent fix from Josef for our tree log replay code. It
fixes problems where the inode counter for the number of bytes in the
file wasn't getting updated properly during fsync replay.
The commit did get rebased this morning, but it was only to clean up
the subject line. The code hasn't changed."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: make sure nbytes are right after log replay
Revert commit 62a3ddef61 ("vfs: fix spinning prevention in prune_icache_sb").
This commit doesn't look right: since we are looking at the tail of the
list (sb->s_inode_lru.prev) if we want to skip an inode, we should put
it back at the head of the list instead of the tail, otherwise we will
keep spinning on it.
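The spinning can be reproduced with a toy list. The sketch below is a userspace model, not the VFS code: struct node, the "busy" flag, and prune() are all invented for illustration, and "busy" simply stands in for whatever makes an inode unsuitable for reclaim.

```c
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *prev, *next;
	bool busy;	/* stands in for "this inode must be skipped" */
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_del_node(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Insert right after head, i.e. at the front of the list. */
static void list_add_head(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Insert right before head, i.e. at the tail of the list. */
static void list_add_tail_node(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Scan from the tail for at most nr_to_scan steps. Busy nodes are put
 * back at the head or at the tail; other nodes are removed ("reclaimed").
 * Returns the number of nodes reclaimed.
 */
static int prune(struct node *head, int nr_to_scan, bool requeue_at_head)
{
	int reclaimed = 0;

	while (nr_to_scan-- > 0 && head->prev != head) {
		struct node *n = head->prev;

		list_del_node(n);
		if (n->busy) {
			if (requeue_at_head)
				list_add_head(n, head);
			else
				list_add_tail_node(n, head);
			continue;
		}
		reclaimed++;
	}
	return reclaimed;
}
```

With one busy node at the tail, requeueing at the tail burns the whole scan budget on that node, while requeueing at the head lets the scan move on to the entries behind it.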
Discovered when investigating why prune_icache_sb came top in perf
reports of a swapping load.
Signed-off-by: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v3.2+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While trying to track down a tree log replay bug I noticed that fsck was always
complaining about nbytes not being right for our fsynced file. That is because
the new fsync stuff doesn't wait for ordered extents to complete, so the inode's
nbytes is not necessarily updated properly when we log it. So to fix this we
need to set nbytes to whatever it is on the inode that is on disk, so when we
replay the extents we can just add the bytes that are being added as we replay
the extent. This makes it work for the case that we have the wrong nbytes or
the case that we logged everything and nbytes is actually correct. With this
I'm no longer getting nbytes errors out of btrfsck.
Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Pull CIFS fix from Steve French:
"Fixes a regression in cifs in which a password which begins with a
comma is parsed incorrectly as a blank password"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Allow passwords which begin with a delimitor
The smpboot threads rely on the park/unpark mechanism which binds per
cpu threads on a particular core. Though the functionality is racy:
CPU0                    CPU1                    CPU2
unpark(T)                                       wake_up_process(T)
  clear(SHOULD_PARK)    T runs
                        leave parkme() due to !SHOULD_PARK
  bind_to(CPU2)         BUG_ON(wrong CPU)
We cannot let the tasks move themselves to the target CPU as one of
those tasks is actually the migration thread itself, which requires
that it starts running on the target cpu right away.
The solution to this problem is to prevent wakeups in park mode which
are not from unpark(). That way we can guarantee that the association
of the task to the target cpu is working correctly.
Add a new task state (TASK_PARKED) which prevents other wakeups and
use this state explicitly for the unpark wakeup.
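The effect of a dedicated state can be shown with a small userspace model. This is a sketch under stated assumptions: the state values are illustrative rather than the kernel's, struct task_model and unpark_wakeup() are invented names, and the wakeup rule is reduced to "wake only if the task's state is in the caller's mask".

```c
#include <stdbool.h>

/* Illustrative state bits; the numeric values are not the kernel's. */
#define TASK_RUNNING		0x0u
#define TASK_INTERRUPTIBLE	0x1u
#define TASK_UNINTERRUPTIBLE	0x2u
#define TASK_PARKED		0x4u
#define TASK_NORMAL		(TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)

struct task_model {
	unsigned int state;
};

/* Wake the task only if its current state is in the caller's mask. */
static bool try_to_wake_up(struct task_model *t, unsigned int mask)
{
	if (!(t->state & mask))
		return false;
	t->state = TASK_RUNNING;
	return true;
}

/* An ordinary wakeup: its mask does not include TASK_PARKED. */
static bool wake_up_process(struct task_model *t)
{
	return try_to_wake_up(t, TASK_NORMAL);
}

/* The unpark path: the only wakeup that targets TASK_PARKED. */
static bool unpark_wakeup(struct task_model *t)
{
	return try_to_wake_up(t, TASK_PARKED);
}
```

In this model a stray wake_up_process() leaves a parked task untouched, so the task can only start running after the unpark side has finished binding it to the target cpu.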
Peter noticed: Also, since the task state is visible to userspace and
all the parked tasks are still in the PID space, it's a good hint in ps
and friends that these tasks aren't really there for the moment.
The migration thread has another related issue.
CPU0                    CPU1
Bring up CPU2
create_thread(T)
park(T)
 wait_for_completion()
                        parkme()
                        complete()
sched_set_stop_task()
                        schedule(TASK_PARKED)
The sched_set_stop_task() call is issued while the task is on the
runqueue of CPU1 and that confuses the hell out of the stop_task class
on that cpu. So we need the same synchronization before
sched_set_stop_task().
Reported-by: Dave Jones <davej@redhat.com>
Reported-and-tested-by: Dave Hansen <dave@sr71.net>
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: dhillf@gmail.com
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fixes a regression in cifs_parse_mount_options where a password
which begins with a delimiter is parsed incorrectly as being a blank
password.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Merge tag 'nfs-for-3.9-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull another nfs fixlet from Trond Myklebust:
"I suddenly noticed that a one-line issue that I _thought_ I had fixed
with the nfs41_walk_client_list patch was apparently still there in
the pull request I sent earlier today. I'm very sorry for not
catching that in time.
- Fix a brain fart in nfs41_walk_client_list"
* tag 'nfs-for-3.9-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Doh! Typo in the fix to nfs41_walk_client_list
Make sure that we set the status to 0 on success. Missed in testing
because it never appears when doing multiple mounts to _different_
servers.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: <stable@vger.kernel.org> # 3.7.x: 7b1f1fd: NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list
- Stable fix for memory corruption issues in nfs4[01]_walk_client_list
- Stable fix for an Oopsable bug in rpc_clone_client
- Another state manager deadlock in the NFSv4 open code
- Memory leaks in nfs4_discover_server_trunking and rpc_new_client
Merge tag 'nfs-for-3.9-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- fix for memory corruption issues in nfs4[01]_walk_client_list (stable)
- fix for an Oopsable bug in rpc_clone_client (stable)
- another state manager deadlock in the NFSv4 open code
- memory leaks in nfs4_discover_server_trunking and rpc_new_client
* tag 'nfs-for-3.9-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Fix another potential state manager deadlock
SUNRPC: Fix a potential memory leak in rpc_new_client
NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list
NFSv4: Fix a memory leak in nfs4_discover_server_trunking
SUNRPC: Remove extra xprt_put()
Pull vfs fixes from Al Viro:
"A nasty bug in fs/namespace.c caught by Andrey + a couple of less
serious unpleasantness - ecryptfs misc device playing hopeless games
with try_module_get() and palinfo procfs support being... not quite
correctly done, to be polite."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
mnt: release locks on error path in do_loopback
palinfo fixes
procfs: add proc_remove_subtree()
ecryptfs: close rmmod race
do_loopback calls lock_mount(path) and forgets to call unlock_mount
if clone_mnt or copy_mnt fails.
[ 77.661566] ================================================
[ 77.662939] [ BUG: lock held when returning to user space! ]
[ 77.664104] 3.9.0-rc5+ #17 Not tainted
[ 77.664982] ------------------------------------------------
[ 77.666488] mount/514 is leaving the kernel with locks still held!
[ 77.668027] 2 locks held by mount/514:
[ 77.668817] #0: (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [<ffffffff811cca22>] lock_mount+0x32/0xe0
[ 77.671755] #1: (&namespace_sem){+++++.}, at: [<ffffffff811cca3a>] lock_mount+0x4a/0xe0
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
proc_remove_subtree() is just what it sounds like; do that only to procfs subtrees you've
created - doing that to something shared with another driver is
not only antisocial, but might cause interesting races with
proc_create() and its ilk.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Don't hold the NFSv4 sequence id while we check for open permission.
The call to ACCESS may block due to reboot recovery.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
It is unsafe to use list_for_each_entry_safe() here, because
when we drop the nn->nfs_client_lock, we pin the _current_ list
entry and ensure that it stays in the list, but we don't do the
same for the _next_ list entry. Use of list_for_each_entry() is
therefore the correct thing to do.
Also fix the refcounting in nfs41_walk_client_list().
Finally, ensure that the nfs_client has finished being initialised
and, in the case of NFSv4.1, that the session is set up.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
Cc: stable@vger.kernel.org [>= 3.7]
When we assign a new rpc_client to clp->cl_rpcclient, we need to destroy
the old one.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org [>=3.7]
Pull GFS2 fixes from Steven Whitehouse:
"There are two patches which fix up a couple of minor issues in the DLM
interface code, a missing error path in gfs2_rs_alloc(), one patch
which fixes a problem during "withdraw" and a fix for discards/FITRIM
when using 4k sector sized devices."
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
GFS2: Issue discards in 512b sectors
GFS2: Fix unlock of fcntl locks during withdrawn state
GFS2: return error if malloc failed in gfs2_rs_alloc()
GFS2: use memchr_inv
GFS2: use kmalloc for lvb bitmap
This patch changes GFS2's discard issuing code so that it calls
function sb_issue_discard rather than blkdev_issue_discard. The
code was calling blkdev_issue_discard and specifying the correct
sector offset and sector size, but blkdev_issue_discard expects
these values to be in terms of 512 byte sectors, even if the native
sector size for the device is different. Calling sb_issue_discard
with the BLOCK size instead ensures the correct block-to-512b-sector
translation. I verified that "minlen" is specified in blocks, so
comparing it to a number of blocks is correct.
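The block-to-sector translation is a simple shift. The sketch below assumes a block size given as a power-of-two bit count (12 for 4 KiB blocks); struct discard_range and blocks_to_sectors() are names made up for this example and are not kernel interfaces.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9	/* discard ranges are expressed in 512-byte units */

struct discard_range {
	uint64_t sector;	/* first 512-byte sector */
	uint64_t nr_sects;	/* length in 512-byte sectors */
};

/* Convert a range of filesystem blocks to 512-byte sectors. */
static struct discard_range blocks_to_sectors(uint64_t block,
					      uint64_t nr_blocks,
					      unsigned int blocksize_bits)
{
	struct discard_range r = {
		.sector   = block << (blocksize_bits - SECTOR_SHIFT),
		.nr_sects = nr_blocks << (blocksize_bits - SECTOR_SHIFT),
	};

	return r;
}
```

For 4 KiB blocks the shift is 3, so block 10 starts at sector 80 and 4 blocks cover 32 sectors. Passing block numbers straight through as sector numbers would discard the wrong, and much smaller, region.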
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Merge tag 'upstream-3.9-rc6' of git://git.infradead.org/linux-ubifs
Pull UBIFS fix from Artem Bityutskiy:
"Make the space fixup feature work in the case when the file-system is
first mounted R/O and then remounted R/W."
* tag 'upstream-3.9-rc6' of git://git.infradead.org/linux-ubifs:
UBIFS: make space fixup work in the remount case
When withdraw occurs, we need to continue to allow unlocks of fcntl
locks to occur, however these will only be local, since the node has
withdrawn from the cluster. This prevents triggering a VFS level
bug trap due to locks remaining when a file is closed.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The error code in gfs2_rs_alloc() is set to ENOMEM on allocation failure
but is never used; instead, gfs2_rs_alloc() always returns 0.
Fix it to return 'error'.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Use memchr_inv to verify that the specified memory range is cleared.
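For readers unfamiliar with the helper, here is a userspace sketch of the semantics relied on: find the first byte that differs from a given value. memchr_inv_sketch() and range_is_clear() are stand-ins written for this example, not the kernel implementation.

```c
#include <stddef.h>

/*
 * Return a pointer to the first byte in [start, start + bytes) that
 * differs from c, or NULL if every byte equals c.
 */
static const void *memchr_inv_sketch(const void *start, int c, size_t bytes)
{
	const unsigned char *p = start;
	size_t i;

	for (i = 0; i < bytes; i++)
		if (p[i] != (unsigned char)c)
			return p + i;
	return NULL;
}

/* A range is "cleared" when no byte differs from zero. */
static int range_is_clear(const void *buf, size_t len)
{
	return memchr_inv_sketch(buf, 0, len) == NULL;
}
```

This replaces an open-coded loop over the buffer with a single call whose NULL result means "all bytes match".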
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: cluster-devel@redhat.com
Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: David Teigland <teigland@redhat.com>
The temp lvb bitmap was on the stack, which could
be an alignment problem for __set_bit_le. Use
kmalloc for it instead.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Unfortunately, we introduced some big-endian bugs during the last
merge window. Fortunately, Cai and Christian noticed before 3.9
shipped."
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix big-endian bugs which could cause fs corruptions
Pull reiserfs fix from Jan Kara:
"A fix for reiserfs xattr bug exposed by changes to lookup_one_len()"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
reiserfs: Fix warning and inode leak when deleting inode with xattrs
When an extent was zeroed out, we forgot to convert from cpu to le16.
It could make us hit a BUG_ON when we try to write dirty pages out. So
fix it.
[ Also fix a bug found by Dmitry Monakhov where we were missing
le32_to_cpu() calls in the new indirect punch hole code.
There are a number of other big endian warnings found by static code
analyzers, but we'll wait for the next merge window to fix them all
up. These fixes are designed to be Obviously Correct by code
inspection, and easy to demonstrate that it won't make any
difference (and hence, won't introduce any bugs) on little endian
architectures such as x86. --tytso ]
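To see why a missing conversion is invisible on x86 but harmful on big-endian machines, consider this userspace sketch. The function names carry a _sketch suffix because they are portable stand-ins written for this example, not the kernel's cpu_to_le16() and le16_to_cpu().

```c
#include <stdint.h>
#include <string.h>

/* Produce the little-endian on-disk form of v on any host. */
static uint16_t cpu_to_le16_sketch(uint16_t v)
{
	unsigned char b[2] = {
		(unsigned char)(v & 0xff),	/* low byte first */
		(unsigned char)(v >> 8),
	};
	uint16_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Read a little-endian on-disk value back into host byte order. */
static uint16_t le16_to_cpu_sketch(uint16_t le)
{
	unsigned char b[2];

	memcpy(b, &le, sizeof(b));
	return (uint16_t)(b[0] | (b[1] << 8));
}
```

On a little-endian host both functions return their argument unchanged, which is why skipping them goes unnoticed there; on a big-endian host the bytes are swapped, so a raw store writes the wrong on-disk value.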
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: CAI Qian <caiqian@redhat.com>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
The struct block_device lifecycle is defined by its inode (see fs/block_dev.c):
the block_device is allocated the first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device with "losetup /dev/loopXX afile"
we want that block_device to stay alive until we destroy the loop device
with "losetup -d".
But because we do not hold the /dev/loopXX inode, its counter goes to 0, and
the inode/bdev can be destroyed at any moment. Usually this happens under memory
pressure or when the user drops the inode cache (as in the test below). When we later
want to use the bdev in loop_clr_fd(), we get a use-after-free error with the following
stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
bd_set_size+0x10/0xa0
loop_clr_fd+0x1f8/0x420 [loop]
lo_ioctl+0x200/0x7e0 [loop]
lo_compat_ioctl+0x47/0xe0 [loop]
compat_blkdev_ioctl+0x341/0x1290
do_filp_open+0x42/0xa0
compat_sys_ioctl+0xc1/0xf20
do_sys_open+0x16e/0x1d0
sysenter_dispatch+0x7/0x1a
To prevent use-after-free we need to grab the device in loop_set_fd()
and put it later in loop_clr_fd().
The issue is reproducible on current Linus head and v3.3. Here is the test:
dd if=/dev/zero of=loop.file bs=1M count=1
while [ true ]; do
losetup /dev/loop0 loop.file
echo 2 > /proc/sys/vm/drop_caches
losetup -d /dev/loop0
done
[ Doing bdgrab/bdput in loop_set_fd/loop_clr_fd is safe, because every
time we call loop_set_fd() we check that loop_device->lo_state is
Lo_unbound and set it to Lo_bound. If somebody tries to set_fd again
it will get EBUSY. And if we try to loop_clr_fd() on unbound loop
device we'll get ENXIO.
loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
loop_device->lo_ctl_mutex. ]
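The grab/put pairing can be pictured with a toy reference count. Everything in this sketch is hypothetical: struct bdev_model and the *_model functions only mimic the idea that a cache drop may free an object nobody holds, and they are not the loop driver's code.

```c
#include <stdbool.h>

/* Hypothetical object kept alive by a reference count. */
struct bdev_model {
	int refcount;
	bool freed;
};

/* Cache shrinking may free only objects that nobody holds. */
static void drop_caches_model(struct bdev_model *b)
{
	if (b->refcount == 0)
		b->freed = true;
}

/* Binding the device takes a reference (the bdgrab step). */
static void loop_set_fd_model(struct bdev_model *b)
{
	b->refcount++;
}

/*
 * Unbinding uses the object and then drops the reference (the bdput
 * step). Returns false if the object had already been freed, which
 * corresponds to the use-after-free described above.
 */
static bool loop_clr_fd_model(struct bdev_model *b)
{
	bool valid = !b->freed;

	if (b->refcount > 0)
		b->refcount--;
	return valid;
}
```

With the reference taken at bind time, a cache drop between set and clear leaves the object alive; without it, the same sequence hands loop_clr_fd a freed object.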
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull btrfs fixes from Chris Mason:
"We've had a busy two weeks of bug fixing. The biggest patches in here
are some long standing early-enospc problems (Josef) and a very old
race where compression and mmap combine forces to lose writes (me).
I'm fairly sure the mmap bug goes all the way back to the introduction
of the compression code, which is proof that fsx doesn't trigger every
possible mmap corner after all.
I'm sure you'll notice one of these is from this morning, it's a small
and isolated use-after-free fix in our scrub error reporting. I
double checked it here."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: don't drop path when printing out tree errors in scrub
Btrfs: fix wrong return value of btrfs_lookup_csum()
Btrfs: fix wrong reservation of csums
Btrfs: fix double free in the btrfs_qgroup_account_ref()
Btrfs: limit the global reserve to 512mb
Btrfs: hold the ordered operations mutex when waiting on ordered extents
Btrfs: fix space accounting for unlink and rename
Btrfs: fix space leak when we fail to reserve metadata space
Btrfs: fix EIO from btrfs send in is_extent_unchanged for punched holes
Btrfs: fix race between mmap writes and compression
Btrfs: fix memory leak in btrfs_create_tree()
Btrfs: fix locking on ROOT_REPLACE operations in tree mod log
Btrfs: fix missing qgroup reservation before fallocating
Btrfs: handle a bogus chunk tree nicely
Btrfs: update to use fs_state bit
After commit 21d8a15a (lookup_one_len: don't accept . and ..) reiserfs
started failing to delete xattrs from inodes. This was due to a buggy
test for '.' and '..' in fill_with_dentries() which resulted in passing
'.' and '..' entries to lookup_one_len() in some cases. That returned
an error and so we failed to iterate over all xattrs of an inode.
Fix the test in fill_with_dentries() along the lines of the one in
lookup_one_len().
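A length-based test of this kind might look like the sketch below. It assumes that entry names arrive with an explicit length and are not necessarily NUL-terminated; is_dot_or_dotdot() is a name chosen for this example and is not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * True only for the names "." and "..". The decision uses namelen
 * alone, so no byte at or beyond name[namelen] is ever read.
 */
static bool is_dot_or_dotdot(const char *name, size_t namelen)
{
	if (namelen == 0 || name[0] != '.')
		return false;
	return namelen == 1 || (namelen == 2 && name[1] == '.');
}
```

A test that instead looks for a terminating NUL after the dots can misclassify entries whose buffer holds other bytes past the stated length.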
Reported-by: Pawel Zawora <pzawora@gmail.com>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>