Commit Graph

31007 Commits

Author SHA1 Message Date
Alex Elder 3a42b6c43e ceph: simplify ceph_sync_write() page_align calculation
(This is being reposted.  The first one had a problem because it
erroneously added a similar change elsewhere; that change has been
dropped.)

The next patch in this series points out that the calculation for
the number of pages in an osd request is getting done twice.  It
is not obvious, but the result of both calculations is identical.
This patch simplifies one of them--as a separate step--to make
it clear that the transformation in the next patch is valid.

In ceph_sync_write() there is some magic that computes page_align
for an osd request.  But a little analysis shows it can be
simplified.

First, we have:
 	io_align = pos & ~PAGE_MASK;
which is used here:
	page_align = (pos - io_align + buf_align) & ~PAGE_MASK;

Note (pos - io_align) simply rounds "pos" down to the nearest multiple
of the page size.

We also have:
 	buf_align = (unsigned long)data & ~PAGE_MASK;

Adding buf_align to that rounded-down "pos" value will stay within
the same page; the result will just be offset by the page offset for
the "data" pointer.  The final mask therefore leaves just the value
of "buf_align".

One more simplification.  Note that the result of calc_pages_for()
is invariant of which page the offset starts in--the only thing that
matters is the offset within the starting page.  We will have
put the proper page offset to use into "page_align", so just use
that in calculating num_pages.

This resolves:
    http://tracker.ceph.com/issues/4166

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:22 -07:00
Alex Elder cf7b7e1492 ceph: use calc_pages_for() in start_read()
There's a spot that computes the number of pages to allocate for a
page-aligned length by just shifting it.  Use calc_pages_for()
instead, to be consistent with usage everywhere else.  The result
is the same.

The reason for this is to make it clearer in an upcoming patch that
this calculation is duplicated.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:21 -07:00
Alex Elder 54ae0756e3 libceph: no need for alignment for mds message
Currently, incoming mds messages never use page data, which means
there is no need to set the page_alignment field in the message.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-01 21:16:20 -07:00
Alex Elder 53ded495c6 libceph: define mds_alloc_msg() method
The only user of the ceph messenger that doesn't define an alloc_msg
method is the mds client.  Define one, such that it works just like
it did before, and simplify ceph_con_in_msg_alloc() by assuming the
alloc_msg method is always present.

This and the next patch resolve:
    http://tracker.ceph.com/issues/4322

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-01 21:16:19 -07:00
Alex Elder 41766f87f5 libceph: rename ceph_calc_object_layout()
The purpose of ceph_calc_object_layout() is to fill in the pool
number and seed for a ceph_pg structure provided, based on a given
osd map and target object id.

Currently that function takes a file layout parameter, but the only
thing used out of that is its pool number.

Change the function so it takes a pool number rather than the full
file layout structure.  Only update the ceph_pg if the pool is found
in the osd map.  Get rid of few useless lines of code from the
function while there.

Since the function now very clearly just fills in the ceph_pg
structure it's provided, rename it ceph_calc_ceph_pg().

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:17 -07:00
Alex Elder ec02a2f2ff libceph: kill ceph_msg->pagelist_count
The pagelist_count field is never actually used, so get rid of it.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:16 -07:00
Yan, Zheng 3f99969f42 ceph: acquire i_mutex in __ceph_do_pending_vmtruncate
make __ceph_do_pending_vmtruncate() acquire the i_mutex if the caller
does not hold the i_mutex, so ceph_aio_read() can call safely.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-01 21:16:11 -07:00
Yan, Zheng 6070e0c1e2 ceph: don't early drop Fw cap
ceph_aio_write() has an optimization that marks CEPH_CAP_FILE_WR
cap dirty before data is copied to page cache and inode size is
updated. The optimization avoids slow cap revocation caused by
balance_dirty_pages(), but introduces inode size update race. If
ceph_check_caps() flushes the dirty cap before the inode size is
updated, MDS can miss the new inode size. So just remove the
optimization.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-01 21:16:10 -07:00
Sage Weil 7971bd92ba ceph: revert commit 22cddde104
commit 22cddde104 breaks the atomicity of write operation, it also
introduces a deadlock between write and truncate.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>

Conflicts:
	fs/ceph/addr.c
2013-05-01 21:15:58 -07:00
Yan, Zheng a8673d61ad ceph: use I_COMPLETE inode flag instead of D_COMPLETE flag
commit c6ffe10015 moved the flag that tracks if the dcache contents
for a directory are complete to dentry. The problem is there are
lots of places that use ceph_dir_{set,clear,test}_complete() while
holding i_ceph_lock. but ceph_dir_{set,clear,test}_complete() may
sleep because they call dput().

This patch basically reverts that commit. For ceph_d_prune(), it's
called with both the dentry to prune and the parent dentry are
locked. So it's safe to access the parent dentry's d_inode and
clear I_COMPLETE flag.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-05-01 21:14:33 -07:00
Yan, Zheng 964266cce9 ceph: set mds_want according to cap import message
MDS ignores cap update message if migrate_seq mismatch, so when
receiving a cap import message with higher migrate_seq, set mds_want
according to the cap import message.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-01 21:14:32 -07:00
Yan, Zheng d40ee0dcc1 ceph: queue cap release when trimming cap
So the client will later send cap release message to MDS

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-01 21:14:31 -07:00
Yan, Zheng 8a03449700 ceph: fix LSSNAP regression
commit 6e8575faa8 makes parse_reply_info_extra() return -EIO for LSSNAP

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-01 21:14:30 -07:00
Alex Elder d4b515fa10 libceph: distinguish page array and pagelist count
Use distinct fields for tracking the number of pages in a message's
page array and in a message's page list.  Currently only one or the
other is used at a time, but that will be changing soon.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:14:28 -07:00
Zhao Hongjiang 91d80a84bb aio: fix possible invalid memory access when DEBUG is enabled
dprintk() shouldn't access @ring after it's unmapped.

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-26 07:56:18 -07:00
Linus Torvalds 0a82a8d132 Revert "block: add missing block_bio_complete() tracepoint"
This reverts commit 3a366e614d.

Wanlong Gao reports that it causes a kernel panic on his machine several
minutes after boot. Reverting it removes the panic.

Jens says:
 "It's not quite clear why that is yet, so I think we should just revert
  the commit for 3.9 final (which I'm assuming is pretty close).

  The wifi is crap at the LSF hotel, so sending this email instead of
  queueing up a revert and pull request."

Reported-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Requested-by: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-18 09:00:26 -07:00
Vyacheslav Dubeyko 12f267a20a hfsplus: fix potential overflow in hfsplus_file_truncate()
Change a u32 to loff_t hfsplus_file_truncate().

Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-17 16:10:45 -07:00
Naoya Horiguchi 23d9e48213 fs/binfmt_elf.c: fix hugetlb memory check in vma_dump_size()
Documentation/filesystems/proc.txt says about coredump_filter bitmask,

  Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only
  effected by bit 5-6.

However current code can go into the subsequent flag checks of bit 0-4
for vma(VM_HUGETLB). So this patch inserts 'return' and makes it work
as written in the document.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>	[3.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-17 16:10:44 -07:00
Naoya Horiguchi a2fce91430 hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)
Currently we fail to include any data on hugepages into coredump,
because VM_DONTDUMP is set on hugetlbfs's vma.  This behavior was
recently introduced by commit 314e51b985 ("mm: kill vma flag
VM_RESERVED and mm->reserved_vm counter").

This looks to me a serious regression, so let's fix it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>	[3.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-17 16:10:44 -07:00
Linus Torvalds bb33db7a07 Merge branches 'timers-urgent-for-linus', 'irq-urgent-for-linus' and 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull {timer,irq,core} fixes from Thomas Gleixner:

 - timer: bug fix for a cpu hotplug race.

 - irq: single bugfix for a wrong return value, which prevents the
   calling function to invoke the software fallback.

 - core: bugfix which plugs two race confitions which can cause hotplug
   per cpu threads to end up on the wrong cpu.

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Don't reinitialize a cpu_base lock on CPU_UP

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: gic: fix irq_trigger return

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kthread: Prevent unpark race which puts threads on the wrong cpu
2013-04-15 07:03:01 -07:00
Linus Torvalds 3792a64fde Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull one more btrfs fix from Chris Mason:
 "This has a recent fix from Josef for our tree log replay code.  It
  fixes problems where the inode counter for the number of bytes in the
  file wasn't getting updated properly during fsync replay.

  The commit did get rebased this morning, but it was only to clean up
  the subject line.  The code hasn't changed."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: make sure nbytes are right after log replay
2013-04-14 10:52:54 -07:00
Suleiman Souhlal 5b55d70833 vfs: Revert spurious fix to spinning prevention in prune_icache_sb
Revert commit 62a3ddef61 ("vfs: fix spinning prevention in prune_icache_sb").

This commit doesn't look right: since we are looking at the tail of the
list (sb->s_inode_lru.prev) if we want to skip an inode, we should put
it back at the head of the list instead of the tail, otherwise we will
keep spinning on it.

Discovered when investigating why prune_icache_sb came top in perf
reports of a swapping load.

Signed-off-by: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v3.2+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-13 16:13:55 -07:00
Josef Bacik 4bc4bee459 Btrfs: make sure nbytes are right after log replay
While trying to track down a tree log replay bug I noticed that fsck was always
complaining about nbytes not being right for our fsynced file.  That is because
the new fsync stuff doesn't wait for ordered extents to complete, so the inodes
nbytes are not necessarily updated properly when we log it.  So to fix this we
need to set nbytes to whatever it is on the inode that is on disk, so when we
replay the extents we can just add the bytes that are being added as we replay
the extent.  This makes it work for the case that we have the wrong nbytes or
the case that we logged everything and nbytes is actually correct.  With this
I'm no longer getting nbytes errors out of btrfsck.

Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-04-13 07:35:06 -04:00
Linus Torvalds 0b1fd266bf Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fix from Steve French:
 "Fixes a regression in cifs in which a password which begins with a
  comma is parsed incorrectly as a blank password"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Allow passwords which begin with a delimitor
2013-04-12 15:18:20 -07:00
Thomas Gleixner f2530dc71c kthread: Prevent unpark race which puts threads on the wrong cpu
The smpboot threads rely on the park/unpark mechanism which binds per
cpu threads on a particular core. Though the functionality is racy:

CPU0	       	 	CPU1  	     	    CPU2
unpark(T)				    wake_up_process(T)
  clear(SHOULD_PARK)	T runs
			leave parkme() due to !SHOULD_PARK  
  bind_to(CPU2)		BUG_ON(wrong CPU)						    

We cannot let the tasks move themself to the target CPU as one of
those tasks is actually the migration thread itself, which requires
that it starts running on the target cpu right away.

The solution to this problem is to prevent wakeups in park mode which
are not from unpark(). That way we can guarantee that the association
of the task to the target cpu is working correctly.

Add a new task state (TASK_PARKED) which prevents other wakeups and
use this state explicitly for the unpark wakeup.

Peter noticed: Also, since the task state is visible to userspace and
all the parked tasks are still in the PID space, its a good hint in ps
and friends that these tasks aren't really there for the moment.

The migration thread has another related issue.

CPU0	      	     	 CPU1
Bring up CPU2
create_thread(T)
park(T)
 wait_for_completion()
			 parkme()
			 complete()
sched_set_stop_task()
			 schedule(TASK_PARKED)

The sched_set_stop_task() call is issued while the task is on the
runqueue of CPU1 and that confuses the hell out of the stop_task class
on that cpu. So we need the same synchronizaion before
sched_set_stop_task().

Reported-by: Dave Jones <davej@redhat.com>
Reported-and-tested-by: Dave Hansen <dave@sr71.net>
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Acked-by: Peter Ziljstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: dhillf@gmail.com
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-04-12 14:18:43 +02:00
Sachin Prabhu c369c9a4a7 cifs: Allow passwords which begin with a delimitor
Fixes a regression in cifs_parse_mount_options where a password
which begins with a delimitor is parsed incorrectly as being a blank
password.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-04-10 15:54:14 -05:00
Linus Torvalds 51de017007 NFS client bugfixes for Linux 3.9
- Fix a brain fart in nfs41_walk_client_list
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRZZq5AAoJEGcL54qWCgDypU8P/0daWpe+a8TNpXDA0KdYZKYN
 KNXvZkNNk/TtSiQo5gPzRnD4CgZIZ4n+EX9U94gmdNr/UQz7xiL+bHZY4zFtQ574
 i+QMiLbf687anY7vLBL1eKOhKHeBMoIrk2G3iineEUhfzF97cqtgqIou1pSS/BCa
 2kk/w/LRWPOaMpr802y2p9R/mejRtDbTIwaPURTKA3Pw+odwiVib3FXMIoXDI5Iq
 QzH2fl+Q0me/Z2c5Y+KRs5X3gY1MWdhpZUbEpKy3iLAxlgl3gfp7Mxpb61dw5gBz
 Jl2F1lDOzYmU1Uqe88G7w38RnBD0Q7RWtlQzZFMeIQsk1TqPsx9ymFRxaZu1Q6HZ
 +hdpfVsFDhGNTvLZF4YSP4c7AS9s1yEj8erT8Ro90Ar/PuZi15N6HpDzHHAiIQWK
 HsqSLQBrW24cFk2Ybed7YVcFdNxHdR3DDYVVstodnhIw9VwDSvQfPBlhlPqF+Q/9
 onnAMsc6SqHnLhFV7yCF6tB0Of4ZPO0oIeW8C0Hrxo+sPly03BvasAvaSWr3uheh
 wqEtawNm9QQVMdWSA1hA0LV6P887yTRXruT83uC14doPlz5g0hxlvAZQfDC3Ld3J
 ae4HARv3LLFj7Dk9/9yyM6FELyTIe8YvqvH8u9QenPQEmW0VlaPVp73vPEhL5yPA
 TxWSJtquxq5ajpH5lBeI
 =G1ZG
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.9-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull another nfs fixlet from Trond Myklebust:
 "I suddenly noticed that a one-line issue that I _thought_ I had fixed
  with the nfs41_walk_client_list patch was apparently still there in
  the pull request I sent earlier today.  I'm very sorry for not
  catching that in time.

   - Fix a brain fart in nfs41_walk_client_list"

* tag 'nfs-for-3.9-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Doh! Typo in the fix to nfs41_walk_client_list
2013-04-10 10:26:49 -07:00
Trond Myklebust eb04e0ac19 NFSv4: Doh! Typo in the fix to nfs41_walk_client_list
Make sure that we set the status to 0 on success. Missed in testing
because it never appears when doing multiple mounts to _different_
servers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: <stable@vger.kernel.org> # 3.7.x: 7b1f1fd: NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list
2013-04-10 12:57:29 -04:00
Linus Torvalds f94eeb423b NFS client bugfixes for Linux 3.9
- Stable fix for memory corruption issues in nfs4[01]_walk_client_list
 - Stable fix for an Oopsable bug in rpc_clone_client
 - Another state manager deadlock in the NFSv4 open code
 - Memory leaks in nfs4_discover_server_trunking and rpc_new_client
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRZYu9AAoJEGcL54qWCgDySfwP/R2IdO2nfRzmDCPtvD6pPg8T
 l8Gf97Z/8A3g6WwfvmKNt48D1fKnhAcOaKTZQIZuZePAjI/Yy74DFMof6paiDmsO
 8hMcZgvunZotPwmBmhIwmLOxDYgbpdizDBlITsimnUQLrv78bMw2F/cNCcThYgTI
 Q4sNpZsl4kk1nmOYK/tGBCCkq6mIQhc95QeQPgnl2B/NozpZiIqgzrpWpSWMofn2
 cuSLiuEdmpCdJbgQaPEjSWf+doo/nBn720+Xj2RjmLhTTnWUtAsouElAdMs96Jjz
 cEhSll3nLIygr1xdFF7CD8qFjpbtg/YNhKw3HBCFAgHjrAjr+a3N+eHQOz9QQ6W4
 5OL3Mj0VEkvMrK1Sy76smynQJMJhrsn852Zo2wK2mCp+mHNZlBlML529Y4PJy2Ba
 Up4MteIaOTpKGSnBdzWmqPqro9glqlhrUk/o3XipCzIziWC8yDYjl2J9Ez8B7Ren
 uzvBeevYRX9AmQlmZUAPvx8+xVqA6cr0X2q8/6PqPnrNXP6Ff8+rm6gvH4VozyzJ
 qd/r7Bf1ozFXxoKQOztSiGjI5YiBp4DRXycR5td6eF3nZJipmbxY+WKllhaAakn6
 UY2NsGX2zfxkJMltqd2/xRmHtN+Eif1Uoo35pvzNxzBtPsRxBMIiPhGLglQu98Yj
 2NuwfT4//UNfS6JlBe6E
 =kBf2
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.9-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - fix for memory corruption issues in nfs4[01]_walk_client_list (stable)
 - fix for an Oopsable bug in rpc_clone_client (stable)
 - another state manager deadlock in the NFSv4 open code
 - memory leaks in nfs4_discover_server_trunking and rpc_new_client

* tag 'nfs-for-3.9-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix another potential state manager deadlock
  SUNRPC: Fix a potential memory leak in rpc_new_client
  NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list
  NFSv4: Fix a memory leak in nfs4_discover_server_trunking
  SUNRPC: Remove extra xprt_put()
2013-04-10 09:00:51 -07:00
Linus Torvalds e8f2b548de Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "A nasty bug in fs/namespace.c caught by Andrey + a couple of less
  serious unpleasantness - ecryptfs misc device playing hopeless games
  with try_module_get() and palinfo procfs support being...  not quite
  correctly done, to be polite."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  mnt: release locks on error path in do_loopback
  palinfo fixes
  procfs: add proc_remove_subtree()
  ecryptfs: close rmmod race
2013-04-09 12:22:49 -07:00
Andrey Vagin e9c5d8a562 mnt: release locks on error path in do_loopback
do_loopback calls lock_mount(path) and forget to unlock_mount
if clone_mnt or copy_mnt fails.

[   77.661566] ================================================
[   77.662939] [ BUG: lock held when returning to user space! ]
[   77.664104] 3.9.0-rc5+ #17 Not tainted
[   77.664982] ------------------------------------------------
[   77.666488] mount/514 is leaving the kernel with locks still held!
[   77.668027] 2 locks held by mount/514:
[   77.668817]  #0:  (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [<ffffffff811cca22>] lock_mount+0x32/0xe0
[   77.671755]  #1:  (&namespace_sem){+++++.}, at: [<ffffffff811cca3a>] lock_mount+0x4a/0xe0

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:09:50 -04:00
Al Viro 8ce584c741 procfs: add proc_remove_subtree()
just what it sounds like; do that only to procfs subtrees you've
created - doing that to something shared with another driver is
not only antisocial, but might cause interesting races with
proc_create() and its ilk.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:09:17 -04:00
Al Viro 52f21999c7 ecryptfs: close rmmod race
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:08:16 -04:00
Trond Myklebust fa332941c0 NFSv4: Fix another potential state manager deadlock
Don't hold the NFSv4 sequence id while we check for open permission.
The call to ACCESS may block due to reboot recovery.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-09 13:19:35 -04:00
Trond Myklebust 7b1f1fd184 NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list
It is unsafe to use list_for_each_entry_safe() here, because
when we drop the nn->nfs_client_lock, we pin the _current_ list
entry and ensure that it stays in the list, but we don't do the
same for the _next_ list entry. Use of list_for_each_entry() is
therefore the correct thing to do.

Also fix the refcounting in nfs41_walk_client_list().

Finally, ensure that the nfs_client has finished being initialised
and, in the case of NFSv4.1, that the session is set up.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
Cc: stable@vger.kernel.org [>= 3.7]
2013-04-05 16:59:19 -04:00
Trond Myklebust b193d59a48 NFSv4: Fix a memory leak in nfs4_discover_server_trunking
When we assign a new rpc_client to clp->cl_rpcclient, we need to destroy
the old one.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org [>=3.7]
2013-04-05 16:59:15 -04:00
Linus Torvalds 00fa6fe963 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes
Pull GFS2 fixes from Steven Whitehouse:
 "There are two patches which fix up a couple of minor issues in the DLM
  interface code, a missing error path in gfs2_rs_alloc(), one patch
  which fixes a problem during "withdraw" and a fix for discards/FITRIM
  when using 4k sector sized devices."

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
  GFS2: Issue discards in 512b sectors
  GFS2: Fix unlock of fcntl locks during withdrawn state
  GFS2: return error if malloc failed in gfs2_rs_alloc()
  GFS2: use memchr_inv
  GFS2: use kmalloc for lvb bitmap
2013-04-05 12:22:02 -07:00
Bob Peterson b2c87cae0e GFS2: Issue discards in 512b sectors
This patch changes GFS2's discard issuing code so that it calls
function sb_issue_discard rather than blkdev_issue_discard. The
code was calling blkdev_issue_discard and specifying the correct
sector offset and sector size, but blkdev_issue_discard expects
these values to be in terms of 512 byte sectors, even if the native
sector size for the device is different. Calling sb_issue_discard
with the BLOCK size instead ensures the correct block-to-512b-sector
translation. I verified that "minlen" is specified in blocks, so
comparing it to a number of blocks is correct.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-05 17:55:13 +01:00
Linus Torvalds 22d1e6f4c5 Make the space fixup feature work in the case when the file-system is first
mounted R/O and then remounted R/W.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRXZxUAAoJECmIfjd9wqK0jTUP/2QvNLRxMwKp984D0M6euIPg
 fMGsz73wwB+d0P1AlOIP2y5DG787SGDmmpP9SPFWiI9QH+vuJUlp01Di2MxJFGiL
 yh9iuhJ0MHGQFIKbSuolGiooJIABnQi9629L29Li8wbGrbwWK7WI+bQfb7EaTLSN
 1c4PX+42fAi6UP84IXtkFHl3hbGSuZ9+dSPJ0U6VAuLl0zQRv6PxIxwR+Fqi1Wqq
 VJXrU6bkUbbTFndm7UfkQGQ+Z4DQ5gnXnSdUHkd6dsPoLqNyIor7AjW5/IKvTPkN
 5OBpLv7Eo4WBiozlJdu2I26HBgyyQKIgL9HA2CYSoFzopl8Pa+lhoNPOseA6axMq
 abXK2nRGAxmMGkGdUGOlugNylVDpsJJ1cX8mjwX0G3L4aZZBLGflflYo+X8pm1c4
 TV+MlloSv4SwKrgpgfiJS7q0kzOMEZNIyoIIPYeMf7VcLsbbDCv2bOTvR3LxL9Bt
 TlVESqSlcImsgTG0fMK/YFefpEAkLVJPTw3T25yJ/vtoZsbw4HVa30/A5mleDEUk
 b4r43KWW9Nodz81klQUj9WF5aK/7yl2oyNzyIg8CdCY7b2sDyf6ixrkS51mYY3Jm
 1PagVOcJZ4CBBrerP13+dc5/9m+rsHkRw9aVvvw2U5cqqVdJnd8EdvHNRCETgTZ6
 REd95pyaBsjqBUwkHUVc
 =79wn
 -----END PGP SIGNATURE-----

Merge tag 'upstream-3.9-rc6' of git://git.infradead.org/linux-ubifs

Pull UBIFS fix from Artem Bityutskiy:
 "Make the space fixup feature work in the case when the file-system is
  first mounted R/O and then remounted R/W."

* tag 'upstream-3.9-rc6' of git://git.infradead.org/linux-ubifs:
  UBIFS: make space fixup work in the remount case
2013-04-04 08:41:43 -07:00
Steven Whitehouse c2952d202f GFS2: Fix unlock of fcntl locks during withdrawn state
When withdraw occurs, we need to continue to allow unlocks of fcntl
locks to occur, however these will only be local, since the node has
withdrawn from the cluster. This prevents triggering a VFS level
bug trap due to locks remaining when a file is closed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-04 09:53:46 +01:00
Wei Yongjun 441362d06b GFS2: return error if malloc failed in gfs2_rs_alloc()
The error code in gfs2_rs_alloc() is set to ENOMEM when error
but never be used, instead, gfs2_rs_alloc() always return 0.
Fix to return 'error'.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-04 09:53:10 +01:00
Akinobu Mita 4146c3d469 GFS2: use memchr_inv
Use memchr_inv to verify that the specified memory range is cleared.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: cluster-devel@redhat.com
Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: David Teigland <teigland@redhat.com>
2013-04-04 09:52:50 +01:00
David Teigland 57c7310b8e GFS2: use kmalloc for lvb bitmap
The temp lvb bitmap was on the stack, which could
be an alignment problem for __set_bit_le.  Use
kmalloc for it instead.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-04 09:52:14 +01:00
Linus Torvalds cbfa0e7204 Unfortunately, we introduced some big-endian bugs during the last
merge window.  Fortunately, Cai and Christian noticed before 3.9
 shipped.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJRXG7FAAoJENNvdpvBGATw7GAQALbl5LxmVmGR6JRQzNoINP+H
 v29ulo1Kly4j2vt+3b0rXKv9axWl0C/dItFlC/9WqmwuB/0BptBKIVnkKH+6zu2v
 F+cO41gfpJo3ozcgsCrjvWfdkTWbjbPTQ4XiQDFILkwiB4R9KdpynKcVcjDY+gQE
 umwJpXwDDd+fdr4FNQiFFPqd8rCC8fEeClWTtOFx7UidKl8v18iZ0/OPiAr+jBOY
 rlcaZ9F8nmOJTwgriGbod4X827xEDj7Jwe7/C6oy/lKLOTLhaahgHPDW/l0O4KZA
 4eJLj/5nxmYling4Y+rQvglVhNJ4LNv+IAXu5IpqRxosPYFnxQq+JYn8D5BlXifd
 0/hG+BwTkhm4RLJ8uQvUxxglZNQEWeSuIma4dnZX3Xf9AzsvNW9x3Iilj3F7dhUS
 6h9aeoYKv9y7GY9Out1P/UZYVi4HmB3jHiOcdTNCK4plQ3Sn2NYMw6RK1z4cXvE+
 Pokc0a9KNyusNSI83tDtjRjan9NzsRbTggoGVf19RVoIVqIjkyXzUGasO/y+mKhp
 LENAjkABdbLB1Re8B/99KwgIloUTvxGcojLKzkEbgcobruvEwKvxIrTi+fgNOiu6
 GqJOh8TwZtx3SGJujsyOSBBrdPfjPHReBWrX0VRHl/Wsd4RWCaDT8H1EdNONQ+to
 lQ+JvTZgFwQB2GABjNB6
 =n1ir
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Unfortunately, we introduced some big-endian bugs during the last
  merge window.  Fortunately, Cai and Christian noticed before 3.9
  shipped."

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix big-endian bugs which could cause fs corruptions
2013-04-03 11:21:13 -07:00
Linus Torvalds cd0e4a9dd4 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull reiserfs fix from Jan Kara:
 "A fix for reiserfs xattr bug exposed by changes to lookup_one_len()"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  reiserfs: Fix warning and inode leak when deleting inode with xattrs
2013-04-03 10:49:27 -07:00
Zheng Liu 8cde7ad17e ext4: fix big-endian bugs which could cause fs corruptions
When an extent was zeroed out, we forgot to do convert from cpu to le16.
It could make us hit a BUG_ON when we try to write dirty pages out.  So
fix it.

[ Also fix a bug found by Dmitry Monakhov where we were missing
  le32_to_cpu() calls in the new indirect punch hole code.

  There are a number of other big endian warnings found by static code
  analyzers, but we'll wait for the next merge window to fix them all
  up.  These fixes are designed to be Obviously Correct by code
  inspection, and easy to demonstrate that it won't make any
  difference (and hence, won't introduce any bugs) on little endian
  architectures such as x86.  --tytso ]

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: CAI Qian <caiqian@redhat.com>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
2013-04-03 12:37:17 -04:00
Linus Torvalds f8e9248dbb Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfix from J Bruce Fields:
 "An xdr decoding error--thanks, Toralf Förster, and Trinity!"

* 'for-3.9' of git://linux-nfs.org/~bfields/linux:
  nfsd4: reject "negative" acl lengths
2013-04-02 07:56:20 -07:00
Anatol Pomozov c1681bf8a7 loop: prevent bdev freeing while device in use
struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
block_device allocated first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
we want that block_device stay alive until we destroy the loop device
with "losetup -d".

But because we do not hold /dev/loopXX inode its counter goes 0, and
inode/bdev can be destroyed at any moment. Usually it happens at memory
pressure or when user drops inode cache (like in the test below). When later in
loop_clr_fd() we want to use bdev we have use-after-free error with following
stack:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
  bd_set_size+0x10/0xa0
  loop_clr_fd+0x1f8/0x420 [loop]
  lo_ioctl+0x200/0x7e0 [loop]
  lo_compat_ioctl+0x47/0xe0 [loop]
  compat_blkdev_ioctl+0x341/0x1290
  do_filp_open+0x42/0xa0
  compat_sys_ioctl+0xc1/0xf20
  do_sys_open+0x16e/0x1d0
  sysenter_dispatch+0x7/0x1a

To prevent use-after-free we need to grab the device in loop_set_fd()
and put it later in loop_clr_fd().

The issue is reprodusible on current Linus head and v3.3. Here is the test:

  dd if=/dev/zero of=loop.file bs=1M count=1
  while [ true ]; do
    losetup /dev/loop0 loop.file
    echo 2 > /proc/sys/vm/drop_caches
    losetup -d /dev/loop0
  done

[ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every
  time we call loop_set_fd() we check that loop_device->lo_state is
  Lo_unbound and set it to Lo_bound If somebody will try to set_fd again
  it will get EBUSY.  And if we try to loop_clr_fd() on unbound loop
  device we'll get ENXIO.

  loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
  loop_device->lo_ctl_mutex. ]

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-01 15:48:47 -07:00
Linus Torvalds 3615db41c4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "We've had a busy two weeks of bug fixing.  The biggest patches in here
  are some long standing early-enospc problems (Josef) and a very old
  race where compression and mmap combine forces to lose writes (me).
  I'm fairly sure the mmap bug goes all the way back to the introduction
  of the compression code, which is proof that fsx doesn't trigger every
  possible mmap corner after all.

  I'm sure you'll notice one of these is from this morning, it's a small
  and isolated use-after-free fix in our scrub error reporting.  I
  double checked it here."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: don't drop path when printing out tree errors in scrub
  Btrfs: fix wrong return value of btrfs_lookup_csum()
  Btrfs: fix wrong reservation of csums
  Btrfs: fix double free in the btrfs_qgroup_account_ref()
  Btrfs: limit the global reserve to 512mb
  Btrfs: hold the ordered operations mutex when waiting on ordered extents
  Btrfs: fix space accounting for unlink and rename
  Btrfs: fix space leak when we fail to reserve metadata space
  Btrfs: fix EIO from btrfs send in is_extent_unchanged for punched holes
  Btrfs: fix race between mmap writes and compression
  Btrfs: fix memory leak in btrfs_create_tree()
  Btrfs: fix locking on ROOT_REPLACE operations in tree mod log
  Btrfs: fix missing qgroup reservation before fallocating
  Btrfs: handle a bogus chunk tree nicely
  Btrfs: update to use fs_state bit
2013-03-29 11:13:25 -07:00
Jan Kara 35e5cbc0af reiserfs: Fix warning and inode leak when deleting inode with xattrs
After commit 21d8a15a (lookup_one_len: don't accept . and ..) reiserfs
started failing to delete xattrs from inode. This was due to a buggy
test for '.' and '..' in fill_with_dentries() which resulted in passing
'.' and '..' entries to lookup_one_len() in some cases. That returned
error and so we failed to iterate over all xattrs of and inode.

Fix the test in fill_with_dentries() along the lines of the one in
lookup_one_len().

Reported-by: Pawel Zawora <pzawora@gmail.com>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-29 17:08:43 +01:00