Commit Graph

3368 Commits

Author SHA1 Message Date
Linus Torvalds 29be6345bb NFS client bugfixes
- Stable fix for a NFSv4.1 delegation and state recovery deadlock
 - Stable fix for a loop on irrecoverable errors when returning delegations
 - Fix a 3-way deadlock between layoutreturn, open, and state recovery
 - Update the MAINTAINERS file with contact information for Trond Myklebust
 - Close needs to handle NFS4ERR_ADMIN_REVOKED
 - Enabling v4.2 should not recompile nfsd and lockd
 - Fix a couple of compile warnings
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJSoLTpAAoJEGcL54qWCgDy2dgQAIKkKAXccg3OG2b1SxJmiaja
 PcrovNmgg3HvYQ7clUMqtrMByiXEpSybl6tAeXYUWE3sS1DISSBVEwO3MoOiASiM
 951Ssx+CoyhsHYo5aH83sUIiWFl/YsRhpKmSr2cdQd13DQTFbPq896k64Inf6L2/
 9fngoqOD7FunQHn8AiVPoDOQzObB0OuKhYCwuwLt47oPiwgmm12JQNCDxU1i4sxb
 lkGUBLkPMs6D5IyI8XHaMyX3+8MvmPiIsjIKaNJRdhkuX/k7ollucTJXyvyEQKK0
 PhBIWyUULmKcAXYwCfHf9UoyGZFvmj47YggyKcBd26OZUEFekcWrULfym46F1xak
 EcO6D4mlTy5i5W0RBqYCj1oGud57rixZBmhLTbeq6sSJaiqBfGEs225Q17H7rsEB
 YIghHiEFNnBmVWELhHxbJHQoY6HOugmZOuc0dxopaikN/7to8gnYoVyTIVlMfe/t
 UNXZoer6GOOohJGtZ7s7v4Al7EzvwnVnBCBklEAKFJ7Ca2LEmq+b58oQW3nJ1mPn
 y4TnihxYXsSEbqy+Lds9rumRhJLG1oVTpwficAm7N3HdK3abzCIPEt6iOHoCmXQz
 J1B4gmwOKsDqVlCSpBsnc3ZiBlSJGOn6MmVQUCNFpzv/DetWn/BxEUPE8cNm8DaI
 WioD0grC0/9bR8oD1m+w
 =UZ51
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Stable fix for a NFSv4.1 delegation and state recovery deadlock
 - Stable fix for a loop on irrecoverable errors when returning
   delegations
 - Fix a 3-way deadlock between layoutreturn, open, and state recovery
 - Update the MAINTAINERS file with contact information for Trond
   Myklebust
 - Close needs to handle NFS4ERR_ADMIN_REVOKED
 - Enabling v4.2 should not recompile nfsd and lockd
 - Fix a couple of compile warnings

* tag 'nfs-for-3.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: fix do_div() warning by instead using sector_div()
  MAINTAINERS: Update contact information for Trond Myklebust
  NFSv4.1: Prevent a 3-way deadlock between layoutreturn, open and state recovery
  SUNRPC: do not fail gss proc NULL calls with EACCES
  NFSv4: close needs to handle NFS4ERR_ADMIN_REVOKED
  NFSv4: Update list of irrecoverable errors on DELEGRETURN
  NFSv4 wait on recovery for async session errors
  NFS: Fix a warning in nfs_setsecurity
  NFS: Enabling v4.2 should not recompile nfsd and lockd
2013-12-05 13:05:48 -08:00
Helge Deller 3873d064b8 nfs: fix do_div() warning by instead using sector_div()
When compiling a 32bit kernel with CONFIG_LBDAF=n the compiler complains like
shown below.  Fix this warning by instead using sector_div() which is provided
by the kernel.h header file.

fs/nfs/blocklayout/extents.c: In function ‘normalize’:
include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default]
fs/nfs/blocklayout/extents.c:47:13: note: in expansion of macro ‘do_div’
nfs/blocklayout/extents.c:47:2: warning: right shift count >= width of type [enabled by default]
fs/nfs/blocklayout/extents.c:47:2: warning: passing argument 1 of ‘__div64_32’ from incompatible pointer type [enabled by default]
include/asm-generic/div64.h:35:17: note: expected ‘uint64_t *’ but argument is of type ‘sector_t *’
 extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-12-04 12:57:37 -05:00
Trond Myklebust f22e5edd22 NFSv4.1: Prevent a 3-way deadlock between layoutreturn, open and state recovery
Andy Adamson reports:

The state manager is recovering expired state and recovery OPENs are being
processed. If kswapd is pruning inodes at the same time, a deadlock can occur
when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the
resultant layoutreturn gets an error that the state mangager is to handle,
causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq.

At the same time an open is waiting for the inode deletion to complete in
__wait_on_freeing_inode.

If the open is either the open called by the state manager, or an open from
the same open owner that is holding the NFSv4 sequence id which causes the
OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue,
then the state is deadlocked with kswapd.

The fix is simply to have layoutreturn ignore all errors except NFS4ERR_DELAY.
We already know that layouts are dropped on all server reboots, and that
it has to be coded to deal with the "forgetful client model" that doesn't
send layoutreturns.

Reported-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1385402270-14284-1-git-send-email-andros@netapp.com
Signed-off-by: Trond Myklebust <Trond.Myklebust@primarydata.com>
2013-12-04 12:32:19 -05:00
Kent Overstreet 4f024f3797 block: Abstract out bvec iterator
Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@inktank.com>
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: Benny Halevy <bhalevy@tonian.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchand@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Peng Tao <tao.peng@emc.com>
Cc: Andy Adamson <andros@netapp.com>
Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Pankaj Kumar <pankaj.km@samsung.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>6
2013-11-23 22:33:47 -08:00
Kent Overstreet 2c30c71bd6 block: Convert various code to bio_for_each_segment()
With immutable biovecs we don't want code accessing bi_io_vec directly -
the uses this patch changes weren't incorrect since they all own the
bio, but it makes the code harder to audit for no good reason - also,
this will help with multipage bvecs later.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-23 22:33:46 -08:00
Trond Myklebust 69794ad70c NFSv4: close needs to handle NFS4ERR_ADMIN_REVOKED
Also ensure that we zero out the stateid mode when exiting

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-20 15:55:39 -05:00
Trond Myklebust c97cf606e4 NFSv4: Update list of irrecoverable errors on DELEGRETURN
If the DELEGRETURN errors out with something like NFS4ERR_BAD_STATEID
then there is no recovery possible. Just quit without returning an error.

Also, note that the client must not assume that the NFSv4 lease has been
renewed when it sees an error on DELEGRETURN.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-11-20 15:54:27 -05:00
Andy Adamson 4a82fd7c4e NFSv4 wait on recovery for async session errors
When the state manager is processing the NFS4CLNT_DELEGRETURN flag, session
draining is off, but DELEGRETURN can still get a session error.
The async handler calls nfs4_schedule_session_recovery returns -EAGAIN, and
the DELEGRETURN done then restarts the RPC task in the prepare state.
With the state manager still processing the NFS4CLNT_DELEGRETURN flag with
session draining off, these DELEGRETURNs will cycle with errors filling up the
session slots.

This prevents OPEN reclaims (from nfs_delegation_claim_opens) required by the
NFS4CLNT_DELEGRETURN state manager processing from completing, hanging the
state manager in the __rpc_wait_for_completion_task in nfs4_run_open_task
as seen in this kernel thread dump:

kernel: 4.12.32.53-ma D 0000000000000000     0  3393      2 0x00000000
kernel: ffff88013995fb60 0000000000000046 ffff880138cc5400 ffff88013a9df140
kernel: ffff8800000265c0 ffffffff8116eef0 ffff88013fc10080 0000000300000001
kernel: ffff88013a4ad058 ffff88013995ffd8 000000000000fbc8 ffff88013a4ad058
kernel: Call Trace:
kernel: [<ffffffff8116eef0>] ? cache_alloc_refill+0x1c0/0x240
kernel: [<ffffffffa0358110>] ? rpc_wait_bit_killable+0x0/0xa0 [sunrpc]
kernel: [<ffffffffa0358152>] rpc_wait_bit_killable+0x42/0xa0 [sunrpc]
kernel: [<ffffffff8152914f>] __wait_on_bit+0x5f/0x90
kernel: [<ffffffffa0358110>] ? rpc_wait_bit_killable+0x0/0xa0 [sunrpc]
kernel: [<ffffffff815291f8>] out_of_line_wait_on_bit+0x78/0x90
kernel: [<ffffffff8109b520>] ? wake_bit_function+0x0/0x50
kernel: [<ffffffffa035810d>] __rpc_wait_for_completion_task+0x2d/0x30 [sunrpc]
kernel: [<ffffffffa040d44c>] nfs4_run_open_task+0x11c/0x160 [nfs]
kernel: [<ffffffffa04114e7>] nfs4_open_recover_helper+0x87/0x120 [nfs]
kernel: [<ffffffffa0411646>] nfs4_open_recover+0xc6/0x150 [nfs]
kernel: [<ffffffffa040cc6f>] ? nfs4_open_recoverdata_alloc+0x2f/0x60 [nfs]
kernel: [<ffffffffa0414e1a>] nfs4_open_delegation_recall+0x6a/0xa0 [nfs]
kernel: [<ffffffffa0424020>] nfs_end_delegation_return+0x120/0x2e0 [nfs]
kernel: [<ffffffff8109580f>] ? queue_work+0x1f/0x30
kernel: [<ffffffffa0424347>] nfs_client_return_marked_delegations+0xd7/0x110 [nfs]
kernel: [<ffffffffa04225d8>] nfs4_run_state_manager+0x548/0x620 [nfs]
kernel: [<ffffffffa0422090>] ? nfs4_run_state_manager+0x0/0x620 [nfs]
kernel: [<ffffffff8109b0f6>] kthread+0x96/0xa0
kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
kernel: [<ffffffff8109b060>] ? kthread+0x0/0xa0
kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20

The state manager can not therefore process the DELEGRETURN session errors.
Change the async handler to wait for recovery on session errors.

Signed-off-by: Andy Adamson <andros@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-20 15:54:08 -05:00
Trond Myklebust 829e57d7c5 NFS: Fix a warning in nfs_setsecurity
Fix the following warning:

linux-nfs/fs/nfs/inode.c:315:1: warning: ‘inline’ is not at
beginning of declaration [-Wold-style-declaration]

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-19 16:20:41 -05:00
Anna Schumaker 694e096fd7 NFS: Enabling v4.2 should not recompile nfsd and lockd
When CONFIG_NFS_V4_2 is toggled nfsd and lockd will be recompiled,
instead of only the nfs client.  This patch moves a small amount of code
into the client directory to avoid unnecessary recompiles.

Signed-off-by: Anna Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-19 16:20:40 -05:00
Linus Torvalds 673fdfe3f0 NFS client bugfixes:
- Stable fix for data corruption when retransmitting O_DIRECT writes
 - Stable fix for a deep recursion/stack overflow bug in rpc_release_client
 - Stable fix for infinite looping when mounting a NFSv4.x volume
 - Fix a typo in the nfs mount option parser
 - Allow pNFS layouts to be compiled into the kernel when NFSv4.1 is
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJSh95hAAoJEGcL54qWCgDy1wgP/1zc4C7sMBQFWpIo676MHT4n
 m5v4bWgYhRBC0dne5GG8dC4+Q2cPkua4H7cWHCJKQmMuDmbzgOB33RVyQdwU/YNp
 ItLIZLz2EySCKo8OOKvbf4l5jDFeoBYEbheB2bmcE42BgixaTbiHKXpgCtoHr5pT
 qOX0JI29QtstAY3heiLW52bA3OqNJGwfE595KKEHXZwcD0n8izjqOU7Vrqj0E8/Q
 S+Xw9a613fo7chzbdcugR+iW6kkr7qtjxXiI5OXvplGyHycbBJRfvAqHkg01Z69k
 At9Y43cTEFiEx/zfKflmiFkn+IF9xFhABYNCKvpTtLFvQkwJDfYHa6h2jrFac/87
 mTRZHIzJ0nghhE1VxOEjA2zvIE3Hd5Xk4By+2BKJaB/Tp0RPbSsHs7t0s8t7RdHi
 ZwP/bNDynZY3S+HlbMor3A3900bUXLQBpCpRt/0+Hvc5bGLRszA5/Jinv+EqwOT9
 LHXTE/CsQGJCOz72SjDZT4Gsa0t11UKdRpznk4XCEvH9tflK78nS32XUktZEC9u/
 bCycLbvX+LrquxjQ9WN2TCmwnwyEiv45tSK2b8gf8JS1zJmePDKdnQ1dpHbiZAIO
 uhEhAqDwAY64+T2+AGncITh8ZfthZhU6wkfGoepqYvC1/5AaeSWrFidDvE1NJUGh
 xjcsGH6Ym8NnnT3rt/qp
 =uOIM
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes:
 - Stable fix for data corruption when retransmitting O_DIRECT writes
 - Stable fix for a deep recursion/stack overflow bug in rpc_release_client
 - Stable fix for infinite looping when mounting a NFSv4.x volume
 - Fix a typo in the nfs mount option parser
 - Allow pNFS layouts to be compiled into the kernel when NFSv4.1 is

* tag 'nfs-for-3.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: fix pnfs Kconfig defaults
  NFS: correctly report misuse of "migration" mount option.
  nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once
  SUNRPC: Avoid deep recursion in rpc_release_client
  SUNRPC: Fix a data corruption issue when retransmitting RPC calls
2013-11-16 13:14:56 -08:00
Christoph Hellwig 8c2fabc654 nfs: fix pnfs Kconfig defaults
Defaulting to m seem to prevent building the pnfs layout modules into the
kernel.  Default to the value of CONFIG_NFS_V4 make sure they are
built in for built-in NFSv4 support and modular for a modular NFSv4.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-15 13:41:43 -05:00
NeilBrown 9e08ef1afb NFS: correctly report misuse of "migration" mount option.
The current test on valid use of the "migration" mount option can never
report an error as it will only do so if
    mnt->version !=4 && mnt->minor_version != 0
(and some other condition), but if that test would succeed, then the previous
test has already gone-to  out_minorversion_mismatch.

So change the && to an || to get correct semantics.

Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-15 13:41:43 -05:00
Wolfram Sang 16735d022f tree-wide: use reinit_completion instead of INIT_COMPLETION
Use this new function to make code more comprehensible, since we are
reinitialzing the completion, not initializing.

[akpm@linux-foundation.org: linux-next resyncs]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org> (personally at LCE13)
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:21 +09:00
Jeff Layton 6d769f1e14 nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once
Currently, when we try to mount and get back NFS4ERR_CLID_IN_USE or
NFS4ERR_WRONGSEC, we create a new rpc_clnt and then try the call again.
There is no guarantee that doing so will work however, so we can end up
retrying the call in an infinite loop.

Worse yet, we create the new client using rpc_clone_client_set_auth,
which creates the new client as a child of the old one. Thus, we can end
up with a *very* long lineage of rpc_clnts. When we go to put all of the
references to them, we can end up with a long call chain that can smash
the stack as each rpc_free_client() call can recurse back into itself.

This patch fixes this by simply ensuring that the SETCLIENTID call will
only be retried in this situation if the last attempt did not use
RPC_AUTH_UNIX.

Note too that with this change, we don't need the (i > 2) check in the
-EACCES case since we now have a more reliable test as to whether we
should reattempt.

Cc: stable@vger.kernel.org # v3.10+
Cc: Chuck Lever <chuck.lever@oracle.com>
Tested-by/Acked-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-13 19:21:05 -05:00
Linus Torvalds 9bc9ccd7db Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "All kinds of stuff this time around; some more notable parts:

   - RCU'd vfsmounts handling
   - new primitives for coredump handling
   - files_lock is gone
   - Bruce's delegations handling series
   - exportfs fixes

  plus misc stuff all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
  ecryptfs: ->f_op is never NULL
  locks: break delegations on any attribute modification
  locks: break delegations on link
  locks: break delegations on rename
  locks: helper functions for delegation breaking
  locks: break delegations on unlink
  namei: minor vfs_unlink cleanup
  locks: implement delegations
  locks: introduce new FL_DELEG lock flag
  vfs: take i_mutex on renamed file
  vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
  vfs: don't use PARENT/CHILD lock classes for non-directories
  vfs: pull ext4's double-i_mutex-locking into common code
  exportfs: fix quadratic behavior in filehandle lookup
  exportfs: better variable name
  exportfs: move most of reconnect_path to helper function
  exportfs: eliminate unused "noprogress" counter
  exportfs: stop retrying once we race with rename/remove
  exportfs: clear DISCONNECTED on all parents sooner
  exportfs: more detailed comment for path_reconnect
  ...
2013-11-13 15:34:18 +09:00
Trond Myklebust fab99ebe39 NFSv4.2: Remove redundant checks in nfs_setsecurity+nfs4_label_init_security
We already check for nfs_server_capable(inode, NFS_CAP_SECURITY_LABEL)
in nfs4_label_alloc()
We check the minor version in _nfs4_server_capabilities before setting
NFS_CAP_SECURITY_LABEL.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-04 16:42:52 -05:00
Trond Myklebust b944dba31d NFSv4: Sanity check the server reply in _nfs4_server_capabilities
We don't want to be setting capabilities and/or requesting attributes
that are not appropriate for the NFSv4 minor version.

- Ensure that we clear the NFS_CAP_SECURITY_LABEL capability when appropriate
- Ensure that we limit the attribute bitmasks to the mounted_on_fileid
  attribute and less for NFSv4.0
- Ensure that we limit the attribute bitmasks to suppattr_exclcreat and
  less for NFSv4.1
- Ensure that we limit it to change_sec_label or less for NFSv4.2

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-04 16:42:52 -05:00
Trond Myklebust d204c5d2b8 NFSv4.2: encode_readdir - only ask for labels when doing readdirplus
Currently, if the server is doing NFSv4.2 and supports labeled NFS, then
our on-the-wire READDIR request ends up asking for the label information,
which is then ignored unless we're doing readdirplus.
This patch ensures that READDIR doesn't ask the server for label information
at all unless the readdir->bitmask contains the FATTR4_WORD2_SECURITY_LABEL
attribute, and the readdir->plus flag is set.

While we're at it, optimise away the 3rd bitmap field if it is zero.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-04 16:42:51 -05:00
Jeff Layton 3da580aab9 nfs: set security label when revalidating inode
Currently, we fetch the security label when revalidating an inode's
attributes, but don't apply it. This is in contrast to the readdir()
codepath where we do apply label changes.

Cc: Dave Quigley <dpquigl@davequigley.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-04 16:42:38 -05:00
Trond Myklebust fcb63a9bd8 NFS: Fix a missing initialisation when reading the SELinux label
Ensure that _nfs4_do_get_security_label() also initialises the
SEQUENCE call correctly, by having it call into nfs4_call_sync().

Reported-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-01 12:42:25 -04:00
Jeff Layton 12207f69b3 nfs: fix oops when trying to set SELinux label
Chao reported the following oops when testing labeled NFS:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffffa0568703>] nfs4_xdr_enc_setattr+0x43/0x110 [nfsv4]
PGD 277bbd067 PUD 2777ea067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache sg coretemp kvm_intel kvm crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul iTCO_wdt glue_helper ablk_helper cryptd iTCO_vendor_support bnx2 pcspkr serio_raw i7core_edac cdc_ether microcode usbnet edac_core mii lpc_ich i2c_i801 mfd_core shpchp ioatdma dca acpi_cpufreq mperf nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ata_generic ttm pata_acpi drm ata_piix libata megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 4 PID: 25657 Comm: chcon Not tainted 3.10.0-33.el7.x86_64 #1
Hardware name: IBM System x3550 M3 -[7944OEJ]-/90Y4784     , BIOS -[D6E150CUS-1.11]- 02/08/2011
task: ffff880178397220 ti: ffff8801595d2000 task.ti: ffff8801595d2000
RIP: 0010:[<ffffffffa0568703>]  [<ffffffffa0568703>] nfs4_xdr_enc_setattr+0x43/0x110 [nfsv4]
RSP: 0018:ffff8801595d3888  EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffff8801595d3b30 RCX: 0000000000000b4c
RDX: ffff8801595d3b30 RSI: ffff8801595d38e0 RDI: ffff880278b6ec00
RBP: ffff8801595d38c8 R08: ffff8801595d3b30 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801595d38e0
R13: ffff880277a4a780 R14: ffffffffa05686c0 R15: ffff8802765f206c
FS:  00007f2c68486800(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000027651a000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 ffff880277865800 ffff880278b6ec00 ffff880277a4a780
 ffff8801595d3948 ffffffffa02ad926 ffff8801595d3b30 ffff8802765f206c
Call Trace:
 [<ffffffffa02ad926>] rpcauth_wrap_req+0x86/0xd0 [sunrpc]
 [<ffffffffa02a1d40>] ? call_connect+0xb0/0xb0 [sunrpc]
 [<ffffffffa02a1d40>] ? call_connect+0xb0/0xb0 [sunrpc]
 [<ffffffffa02a1ecb>] call_transmit+0x18b/0x290 [sunrpc]
 [<ffffffffa02a1d40>] ? call_connect+0xb0/0xb0 [sunrpc]
 [<ffffffffa02aae14>] __rpc_execute+0x84/0x400 [sunrpc]
 [<ffffffffa02ac40e>] rpc_execute+0x5e/0xa0 [sunrpc]
 [<ffffffffa02a2ea0>] rpc_run_task+0x70/0x90 [sunrpc]
 [<ffffffffa02a2f03>] rpc_call_sync+0x43/0xa0 [sunrpc]
 [<ffffffffa055284d>] _nfs4_do_set_security_label+0x11d/0x170 [nfsv4]
 [<ffffffffa0558861>] nfs4_set_security_label.isra.69+0xf1/0x1d0 [nfsv4]
 [<ffffffff815fca8b>] ? avc_alloc_node+0x24/0x125
 [<ffffffff815fcd2f>] ? avc_compute_av+0x1a3/0x1b5
 [<ffffffffa055897b>] nfs4_xattr_set_nfs4_label+0x3b/0x50 [nfsv4]
 [<ffffffff811bc772>] generic_setxattr+0x62/0x80
 [<ffffffff811bcfc3>] __vfs_setxattr_noperm+0x63/0x1b0
 [<ffffffff811bd1c5>] vfs_setxattr+0xb5/0xc0
 [<ffffffff811bd2fe>] setxattr+0x12e/0x1c0
 [<ffffffff811a4d22>] ? final_putname+0x22/0x50
 [<ffffffff811a4f2b>] ? putname+0x2b/0x40
 [<ffffffff811aa1cf>] ? user_path_at_empty+0x5f/0x90
 [<ffffffff8119bc29>] ? __sb_start_write+0x49/0x100
 [<ffffffff811bd66f>] SyS_lsetxattr+0x8f/0xd0
 [<ffffffff8160cf99>] system_call_fastpath+0x16/0x1b
Code: 48 8b 02 48 c7 45 c0 00 00 00 00 48 c7 45 c8 00 00 00 00 48 c7 45 d0 00 00 00 00 48 c7 45 d8 00 00 00 00 48 c7 45 e0 00 00 00 00 <48> 8b 00 48 8b 00 48 85 c0 0f 84 ae 00 00 00 48 8b 80 b8 03 00
RIP  [<ffffffffa0568703>] nfs4_xdr_enc_setattr+0x43/0x110 [nfsv4]
 RSP <ffff8801595d3888>
CR2: 0000000000000000

The problem is that _nfs4_do_set_security_label calls rpc_call_sync()
directly which fails to do any setup of the SEQUENCE call. Have it use
nfs4_call_sync() instead which does the right thing. While we're at it
change the name of "args" to "arg" to better match the pattern in
_nfs4_do_setattr.

Reported-by: Chao Ye <cye@redhat.com>
Cc: David Quigley <dpquigl@davequigley.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-01 12:41:39 -04:00
Jeff Layton 1acd1c301f nfs: fix inverted test for delegation in nfs4_reclaim_open_state
commit 6686390bab (NFS: remove incorrect "Lock reclaim failed!"
warning.) added a test for a delegation before checking to see if any
reclaimed locks failed. The test however is backward and is only doing
that check when a delegation is held instead of when one isn't.

Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Fixes: 6686390bab6a: NFS: remove incorrect "Lock reclaim failed!" warning.
Cc: stable@vger.kernel.org # 3.12
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-31 13:24:07 -04:00
Trond Myklebust c698dbf9fe Merge branch 'fscache' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into linux-next
Pull fs-cache fixes from David Howells:

Can you pull these commits to fix an issue with NFS whereby caching can be
enabled on a file that is open for writing by subsequently opening it for
reading.  This can be made to crash by opening it for writing again if you're
quick enough.

The gist of the patchset is that the cookie should be acquired at inode
creation only and subsequently enabled and disabled as appropriate (which
dispenses with the backing objects when they're not needed).

The extra synchronisation that NFS does can then be dispensed with as it is
thenceforth managed by FS-Cache.

Could you send these on to Linus?

This likely will need fixing also in CIFS and 9P also once the FS-Cache
changes are upstream.  AFS and Ceph are probably safe.

* 'fscache' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open()
  FS-Cache: Provide the ability to enable/disable cookies
  FS-Cache: Add use/unuse/wake cookie wrappers
2013-10-28 19:36:46 -04:00
J. Bruce Fields a3f432bfd0 nfs: use IS_ROOT not DCACHE_DISCONNECTED
This check was added by Al Viro with
d9e80b7de9 "nfs d_revalidate() is too
trigger-happy with d_drop()", with the explanation that we don't want to
remove the root of a disconnected tree, which will still be included on
the s_anon list.

But DCACHE_DISCONNECTED does *not* actually identify dentries that are
disconnected from the dentry tree or hashed on s_anon.  IS_ROOT() is the
way to do that.

Also add a comment from Al's commit to remind us why this check is
there.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 19:19:19 -04:00
Geyslan G. Bem 6706246b22 nfs: Use PTR_ERR_OR_ZERO in 'nfs/nfs4super.c'
Use 'PTR_ERR_OR_ZERO()' rather than 'IS_ERR(...) ? PTR_ERR(...) : 0'.

Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 18:16:56 -04:00
Geyslan G. Bem 54bcfa6682 nfs: Use PTR_ERR_OR_ZERO in 'nfs41_callback_up' function
Use 'PTR_ERR_OR_ZERO()' rather than 'IS_ERR(...) ? PTR_ERR(...) : 0'.

Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 18:16:55 -04:00
Geyslan G. Bem 4f5829d726 nfs: Remove useless 'error' assignment
the 'error' variable was been assigned twice in vain.

Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 18:16:55 -04:00
Weston Andros Adamson 4d4b69dd84 NFS: add support for multiple sec= mount options
This patch adds support for multiple security options which can be
specified using a colon-delimited list of security flavors (the same
syntax as nfsd's exports file).

This is useful, for instance, when NFSv4.x mounts cross SECINFO
boundaries. With this patch a user can use "sec=krb5i,krb5p"
to mount a remote filesystem using krb5i, but can still cross
into krb5p-only exports.

New mounts will try all security options before failing.  NFSv4.x
SECINFO results will be compared against the sec= flavors to
find the first flavor in both lists or if no match is found will
return -EPERM.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:38:02 -04:00
Weston Andros Adamson 5837f6dfcb NFS: stop using NFS_MOUNT_SECFLAVOUR server flag
Since the parsed sec= flavor is now stored in nfs_server->auth_info,
we no longer need an nfs_server flag to determine if a sec= option was
used.

This flag has not been completely removed because it is still needed for
the (old but still supported) non-text parsed mount options ABI
compatability.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:37:56 -04:00
Weston Andros Adamson 0f5f49b8b3 NFS: cache parsed auth_info in nfs_server
Cache the auth_info structure in nfs_server and pass these values to submounts.

This lays the groundwork for supporting multiple sec= options.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:37:43 -04:00
Weston Andros Adamson a3f73c27af NFS: separate passed security flavs from selected
When filling parsed_mount_data, store the parsed sec= mount option in
the new struct nfs_auth_info and the chosen flavor in selected_flavor.

This patch lays the groundwork for supporting multiple sec= options.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:36:58 -04:00
Weston Andros Adamson 47fd88e6b7 NFSv4: make nfs_find_best_sec static
It's not used outside of nfs4namespace.c anymore.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:33:34 -04:00
Chuck Lever 0625c2dd6a NFS: Fix possible endless state recovery wait
In nfs4_wait_clnt_recover(), hold a reference to the clp being
waited on.  The state manager can reduce clp->cl_count to 1, in
which case the nfs_put_client() in nfs4_run_state_manager() can
free *clp before wait_on_bit() returns and allows
nfs4_wait_clnt_recover() to run again.

The behavior at that point is non-deterministic.  If the waited-on
bit still happens to be zero, wait_on_bit() will wake the waiter as
expected.  If the bit is set again (say, if the memory was poisoned
when freed) wait_on_bit() can leave the waiter asleep.

This is a narrow fix which ensures the safety of accessing *clp in
nfs4_wait_clnt_recover(), but does not address the continued use
of a possibly freed *clp after nfs4_wait_clnt_recover() returns
(see nfs_end_delegation_return(), for example).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:31:55 -04:00
Chuck Lever cd3fadece2 NFS: Set EXCHGID4_FLAG_SUPP_MOVED_MIGR
Broadly speaking, v4.1 migration is untested.  There are no servers
in the wild that support NFSv4.1 migration.  However, as server
implementations become available, we do want to enable testing by
developers, while leaving it disabled for environments for which
broken migration support would be an unpleasant surprise.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:31:25 -04:00
Chuck Lever d1c2331e75 NFS: Handle SEQ4_STATUS_LEASE_MOVED
With the advent of NFSv4 sessions in NFSv4.1 and following, a "lease
moved" condition is reported differently than it is in NFSv4.0.

NFSv4 minor version 0 servers return an error status code,
NFS4ERR_LEASE_MOVED, to signal that a lease has moved.  This error
causes the whole compound operation to fail.  Normal compounds
against this server continue to fail until the client performs
migration recovery on the migrated share.

Minor version 1 and later servers assert a bit flag in the reply to
a compound's SEQUENCE operation to signal LEASE_MOVED.  This is not
a fatal condition: operations against this server continue normally.
The server asserts this flag until the client performs migration
recovery on the migrated share.

Note that servers MUST NOT return NFS4ERR_LEASE_MOVED to NFSv4
clients not using NFSv4.0.

After the server asserts any of the sr_status_flags in the SEQUENCE
operation in a typical compound, our client initiates standard lease
recovery.  For NFSv4.1+, a stand-alone SEQUENCE operation is
performed to discover what recovery is needed.

If SEQ4_STATUS_LEASE_MOVED is asserted in this stand-alone SEQUENCE
operation, our client attempts to discover which FSIDs have been
migrated, and then performs migration recovery on each.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:31:07 -04:00
Chuck Lever f8aba1e8d5 NFS: Handle NFS4ERR_LEASE_MOVED during async RENEW
With NFSv4 minor version 0, the asynchronous lease RENEW
heartbeat can return NFS4ERR_LEASE_MOVED.  Error recovery logic for
async RENEW is a separate code path from the generic NFS proc paths,
so it must be updated to handle NFS4ERR_LEASE_MOVED as well.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:30:52 -04:00
Chuck Lever 60ea681299 NFS: Migration support for RELEASE_LOCKOWNER
Currently the Linux NFS client ignores the operation status code for
the RELEASE_LOCKOWNER operation.  Like NFSv3's UMNT operation,
RELEASE_LOCKOWNER is a courtesy to help servers manage their
resources, and the outcome is not consequential for the client.

During a migration, a server may report NFS4ERR_LEASE_MOVED, in
which case the client really should retry, since typically
LEASE_MOVED has nothing to do with the current operation, but does
prevent it from going forward.

Also, it's important for a client to respond as soon as possible to
a moved lease condition, since the client's lease could expire on
the destination without further action by the client.

NFS4ERR_DELAY is not included in the list of valid status codes for
RELEASE_LOCKOWNER in RFC 3530bis.  However, rfc3530-migration-update
does permit migration-capable servers to return DELAY to clients,
but only in the context of an ongoing migration.  In this case the
server has frozen lock state in preparation for migration, and a
client retry would help the destination server purge unneeded state
once migration recovery is complete.

Interestly, NFS4ERR_MOVED is not valid for RELEASE_LOCKOWNER, even
though lock owners can be migrated with Transparent State Migration.

Note that RFC 3530bis section 9.5 includes RELEASE_LOCKOWNER in the
list of operations that renew a client's lease on the server if they
succeed.  Now that our client pays attention to the operation's
status code, we can note that renewal appropriately.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:30:46 -04:00
Chuck Lever 8ef2f8d46a NFS: Implement support for NFS4ERR_LEASE_MOVED
Trigger lease-moved recovery when a request returns
NFS4ERR_LEASE_MOVED.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:30:27 -04:00
Chuck Lever b7f7a66e42 NFS: Support NFS4ERR_LEASE_MOVED recovery in state manager
A migration on the FSID in play for the current NFS operation
is reported via the error status code NFS4ERR_MOVED.

"Lease moved" means that a migration has occurred on some other
FSID than the one for the current operation.  It's a signal that
the client should take action immediately to handle a migration
that it may not have noticed otherwise.  This is so that the
client's lease does not expire unnoticed on the destination server.

In NFSv4.0, a moved lease is reported with the NFS4ERR_LEASE_MOVED
error status code.

To recover from NFS4ERR_LEASE_MOVED, check each FSID for that server
to see if it is still present.  Invoke nfs4_try_migration() if the
FSID is no longer present on the server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:30:21 -04:00
Chuck Lever 44c9993384 NFS: Add method to detect whether an FSID is still on the server
Introduce a mechanism for probing a server to determine if an FSID
is present or absent.

The on-the-wire compound is different between minor version 0 and 1.
Minor version 0 appends a RENEW operation to identify which client
ID is probing.  Minor version 1 has a SEQUENCE operation in the
compound which effectively carries the same information.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:30:03 -04:00
Chuck Lever 352297b917 NFS: Handle NFS4ERR_MOVED during delegation recall
When a server returns NFS4ERR_MOVED during a delegation recall,
trigger the new migration recovery logic in the state manager.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:25:30 -04:00
Chuck Lever 519ae255d4 NFS: Add migration recovery callouts in nfs4proc.c
When a server returns NFS4ERR_MOVED, trigger the new migration
recovery logic in the state manager.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:25:23 -04:00
Chuck Lever 9f51a78e3a NFS: Rename "stateid_invalid" label
I'm going to use this exit label also for migration recovery
failures.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:25:10 -04:00
Chuck Lever f1478c13c0 NFS: Re-use exit code in nfs4_async_handle_error()
Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:24:55 -04:00
Chuck Lever c9fdeb280b NFS: Add basic migration support to state manager thread
Migration recovery and state recovery must be serialized, so handle
both in the state manager thread.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:24:40 -04:00
Chuck Lever ce6cda1845 NFS: Add a super_block backpointer to the nfs_server struct
NFS_SB() returns the pointer to an nfs_server struct, given a
pointer to a super_block.  But we have no way to go back the other
way.

Add a super_block backpointer field so that, given an nfs_server
struct, it is easy to get to the filesystem's root dentry.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:24:26 -04:00
Chuck Lever b03d735b4c NFS: Add method to retrieve fs_locations during migration recovery
The nfs4_proc_fs_locations() function is invoked during referral
processing to perform a GETATTR(fs_locations) on an object's parent
directory in order to discover the target of the referral.  It
performs a LOOKUP in the compound, so the client needs to know the
parent's file handle a priori.

Unfortunately this function is not adequate for handling migration
recovery.  We need to probe fs_locations information on an FSID, but
there's no parent directory available for many operations that
can return NFS4ERR_MOVED.

Another subtlety: recovering from NFS4ERR_LEASE_MOVED is a process
of walking over a list of known FSIDs that reside on the server, and
probing whether they have migrated.  Once the server has detected
that the client has probed all migrated file systems, it stops
returning NFS4ERR_LEASE_MOVED.

A minor version zero server needs to know what client ID is
requesting fs_locations information so it can clear the flag that
forces it to continue returning NFS4ERR_LEASE_MOVED.  This flag is
set per client ID and per FSID.  However, the client ID is not an
argument of either the PUTFH or GETATTR operations.  Later minor
versions have client ID information embedded in the compound's
SEQUENCE operation.

Therefore, by convention, minor version zero clients send a RENEW
operation in the same compound as the GETATTR(fs_locations), since
RENEW's one argument is a clientid4.  This allows a minor version
zero server to identify correctly the client that is probing for a
migration.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:24:00 -04:00
Chuck Lever 9e6ee76dfb NFS: Export _nfs_display_fhandle()
Allow code in nfsv4.ko to use _nfs_display_fhandle().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:23:35 -04:00
Chuck Lever ec011fe847 NFS: Introduce a vector of migration recovery ops
The differences between minor version 0 and minor version 1
migration will be abstracted by the addition of a set of migration
recovery ops.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:23:17 -04:00
Chuck Lever 800c06a5bf NFS: Add functions to swap transports during migration recovery
Introduce functions that can walk through an array of returned
fs_locations information and connect a transport to one of the
destination servers listed therein.

Note that NFS minor version 1 introduces "fs_locations_info" which
extends the locations array sorting criteria available to clients.
This is not supported yet.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:23:07 -04:00
Chuck Lever 32e62b7c3e NFS: Add nfs4_update_server
New function nfs4_update_server() moves an nfs_server to a different
nfs_client.  This is done as part of migration recovery.

Though it may be appealing to think of them as the same thing,
migration recovery is not the same as following a referral.

For a referral, the client has not descended into the file system
yet: it has no nfs_server, no super block, no inodes or open state.
It is enough to simply instantiate the nfs_server and super block,
and perform a referral mount.

For a migration, however, we have all of those things already, and
they have to be moved to a different nfs_client.  No local namespace
changes are needed here.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:22:29 -04:00
Weston Andros Adamson d2bfda2e7a NFSv4: don't reprocess cached open CLAIM_PREVIOUS
Cached opens have already been handled by _nfs4_opendata_reclaim_to_nfs4_state
and can safely skip being reprocessed, but must still call update_open_stateid
to make sure that all active fmodes are recovered.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Cc: stable@vger.kernel.org # 3.7.x: f494a6071d3: NFSv4: fix NULL dereference
Cc: stable@vger.kernel.org # 3.7.x: a43ec98b72a: NFSv4: don't fail on missin
Cc: stable@vger.kernel.org # 3.7.x
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:10:56 -04:00
Trond Myklebust d49f042aee NFSv4: Fix state reference counting in _nfs4_opendata_reclaim_to_nfs4_state
Currently, if the call to nfs_refresh_inode fails, then we end up leaking
a reference count, due to the call to nfs4_get_open_state.
While we're at it, replace nfs4_get_open_state with a simple call to
atomic_inc(); there is no need to do a full lookup of the struct nfs_state
since it is passed as an argument in the struct nfs4_opendata, and
is already assigned to the variable 'state'.

Cc: stable@vger.kernel.org # 3.7.x: a43ec98b72a: NFSv4: don't fail on missing
Cc: stable@vger.kernel.org # 3.7.x
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 14:57:12 -04:00
Weston Andros Adamson a43ec98b72 NFSv4: don't fail on missing fattr in open recover
This is an unneeded check that could cause the client to fail to recover
opens.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 14:54:03 -04:00
Weston Andros Adamson f494a6071d NFSv4: fix NULL dereference in open recover
_nfs4_opendata_reclaim_to_nfs4_state doesn't expect to see a cached
open CLAIM_PREVIOUS, but this can happen. An example is when there are
RDWR openers and RDONLY openers on a delegation stateid. The recovery
path will first try an open CLAIM_PREVIOUS for the RDWR openers, this
marks the delegation as not needing RECLAIM anymore, so the open
CLAIM_PREVIOUS for the RDONLY openers will not actually send an rpc.

The NULL dereference is due to _nfs4_opendata_reclaim_to_nfs4_state
returning PTR_ERR(rpc_status) when !rpc_done. When the open is
cached, rpc_done == 0 and rpc_status == 0, thus
_nfs4_opendata_reclaim_to_nfs4_state returns NULL - this is unexpected
by callers of nfs4_opendata_to_nfs4_state().

This can be reproduced easily by opening the same file two times on an
NFSv4.0 mount with delegations enabled, once as RDWR and once as RDONLY then
sleeping for a long time.  While the files are held open, kick off state
recovery and this NULL dereference will be hit every time.

An example OOPS:

[   65.003602] BUG: unable to handle kernel NULL pointer dereference at 00000000
00000030
[   65.005312] IP: [<ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4]
[   65.006820] PGD 7b0ea067 PUD 791ff067 PMD 0
[   65.008075] Oops: 0000 [#1] SMP
[   65.008802] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache
snd_ens1371 gameport nfsd snd_rawmidi snd_ac97_codec ac97_bus btusb snd_seq snd
_seq_device snd_pcm ppdev bluetooth auth_rpcgss coretemp snd_page_alloc crc32_pc
lmul crc32c_intel ghash_clmulni_intel microcode rfkill nfs_acl vmw_balloon serio
_raw snd_timer lockd parport_pc e1000 snd soundcore parport i2c_piix4 shpchp vmw
_vmci sunrpc ata_generic mperf pata_acpi mptspi vmwgfx ttm scsi_transport_spi dr
m mptscsih mptbase i2c_core
[   65.018684] CPU: 0 PID: 473 Comm: 192.168.10.85-m Not tainted 3.11.2-201.fc19
.x86_64 #1
[   65.020113] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
Reference Platform, BIOS 6.00 07/31/2013
[   65.022012] task: ffff88003707e320 ti: ffff88007b906000 task.ti: ffff88007b906000
[   65.023414] RIP: 0010:[<ffffffffa037d6ee>]  [<ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4]
[   65.025079] RSP: 0018:ffff88007b907d10  EFLAGS: 00010246
[   65.026042] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   65.027321] RDX: 0000000000000050 RSI: 0000000000000001 RDI: 0000000000000000
[   65.028691] RBP: ffff88007b907d38 R08: 0000000000016f60 R09: 0000000000000000
[   65.029990] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[   65.031295] R13: 0000000000000050 R14: 0000000000000000 R15: 0000000000000001
[   65.032527] FS:  0000000000000000(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000
[   65.033981] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   65.035177] CR2: 0000000000000030 CR3: 000000007b27f000 CR4: 00000000000407f0
[   65.036568] Stack:
[   65.037011]  0000000000000000 0000000000000001 ffff88007b907d90 ffff88007a880220
[   65.038472]  ffff88007b768de8 ffff88007b907d48 ffffffffa037e4a5 ffff88007b907d80
[   65.039935]  ffffffffa036a6c8 ffff880037020e40 ffff88007a880000 ffff880037020e40
[   65.041468] Call Trace:
[   65.042050]  [<ffffffffa037e4a5>] nfs4_close_state+0x15/0x20 [nfsv4]
[   65.043209]  [<ffffffffa036a6c8>] nfs4_open_recover_helper+0x148/0x1f0 [nfsv4]
[   65.044529]  [<ffffffffa036a886>] nfs4_open_recover+0x116/0x150 [nfsv4]
[   65.045730]  [<ffffffffa036d98d>] nfs4_open_reclaim+0xad/0x150 [nfsv4]
[   65.046905]  [<ffffffffa037d979>] nfs4_do_reclaim+0x149/0x5f0 [nfsv4]
[   65.048071]  [<ffffffffa037e1dc>] nfs4_run_state_manager+0x3bc/0x670 [nfsv4]
[   65.049436]  [<ffffffffa037de20>] ? nfs4_do_reclaim+0x5f0/0x5f0 [nfsv4]
[   65.050686]  [<ffffffffa037de20>] ? nfs4_do_reclaim+0x5f0/0x5f0 [nfsv4]
[   65.051943]  [<ffffffff81088640>] kthread+0xc0/0xd0
[   65.052831]  [<ffffffff81088580>] ? insert_kthread_work+0x40/0x40
[   65.054697]  [<ffffffff8165686c>] ret_from_fork+0x7c/0xb0
[   65.056396]  [<ffffffff81088580>] ? insert_kthread_work+0x40/0x40
[   65.058208] Code: 5c 41 5d 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 89 d5 41 54 53 48 89 fb <4c> 8b 67 30 f0 41 ff 44 24 44 49 8d 7c 24 40 e8 0e 0a 2d e1 44
[   65.065225] RIP  [<ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4]
[   65.067175]  RSP <ffff88007b907d10>
[   65.068570] CR2: 0000000000000030
[   65.070098] ---[ end trace 0d1fe4f5c7dd6f8b ]---

Cc: <stable@vger.kernel.org> #3.7+
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 14:53:32 -04:00
Trond Myklebust 83c78eb042 NFSv4.1: Don't change the security label as part of open reclaim.
The current caching model calls for the security label to be set on
first lookup and/or on any subsequent label changes. There is no
need to do it as part of an open reclaim.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 14:50:38 -04:00
Jeff Layton 1966903f8e nfs: fix handling of invalid mount options in nfs_remount
nfs_parse_mount_options returns 0 on error, not -errno.

Reported-by: Karel Zak <kzak@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 14:35:07 -04:00
Jeff Layton 57acc40d73 nfs: reject version and minorversion changes on remount attempts
Reported-by: Eric Doutreleau <edoutreleau@genoscope.cns.fr>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 14:30:23 -04:00
Andy Adamson 3660cd4322 NFSv4 Remove zeroing state kern warnings
As of commit 5d422301f9 we no longer zero the
state.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 14:28:53 -04:00
Al Viro 6de1472f1a nfs: use %p[dD] instead of open-coded (and often racy) equivalents
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:50 -04:00
Trond Myklebust 99875249bf NFSv4: Ensure that we disable the resend timeout for NFSv4
The spec states that the client should not resend requests because
the server will disconnect if it needs to drop an RPC request.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-01 18:22:11 -04:00
Trond Myklebust a6f951ddbd NFSv4: Fix a use-after-free situation in _nfs4_proc_getlk()
In nfs4_proc_getlk(), when some error causes a retry of the call to
_nfs4_proc_getlk(), we can end up with Oopses of the form

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000134
 IP: [<ffffffff8165270e>] _raw_spin_lock+0xe/0x30
<snip>
 Call Trace:
  [<ffffffff812f287d>] _atomic_dec_and_lock+0x4d/0x70
  [<ffffffffa053c4f2>] nfs4_put_lock_state+0x32/0xb0 [nfsv4]
  [<ffffffffa053c585>] nfs4_fl_release_lock+0x15/0x20 [nfsv4]
  [<ffffffffa0522c06>] _nfs4_proc_getlk.isra.40+0x146/0x170 [nfsv4]
  [<ffffffffa052ad99>] nfs4_proc_lock+0x399/0x5a0 [nfsv4]

The problem is that we don't clear the request->fl_ops after the first
try and so when we retry, nfs4_set_lock_state() exits early without
setting the lock stateid.
Regression introduced by commit 70cc6487a4
(locks: make ->lock release private data before returning in GETLK case)

Reported-by: Weston Andros Adamson <dros@netapp.com>
Reported-by: Jorge Mora <mora@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: <stable@vger.kernel.org> #2.6.22+
2013-10-01 18:21:28 -04:00
Anna Schumaker 367156d9a8 NFS: Give "flavor" an initial value to fix a compile warning
The previous patch introduces a compile warning by not assigning an initial
value to the "flavor" variable.  This could only be a problem if the server
returns a supported secflavor list of length zero, but it's better to
fix this before it's ever hit.

Signed-off-by: Anna Schumaker <bjschuma@netapp.com>
Acked-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-29 16:03:34 -04:00
Weston Andros Adamson 58a8cf1212 NFSv4.1: try SECINFO_NO_NAME flavs until one works
Call nfs4_lookup_root_sec for each flavor returned by SECINFO_NO_NAME until
one works.

One example of a situation this fixes:

 - server configured for krb5
 - server principal somehow gets deleted from KDC
 - server still thinking krb is good, sends krb5 as first entry in
    SECINFO_NO_NAME response
 - client tries krb5, but this fails without even sending an RPC because
    gssd's requests to the KDC can't find the server's principal

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-29 16:03:34 -04:00
Trond Myklebust acd65e5bc1 NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds
We need to ensure that the initialisation of the data server nfs_client
structure in nfs4_ds_connect is correctly ordered w.r.t. the read of
ds->ds_clp in nfs4_fl_prepare_ds.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-29 15:58:35 -04:00
Trond Myklebust 52b26a3e1b NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
- Fix an Oops when nfs4_ds_connect() returns an error.
- Always check the device status after waiting for a connect to complete.

Reported-by: Andy Adamson <andros@netapp.com>
Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: <stable@vger.kernel.org> # v3.10+
2013-09-29 15:56:35 -04:00
David Howells f1fe29b4a0 NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open()
Use i_writecount to control whether to get an fscache cookie in nfs_open() as
NFS does not do write caching yet.  I *think* this is the cause of a problem
encountered by Mark Moseley whereby __fscache_uncache_page() gets a NULL
pointer dereference because cookie->def is NULL:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160
PGD 0
Thread overran stack, or stack corrupted
Oops: 0000 [#1] SMP
Modules linked in: ...
CPU: 7 PID: 18993 Comm: php Not tainted 3.11.1 #1
Hardware name: Dell Inc. PowerEdge R420/072XWF, BIOS 1.3.5 08/21/2012
task: ffff8804203460c0 ti: ffff880420346640
RIP: 0010:[<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160
RSP: 0018:ffff8801053af878 EFLAGS: 00210286
RAX: 0000000000000000 RBX: ffff8800be2f8780 RCX: ffff88022ffae5e8
RDX: 0000000000004c66 RSI: ffffea00055ff440 RDI: ffff8800be2f8780
RBP: ffff8801053af898 R08: 0000000000000001 R09: 0000000000000003
R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00055ff440
R13: 0000000000001000 R14: ffff8800c50be538 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88042fc60000(0063) knlGS:00000000e439c700
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000000000010 CR3: 0000000001d8f000 CR4: 00000000000607f0
Stack:
...
Call Trace:
[<ffffffff81365a72>] __nfs_fscache_invalidate_page+0x42/0x70
[<ffffffff813553d5>] nfs_invalidate_page+0x75/0x90
[<ffffffff811b8f5e>] truncate_inode_page+0x8e/0x90
[<ffffffff811b90ad>] truncate_inode_pages_range.part.12+0x14d/0x620
[<ffffffff81d6387d>] ? __mutex_lock_slowpath+0x1fd/0x2e0
[<ffffffff811b95d3>] truncate_inode_pages_range+0x53/0x70
[<ffffffff811b969d>] truncate_inode_pages+0x2d/0x40
[<ffffffff811b96ff>] truncate_pagecache+0x4f/0x70
[<ffffffff81356840>] nfs_setattr_update_inode+0xa0/0x120
[<ffffffff81368de4>] nfs3_proc_setattr+0xc4/0xe0
[<ffffffff81357f78>] nfs_setattr+0xc8/0x150
[<ffffffff8122d95b>] notify_change+0x1cb/0x390
[<ffffffff8120a55b>] do_truncate+0x7b/0xc0
[<ffffffff8121f96c>] do_last+0xa4c/0xfd0
[<ffffffff8121ffbc>] path_openat+0xcc/0x670
[<ffffffff81220a0e>] do_filp_open+0x4e/0xb0
[<ffffffff8120ba1f>] do_sys_open+0x13f/0x2b0
[<ffffffff8126aaf6>] compat_SyS_open+0x36/0x50
[<ffffffff81d7204c>] sysenter_dispatch+0x7/0x24

The code at the instruction pointer was disassembled:

> (gdb) disas __fscache_uncache_page
> Dump of assembler code for function __fscache_uncache_page:
> ...
> 0xffffffff812a18ff <+31>: mov 0x48(%rbx),%rax
> 0xffffffff812a1903 <+35>: cmpb $0x0,0x10(%rax)
> 0xffffffff812a1907 <+39>: je 0xffffffff812a19cd <__fscache_uncache_page+237>

These instructions make up:

	ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);

That cmpb is the faulting instruction (%rax is 0).  So cookie->def is NULL -
which presumably means that the cookie has already been at least partway
through __fscache_relinquish_cookie().

What I think may be happening is something like a three-way race on the same
file:

	PROCESS 1	PROCESS 2	PROCESS 3
	===============	===============	===============
	open(O_TRUNC|O_WRONLY)
			open(O_RDONLY)
					open(O_WRONLY)
	-->nfs_open()
	-->nfs_fscache_set_inode_cookie()
	nfs_fscache_inode_lock()
	nfs_fscache_disable_inode_cookie()
	__fscache_relinquish_cookie()
	nfs_inode->fscache = NULL
	<--nfs_fscache_set_inode_cookie()

			-->nfs_open()
			-->nfs_fscache_set_inode_cookie()
			nfs_fscache_inode_lock()
			nfs_fscache_enable_inode_cookie()
			__fscache_acquire_cookie()
			nfs_inode->fscache = cookie
			<--nfs_fscache_set_inode_cookie()
	<--nfs_open()
	-->nfs_setattr()
	...
	...
	-->nfs_invalidate_page()
	-->__nfs_fscache_invalidate_page()
	cookie = nfsi->fscache
					-->nfs_open()
					-->nfs_fscache_set_inode_cookie()
					nfs_fscache_inode_lock()
					nfs_fscache_disable_inode_cookie()
					-->__fscache_relinquish_cookie()
	-->__fscache_uncache_page(cookie)
	<crash>
					<--__fscache_relinquish_cookie()
					nfs_inode->fscache = NULL
					<--nfs_fscache_set_inode_cookie()

What is needed is something to prevent process #2 from reacquiring the cookie
- and I think checking i_writecount should do the trick.

It's also possible to have a two-way race on this if the file is opened
O_TRUNC|O_RDONLY instead.

Reported-by: Mark Moseley <moseleymark@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-27 18:40:25 +01:00
David Howells 94d30ae90a FS-Cache: Provide the ability to enable/disable cookies
Provide the ability to enable and disable fscache cookies.  A disabled cookie
will reject or ignore further requests to:

	Acquire a child cookie
	Invalidate and update backing objects
	Check the consistency of a backing object
	Allocate storage for backing page
	Read backing pages
	Write to backing pages

but still allows:

	Checks/waits on the completion of already in-progress objects
	Uncaching of pages
	Relinquishment of cookies

Two new operations are provided:

 (1) Disable a cookie:

	void fscache_disable_cookie(struct fscache_cookie *cookie,
				    bool invalidate);

     If the cookie is not already disabled, this locks the cookie against other
     dis/enablement ops, marks the cookie as being disabled, discards or
     invalidates any backing objects and waits for cessation of activity on any
     associated object.

     This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
     but it reinitialises the cookie such that it can be reenabled.

     All possible failures are handled internally.  The caller should consider
     calling fscache_uncache_all_inode_pages() afterwards to make sure all page
     markings are cleared up.

 (2) Enable a cookie:

	void fscache_enable_cookie(struct fscache_cookie *cookie,
				   bool (*can_enable)(void *data),
				   void *data)

     If the cookie is not already enabled, this locks the cookie against other
     dis/enablement ops, invokes can_enable() and, if the cookie is not an
     index cookie, will begin the procedure of acquiring backing objects.

     The optional can_enable() function is passed the data argument and returns
     a ruling as to whether or not enablement should actually be permitted to
     begin.

     All possible failures are handled internally.  The cookie will only be
     marked as enabled if provisional backing objects are allocated.

A later patch will introduce these to NFS.  Cookie enablement during nfs_open()
is then contingent on i_writecount <= 0.  can_enable() checks for a race
between open(O_RDONLY) and open(O_WRONLY/O_RDWR).  This simplifies NFS's cookie
handling and allows us to get rid of open(O_RDONLY) accidentally introducing
caching to an inode that's open for writing already.

One operation has its API modified:

 (3) Acquire a cookie.

	struct fscache_cookie *fscache_acquire_cookie(
		struct fscache_cookie *parent,
		const struct fscache_cookie_def *def,
		void *netfs_data,
		bool enable);

     This now has an additional argument that indicates whether the requested
     cookie should be enabled by default.  It doesn't need the can_enable()
     function because the caller must prevent multiple calls for the same netfs
     object and it doesn't need to take the enablement lock because no one else
     can get at the cookie before this returns.

Signed-off-by: David Howells <dhowells@redhat.com
2013-09-27 18:40:25 +01:00
Trond Myklebust 5bc2afc2b5 NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method
Determine if we've created a new file by examining the directory change
attribute and/or the O_EXCL flag.

This fixes a regression when doing a non-exclusive create of a new file.
If the FILE_CREATED flag is not set, the atomic_open() command will
perform full file access permissions checks instead of just checking
for MAY_OPEN.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-26 10:20:18 -04:00
Miklos Szeredi 01c919abaf nfs: set FILE_CREATED
Set FILE_CREATED on O_CREAT|O_EXCL.  If the NFS server honored our request
for exclusivity then this must be correct.

Currently this is a no-op, since the VFS sets FILE_CREATED anyway.  The
next patch will, however, require this flag to be always set by
filesystems.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-16 19:17:24 -04:00
Linus Torvalds 9bf12df31f Merge git://git.kvack.org/~bcrl/aio-next
Pull aio changes from Ben LaHaise:
 "First off, sorry for this pull request being late in the merge window.
  Al had raised a couple of concerns about 2 items in the series below.
  I addressed the first issue (the race introduced by Gu's use of
  mm_populate()), but he has not provided any further details on how he
  wants to rework the anon_inode.c changes (which were sent out months
  ago but have yet to be commented on).

  The bulk of the changes have been sitting in the -next tree for a few
  months, with all the issues raised being addressed"

* git://git.kvack.org/~bcrl/aio-next: (22 commits)
  aio: rcu_read_lock protection for new rcu_dereference calls
  aio: fix race in ring buffer page lookup introduced by page migration support
  aio: fix rcu sparse warnings introduced by ioctx table lookup patch
  aio: remove unnecessary debugging from aio_free_ring()
  aio: table lookup: verify ctx pointer
  staging/lustre: kiocb->ki_left is removed
  aio: fix error handling and rcu usage in "convert the ioctx list to table lookup v3"
  aio: be defensive to ensure request batching is non-zero instead of BUG_ON()
  aio: convert the ioctx list to table lookup v3
  aio: double aio_max_nr in calculations
  aio: Kill ki_dtor
  aio: Kill ki_users
  aio: Kill unneeded kiocb members
  aio: Kill aio_rw_vect_retry()
  aio: Don't use ctx->tail unnecessarily
  aio: io_cancel() no longer returns the io_event
  aio: percpu ioctx refcount
  aio: percpu reqs_available
  aio: reqs_active -> reqs_available
  aio: fix build when migration is disabled
  ...
2013-09-13 10:55:58 -07:00
Linus Torvalds ac4de9543a Merge branch 'akpm' (patches from Andrew Morton)
Merge more patches from Andrew Morton:
 "The rest of MM.  Plus one misc cleanup"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (35 commits)
  mm/Kconfig: add MMU dependency for MIGRATION.
  kernel: replace strict_strto*() with kstrto*()
  mm, thp: count thp_fault_fallback anytime thp fault fails
  thp: consolidate code between handle_mm_fault() and do_huge_pmd_anonymous_page()
  thp: do_huge_pmd_anonymous_page() cleanup
  thp: move maybe_pmd_mkwrite() out of mk_huge_pmd()
  mm: cleanup add_to_page_cache_locked()
  thp: account anon transparent huge pages into NR_ANON_PAGES
  truncate: drop 'oldsize' truncate_pagecache() parameter
  mm: make lru_add_drain_all() selective
  memcg: document cgroup dirty/writeback memory statistics
  memcg: add per cgroup writeback pages accounting
  memcg: check for proper lock held in mem_cgroup_update_page_stat
  memcg: remove MEMCG_NR_FILE_MAPPED
  memcg: reduce function dereference
  memcg: avoid overflow caused by PAGE_ALIGN
  memcg: rename RESOURCE_MAX to RES_COUNTER_MAX
  memcg: correct RESOURCE_MAX to ULLONG_MAX
  mm: memcg: do not trap chargers with full callstack on OOM
  mm: memcg: rework and document OOM waiting and wakeup
  ...
2013-09-12 15:44:27 -07:00
Kirill A. Shutemov 7caef26767 truncate: drop 'oldsize' truncate_pagecache() parameter
truncate_pagecache() doesn't care about old size since commit
cedabed49b ("vfs: Fix vmtruncate() regression").  Let's drop it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12 15:38:02 -07:00
Linus Torvalds 26935fb06e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 4 from Al Viro:
 "list_lru pile, mostly"

This came out of Andrew's pile, Al ended up doing the merge work so that
Andrew didn't have to.

Additionally, a few fixes.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
  super: fix for destroy lrus
  list_lru: dynamically adjust node arrays
  shrinker: Kill old ->shrink API.
  shrinker: convert remaining shrinkers to count/scan API
  staging/lustre/libcfs: cleanup linux-mem.h
  staging/lustre/ptlrpc: convert to new shrinker API
  staging/lustre/obdclass: convert lu_object shrinker to count/scan API
  staging/lustre/ldlm: convert to shrinkers to count/scan API
  hugepage: convert huge zero page shrinker to new shrinker API
  i915: bail out earlier when shrinker cannot acquire mutex
  drivers: convert shrinkers to new count/scan API
  fs: convert fs shrinkers to new scan/count API
  xfs: fix dquot isolation hang
  xfs-convert-dquot-cache-lru-to-list_lru-fix
  xfs: convert dquot cache lru to list_lru
  xfs: rework buffer dispose list tracking
  xfs-convert-buftarg-lru-to-generic-code-fix
  xfs: convert buftarg LRU to generic code
  fs: convert inode and dentry shrinking to be node aware
  vmscan: per-node deferred work
  ...
2013-09-12 15:01:38 -07:00
Linus Torvalds 1d7b24ff33 NFS client bugfixes:
- Fix a few credential reference leaks resulting from the SP4_MACH_CRED
   NFSv4.1 state protection code.
 - Fix the SUNRPC bloatometer footprint: convert a 256K hashtable into the
   intended 64 byte structure.
 - Fix a long standing XDR issue with FREE_STATEID
 - Fix a potential WARN_ON spamming issue
 - Fix a missing dprintk() kuid conversion
 
 New features:
 - Enable the NFSv4.1 state protection support for the WRITE and COMMIT
   operations.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSMiO+AAoJEGcL54qWCgDyuwEQALNAMpcRhASpqrRSuX94aKn3
 ATENr87ov2FCXcTP/OBjdlcryyjp+0e5JBW5T0nHn90Uylz4p/87eOILlqIq4ax2
 4QldKAuHdk5gLwiX5ebWpDtlwjTwyth1PRD7iPHT8lvIlO0IT7S/VDaa/04J37PL
 Lw1zaTD0cpdRkdTnA12RDJ5oTW0YwmSBb5qJQROjinwa/ALuIZJpoBNCV01lIP2k
 VaW0Yd8A+hqtawmxnf3G14r50Ds269AZ5K4hcRjQMEWeetlwfXFSTSjx8dzgsQkx
 4VF6wiCSwsKEdrp8csRv+fsHiGRjNfzdSTrQxcJa+ssP6qX0KWHYPdw2jgbozX+2
 kUQw2bFgxug+zdNjp+z1daJzw4QAfkjfNBWzt4w7a+8VOnR+/fydJzmka4mlJUKB
 IDy8l/KrSCjCHi9VYal27+IQs/bcLAIvASUF14cZ/+ZY9MUsWhYXVPHNLhwTPds2
 jFvawh77V6MHg/wA2+D7yHbHmOOmZaH2/Af9v3HKsVhhoLwqr5LO9qfAq63KSxzW
 udzmjlSEhlOiJKDMZo9HigjKhU+Ndujr7RqsP6WFjTPa4yn6499cbTy7izze6MPB
 JZDlmkInnZAtLDOuHAwxSNuNfBD6Yrzk1PV8Gv2xMEdp41bxgAg//K3WXx2vSGWa
 4TQMHjaegAkdHyTK0rJD
 =IdGo
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes (part 2) from Trond Myklebust:
 "Bugfixes:
   - Fix a few credential reference leaks resulting from the
     SP4_MACH_CRED NFSv4.1 state protection code.
   - Fix the SUNRPC bloatometer footprint: convert a 256K hashtable into
     the intended 64 byte structure.
   - Fix a long standing XDR issue with FREE_STATEID
   - Fix a potential WARN_ON spamming issue
   - Fix a missing dprintk() kuid conversion

  New features:
   - Enable the NFSv4.1 state protection support for the WRITE and
     COMMIT operations"

* tag 'nfs-for-3.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: No, I did not intend to create a 256KiB hashtable
  sunrpc: Add missing kuids conversion for printing
  NFSv4.1: sp4_mach_cred: WARN_ON -> WARN_ON_ONCE
  NFSv4.1: sp4_mach_cred: no need to ref count creds
  NFSv4.1: fix SECINFO* use of put_rpccred
  NFSv4.1: sp4_mach_cred: ask for WRITE and COMMIT
  NFSv4.1 fix decode_free_stateid
2013-09-12 13:39:34 -07:00
Weston Andros Adamson 312cd958a7 NFSv4.1: sp4_mach_cred: WARN_ON -> WARN_ON_ONCE
No need to spam the logs

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-11 09:08:08 -04:00
Weston Andros Adamson ade33ff58e NFSv4.1: sp4_mach_cred: no need to ref count creds
The cl_machine_cred doesn't need to be reference counted here -
a reference is held is for the lifetime of the struct nfs_client.
Also, no need to put_rpccred the rpc_message.rpc_cred.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-11 09:07:53 -04:00
Weston Andros Adamson 7cb852dfc8 NFSv4.1: fix SECINFO* use of put_rpccred
Recent SP4_MACH_CRED changes allows rpc_message.rpc_cred to change,
so keep a separate pointer to the machine cred for put_rpccred.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-11 09:07:27 -04:00
Weston Andros Adamson a02796250f NFSv4.1: sp4_mach_cred: ask for WRITE and COMMIT
Request SP4_MACH_CRED WRITE and COMMIT support in spo_must_allow list --
they're already supported by the client.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-11 09:06:43 -04:00
Dave Chinner 1ab6c4997e fs: convert fs shrinkers to new scan/count API
Convert the filesystem shrinkers to use the new API, and standardise some
of the behaviours of the shrinkers at the same time.  For example,
nr_to_scan means the number of objects to scan, not the number of objects
to free.

I refactored the CIFS idmap shrinker a little - it really needs to be
broken up into a shrinker per tree and keep an item count with the tree
root so that we don't need to walk the tree every time the shrinker needs
to count the number of objects in the tree (i.e.  all the time under
memory pressure).

[glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree]
[assorted fixes folded in]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:31 -04:00
Glauber Costa 55f841ce93 super: fix calculation of shrinkable objects for small numbers
The sysctl knob sysctl_vfs_cache_pressure is used to determine which
percentage of the shrinkable objects in our cache we should actively try
to shrink.

It works great in situations in which we have many objects (at least more
than 100), because the aproximation errors will be negligible.  But if
this is not the case, specially when total_objects < 100, we may end up
concluding that we have no objects at all (total / 100 = 0, if total <
100).

This is certainly not the biggest killer in the world, but may matter in
very low kernel memory situations.

Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 18:56:29 -04:00
Andy Adamson 9f79fb4825 NFSv4.1 fix decode_free_stateid
The operation status is decoded in decode_op_hdr.

Stop the print_overflow message that is always hit without this patch:
nfs: decode_free_stateid: prematurely hit end of receive buffer. Remaining
buffer length is 0 words.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-10 13:04:37 -04:00
Linus Torvalds bf97293eb8 NFS client updates for Linux 3.12
Highlights include:
 
 - Fix NFSv4 recovery so that it doesn't recover lost locks in cases such as
   lease loss due to a network partition, where doing so may result in data
   corruption. Add a kernel parameter to control choice of legacy behaviour
   or not.
 - Performance improvements when 2 processes are writing to the same file.
 - Flush data to disk when an RPCSEC_GSS session timeout is imminent.
 - Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
   NFS clients from being able to manipulate our lease and file lockingr
   state.
 - Allow sharing of RPCSEC_GSS caches between different rpc clients
 - Fix the broken NFSv4 security auto-negotiation between client and server
 - Fix rmdir() to wait for outstanding sillyrename unlinks to complete
 - Add a tracepoint framework for debugging NFSv4 state recovery issues.
 - Add tracing to the generic NFS layer.
 - Add tracing for the SUNRPC socket connection state.
 - Clean up the rpc_pipefs mount/umount event management.
 - Merge more patches from Chuck in preparation for NFSv4 migration support.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSLelVAAoJEGcL54qWCgDyo2IQAKOfRJyZVnf4ipxi3xLNl1QF
 w/70DVSIF1S1djWN7G3vgkxj/R8KCvJ8CcvkAD2BEgRDeZJ9TtyKAdM/jYLZ+W05
 7k2QKk8fkwZmc1Y2qDqFwKHzP5ZgP5L2nGx7FNhi/99wEAe47yFG3qd3rUWKrcOf
 mnd863zgGDE2Q10slhoq/bywwMJo6tKZNeaIE8kPjgFbBEh/jslpAWr8dSA4QgvJ
 nZ8VB5XU8L+XJ0GpHHdjYm9LvQ51DbQ6omOF+0P4fI093azKmf4ZsrjMDWT8+iu3
 XkXlnQmKLGTi7yB43hHtn2NiRqwGzCcZ1Amo9PpCFaHUt1RP9cc37UhG1T+x1xWJ
 STEKDbvCdQ3FU9FvbgrGEwBR0e8fNS4fZY3ToDBflIcfwre0aWs5RCodZMUD0nUI
 4wY5J9NsQR/bL+v8KeUR4V4cXK8YrgL0zB4u4WYzH5Npxr5KD0NEKDNqRPhrB9l2
 LLF9Haql8j76Ff0ek6UGFIZjDE0h6Fs71wLBpLj+ZWArOJ7vBuLMBSOVqNpld9+9
 f2fEG7qoGF4FGTY4myH/eakMPaWnk9Ol4Ls/svSIapJ9+rePD+a93e/qnmdofIMf
 4TuEYk6ERib1qXgaeDRQuCsm2YE1Co5skGMaOsRFWgReE1c12QoJQVst2nMtEKp3
 uV2w8LgX18aZOZXJVkCM
 =ZuW+
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

   - Fix NFSv4 recovery so that it doesn't recover lost locks in cases
     such as lease loss due to a network partition, where doing so may
     result in data corruption.  Add a kernel parameter to control
     choice of legacy behaviour or not.
   - Performance improvements when 2 processes are writing to the same
     file.
   - Flush data to disk when an RPCSEC_GSS session timeout is imminent.
   - Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
     NFS clients from being able to manipulate our lease and file
     locking state.
   - Allow sharing of RPCSEC_GSS caches between different rpc clients.
   - Fix the broken NFSv4 security auto-negotiation between client and
     server.
   - Fix rmdir() to wait for outstanding sillyrename unlinks to complete
   - Add a tracepoint framework for debugging NFSv4 state recovery
     issues.
   - Add tracing to the generic NFS layer.
   - Add tracing for the SUNRPC socket connection state.
   - Clean up the rpc_pipefs mount/umount event management.
   - Merge more patches from Chuck in preparation for NFSv4 migration
     support"

* tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (107 commits)
  NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity
  NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set
  NFSv4: Allow security autonegotiation for submounts
  NFSv4: Disallow security negotiation for lookups when 'sec=' is specified
  NFSv4: Fix security auto-negotiation
  NFS: Clean up nfs_parse_security_flavors()
  NFS: Clean up the auth flavour array mess
  NFSv4.1 Use MDS auth flavor for data server connection
  NFS: Don't check lock owner compatability unless file is locked (part 2)
  NFS: Don't check lock owner compatibility in writes unless file is locked
  nfs4: Map NFS4ERR_WRONG_CRED to EPERM
  nfs4.1: Add SP4_MACH_CRED write and commit support
  nfs4.1: Add SP4_MACH_CRED stateid support
  nfs4.1: Add SP4_MACH_CRED secinfo support
  nfs4.1: Add SP4_MACH_CRED cleanup support
  nfs4.1: Add state protection handler
  nfs4.1: Minimal SP4_MACH_CRED implementation
  SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid
  SUNRPC: Add an identifier for struct rpc_clnt
  SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints
  ...
2013-09-09 09:19:15 -07:00
Weston Andros Adamson b1b3e13694 NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity
Commit 97431204ea introduced a regression
that causes SECINFO_NO_NAME to fail without sending an RPC if:

 1) the nfs_client's rpc_client is using krb5i/p (now tried by default)
 2) the current user doesn't have valid kerberos credentials

This situation is quite common - as of now a sec=sys mount would use
krb5i for the nfs_client's rpc_client and a user would hardly be faulted
for not having run kinit.

The solution is to use the machine cred when trying to use an integrity
protected auth flavor for SECINFO_NO_NAME.

Older servers may not support using the machine cred or an integrity
protected auth flavor for SECINFO_NO_NAME in every circumstance, so we fall
back to using the user's cred and the filesystem's auth flavor in this case.

We run into another problem when running against linux nfs servers -
they return NFS4ERR_WRONGSEC when using integrity auth flavor (unless the
mount is also that flavor) even though that is not a valid error for
SECINFO*.  Even though it's against spec, handle WRONGSEC errors on
SECINFO_NO_NAME by falling back to using the user cred and the
filesystem's auth flavor.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07 18:39:25 -04:00
Trond Myklebust 0aea92bf67 NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set
Also don't worry about obsolete mount flags...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07 18:38:04 -04:00
Trond Myklebust 47040da3c7 NFSv4: Allow security autonegotiation for submounts
In cases where the parent super block was not mounted with a 'sec=' line,
allow autonegotiation of security for the submounts.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07 17:52:42 -04:00
Trond Myklebust 41d058c3ba NFSv4: Disallow security negotiation for lookups when 'sec=' is specified
Ensure that nfs4_proc_lookup_common respects the NFS_MOUNT_SECFLAVOUR
flag.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07 17:52:42 -04:00
Trond Myklebust 5e6b19901b NFSv4: Fix security auto-negotiation
NFSv4 security auto-negotiation has been broken since
commit 4580a92d44 (NFS:
Use server-recommended security flavor by default (NFSv3))
because nfs4_try_mount() will automatically select AUTH_SYS
if it sees no auth flavours.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
2013-09-07 16:18:30 -04:00
Trond Myklebust 19e7b8d240 NFS: Clean up nfs_parse_security_flavors()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07 16:12:45 -04:00
Trond Myklebust 74c9881162 NFS: Clean up the auth flavour array mess
What is the point of having a 'auth_flavor_len' field, if it is
always set to 1, and can't be used to determine if the user has
selected an auth flavour?
This cleanup goes back to using auth_flavor_len for its original
intended purpose, and gets rid of the ad-hoc replacements.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07 16:11:24 -04:00
Andy Adamson 0e20162ed1 NFSv4.1 Use MDS auth flavor for data server connection
Commit 4edaa308 "NFS: Use "krb5i" to establish NFSv4 state whenever possible"
uses the nfs_client cl_rpcclient for all state management operations, and
will use krb5i or auth_sys with no regard to the mount command authflavor
choice.

The MDS, as any NFSv4.1 mount point, uses the nfs_server rpc client for all
non-state management operations with a different nfs_server for each fsid
encountered traversing the mount point, each with a potentially different
auth flavor.

pNFS data servers are not mounted in the normal sense as there is no associated
nfs_server structure. Data servers can also export multiple fsids, each with
a potentially different auth flavor.

Data servers need to use the same authflavor as the MDS server rpc client for
non-state management operations. Populate a list of rpc clients with the MDS
server rpc client auth flavor for the DS to use.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-06 14:49:16 -04:00
Trond Myklebust 4109bb7496 NFS: Don't check lock owner compatability unless file is locked (part 2)
When coalescing requests into a single READ or WRITE RPC call, and there
is no file locking involved, we don't have to refuse coalescing for
requests where the lock owner information doesn't match.

Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-06 11:27:41 -04:00
Trond Myklebust 0f1d260550 NFS: Don't check lock owner compatibility in writes unless file is locked
If we're doing buffered writes, and there is no file locking involved,
then we don't have to worry about whether or not the lock owner information
is identical.
By relaxing this check, we ensure that fork()ed child processes can write
to a page without having to first sync dirty data that was written
by the parent to disk.

Reported-by: Quentin Barnes <qbarnes@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Quentin Barnes <qbarnes@gmail.com>
2013-09-05 18:11:42 -04:00
Miklos Szeredi 13caa9fb5b nfs: use check_submounts_and_drop()
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically.

check_submounts_and_drop() can deal with negative dentries and
non-directories as well.

Non-directories can also be mounted on.  And just like directories we don't
want these to disappear with invalidation.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05 16:23:52 -04:00
Weston Andros Adamson 8897538e97 nfs4: Map NFS4ERR_WRONG_CRED to EPERM
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05 10:51:22 -04:00
Weston Andros Adamson 8c21c62c44 nfs4.1: Add SP4_MACH_CRED write and commit support
WRITE and COMMIT can use the machine credential.

If WRITE is supported and COMMIT is not, make all (mach cred) writes FILE_SYNC4.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05 10:50:45 -04:00
Weston Andros Adamson 3787d5063c nfs4.1: Add SP4_MACH_CRED stateid support
TEST_STATEID and FREE_STATEID can use the machine credential.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05 10:49:35 -04:00
Weston Andros Adamson 8b5bee2e1b nfs4.1: Add SP4_MACH_CRED secinfo support
SECINFO and SECINFO_NONAME can use the machine credential.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05 10:48:30 -04:00
Weston Andros Adamson fa940720ce nfs4.1: Add SP4_MACH_CRED cleanup support
CLOSE and LOCKU can use the machine credential.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05 10:44:17 -04:00