Commit Graph

1908 Commits

Author SHA1 Message Date
Al Viro 301f0268b6 nfsd: racy access to ->d_name in nsfd4_encode_path()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-03 22:50:28 -04:00
J. Bruce Fields 248f807b47 nfsd4: nfsd4_create_clid_dir prints uninitialized data
Take the easy way out and just remove the printk.

Reported-by: David Howells <dhowells@redhat.com>
2013-08-30 17:30:52 -04:00
J. Bruce Fields bf7bd3e98b nfsd4: fix leak of inode reference on delegation failure
This fixes a regression from 68a3396178
"nfsd4: shut down more of delegation earlier".

After that commit, nfs4_set_delegation() failures result in
nfs4_put_delegation being called, but nfs4_put_delegation doesn't free
the nfs4_file that has already been set by alloc_init_deleg().

This can result in an oops on later unmounting the exported filesystem.

Note also delaying the fi_had_conflict check we're able to return a
better error (hence give 4.1 clients a better idea why the delegation
failed; though note CONFLICT isn't an exact match here, as that's
supposed to indicate a current conflict, but all we know here is that
there was one recently).

Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Tested-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-08-30 17:30:52 -04:00
J. Bruce Fields 3477565e6a Revert "nfsd: nfs4_file_get_access: need to be more careful with O_RDWR"
This reverts commit df66e75395.

nfsd4_lock can get a read-only or write-only reference when only a
read-write open is available.  This is normal.

Cc: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-08-30 17:30:45 -04:00
J. Bruce Fields b8297cec2d Linux 3.11-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJSCDSjAAoJEHm+PkMAQRiGDXMIAI7Loae0Oqb1eoeJkvjyZsBS
 OJDeeEcn+k58VbxVHyRdc7hGo4yI4tUZm172SpnOaM8sZ/ehPU7zBrwJK2lzX334
 /jAM3uvVPfxA2nu0I4paNpkED/NQ8NRRsYE1iTE8dzHXOH6dA3mgp5qfco50rQvx
 rvseXpME4KIAJEq4jnyFZF5+nuHiPueM9JftPmSSmJJ3/KY9kY1LESovyWd7ttg1
 jYSVPFal9J0E+tl2UQY5g9H16GqhhjYn+39Iei6Q5P4bL4ZubQgTRQTN9nyDc06Z
 ezQtGoqZ8kEz/2SyRlkda6PzjSEhgXlc8mCL5J7AW+dMhTHHx2IrosjiCA80kG8=
 =c0rK
 -----END PGP SIGNATURE-----

Merge tag 'v3.11-rc5' into for-3.12 branch

For testing purposes I want some nfs and nfsd bugfixes (specifically,
58cd57bfd9 and previous nfsd patches, and
Trond's 4f3cc4809a).
2013-08-30 16:42:49 -04:00
Weston Andros Adamson 58cd57bfd9 nfsd: Fix SP4_MACH_CRED negotiation in EXCHANGE_ID
- don't BUG_ON() when not SP4_NONE
 - calculate recv and send reserve sizes correctly

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-08-07 12:06:07 -04:00
J. Bruce Fields c47205914c nfsd4: Fix MACH_CRED NULL dereference
Fixes a NULL-dereference on attempts to use MACH_CRED protection over
auth_sys.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-08-07 12:05:51 -04:00
J. Bruce Fields b1948a641d nfsd4: fix setlease error return
This actually makes a difference in the 4.1 case, since we use the
status to decide what reason to give the client for the delegation
refusal (see nfsd4_open_deleg_none_ext), and in theory a client might
choose suboptimal behavior if we give the wrong answer.

Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-26 17:02:07 -04:00
Harshula Jayasuriya df66e75395 nfsd: nfs4_file_get_access: need to be more careful with O_RDWR
If fi_fds = {non-NULL, NULL, non-NULL} and oflag = O_WRONLY
the WARN_ON_ONCE(!(fp->fi_fds[oflag] || fp->fi_fds[O_RDWR]))
doesn't trigger when it should.

Signed-off-by: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-23 12:26:51 -04:00
Harshula Jayasuriya e4daf1ffbe nfsd: nfsd_open: when dentry_open returns an error do not propagate as struct file
The following call chain:
------------------------------------------------------------
nfs4_get_vfs_file
- nfsd_open
  - dentry_open
    - do_dentry_open
      - __get_file_write_access
        - get_write_access
          - return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
------------------------------------------------------------

can result in the following state:
------------------------------------------------------------
struct nfs4_file {
...
  fi_fds = {0xffff880c1fa65c80, 0xffffffffffffffe6, 0x0},
  fi_access = {{
      counter = 0x1
    }, {
      counter = 0x0
    }},
...
------------------------------------------------------------

1) First time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NULL, hence nfsd_open() is called where we get status set to an error
and fp->fi_fds[O_WRONLY] to -ETXTBSY. Thus we do not reach
nfs4_file_get_access() and fi_access[O_WRONLY] is not incremented.

2) Second time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NOT NULL (-ETXTBSY), so nfsd_open() is NOT called, but
nfs4_file_get_access() IS called and fi_access[O_WRONLY] is incremented.
Thus we leave a landmine in the form of the nfs4_file data structure in
an incorrect state.

3) Eventually, when __nfs4_file_put_access() is called it finds
fi_access[O_WRONLY] being non-zero, it decrements it and calls
nfs4_file_put_fd() which tries to fput -ETXTBSY.
------------------------------------------------------------
...
     [exception RIP: fput+0x9]
     RIP: ffffffff81177fa9  RSP: ffff88062e365c90  RFLAGS: 00010282
     RAX: ffff880c2b3d99cc  RBX: ffff880c2b3d9978  RCX: 0000000000000002
     RDX: dead000000100101  RSI: 0000000000000001  RDI: ffffffffffffffe6
     RBP: ffff88062e365c90   R8: ffff88041fe797d8   R9: ffff88062e365d58
     R10: 0000000000000008  R11: 0000000000000000  R12: 0000000000000001
     R13: 0000000000000007  R14: 0000000000000000  R15: 0000000000000000
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #9 [ffff88062e365c98] __nfs4_file_put_access at ffffffffa0562334 [nfsd]
 #10 [ffff88062e365cc8] nfs4_file_put_access at ffffffffa05623ab [nfsd]
 #11 [ffff88062e365ce8] free_generic_stateid at ffffffffa056634d [nfsd]
 #12 [ffff88062e365d18] release_open_stateid at ffffffffa0566e4b [nfsd]
 #13 [ffff88062e365d38] nfsd4_close at ffffffffa0567401 [nfsd]
 #14 [ffff88062e365d88] nfsd4_proc_compound at ffffffffa0557f28 [nfsd]
 #15 [ffff88062e365dd8] nfsd_dispatch at ffffffffa054543e [nfsd]
 #16 [ffff88062e365e18] svc_process_common at ffffffffa04ba5a4 [sunrpc]
 #17 [ffff88062e365e98] svc_process at ffffffffa04babe0 [sunrpc]
 #18 [ffff88062e365eb8] nfsd at ffffffffa0545b62 [nfsd]
 #19 [ffff88062e365ee8] kthread at ffffffff81090886
 #20 [ffff88062e365f48] kernel_thread at ffffffff8100c14a
------------------------------------------------------------

Cc: stable@vger.kernel.org
Signed-off-by: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-23 12:15:32 -04:00
Linus Torvalds 61f98b0fca Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from Bruce Fields:
 "Just three minor bugfixes"

* 'for-3.11' of git://linux-nfs.org/~bfields/linux:
  svcrdma: underflow issue in decode_write_list()
  nfsd4: fix minorversion support interface
  lockd: protect nlm_blocked access in nlmsvc_retry_blocked
2013-07-17 13:43:55 -07:00
J. Bruce Fields 35f7a14fc1 nfsd4: fix minorversion support interface
You can turn on or off support for minorversions using e.g.

	echo "-4.2" >/proc/fs/nfsd/versions

However, the current implementation is a little wonky.  For example, the
above will turn off 4.2 support, but it will also turn *on* 4.1 support.

This didn't matter as long as we only had 2 minorversions, which was
true till very recently.

And do a little cleanup here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-12 16:48:52 -04:00
Linus Torvalds 0ff08ba5d0 Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from Bruce Fields:
 "Changes this time include:

   - 4.1 enabled on the server by default: the last 4.1-specific issues
     I know of are fixed, so we're not going to find the rest of the
     bugs without more exposure.
   - Experimental support for NFSv4.2 MAC Labeling (to allow running
     selinux over NFS), from Dave Quigley.
   - Fixes for some delicate cache/upcall races that could cause rare
     server hangs; thanks to Neil Brown and Bodo Stroesser for extreme
     debugging persistence.
   - Fixes for some bugs found at the recent NFS bakeathon, mostly v4
     and v4.1-specific, but also a generic bug handling fragmented rpc
     calls"

* 'for-3.11' of git://linux-nfs.org/~bfields/linux: (31 commits)
  nfsd4: support minorversion 1 by default
  nfsd4: allow destroy_session over destroyed session
  svcrpc: fix failures to handle -1 uid's
  sunrpc: Don't schedule an upcall on a replaced cache entry.
  net/sunrpc: xpt_auth_cache should be ignored when expired.
  sunrpc/cache: ensure items removed from cache do not have pending upcalls.
  sunrpc/cache: use cache_fresh_unlocked consistently and correctly.
  sunrpc/cache: remove races with queuing an upcall.
  nfsd4: return delegation immediately if lease fails
  nfsd4: do not throw away 4.1 lock state on last unlock
  nfsd4: delegation-based open reclaims should bypass permissions
  svcrpc: don't error out on small tcp fragment
  svcrpc: fix handling of too-short rpc's
  nfsd4: minor read_buf cleanup
  nfsd4: fix decoding of compounds across page boundaries
  nfsd4: clean up nfs4_open_delegation
  NFSD: Don't give out read delegations on creates
  nfsd4: allow client to send no cb_sec flavors
  nfsd4: fail attempts to request gss on the backchannel
  nfsd4: implement minimal SP4_MACH_CRED
  ...
2013-07-11 10:17:13 -07:00
Linus Torvalds be0c5d8c0b NFS client updates for Linux 3.11
Feature highlights include:
 - Add basic client support for NFSv4.2
 - Add basic client support for Labeled NFS (selinux for NFSv4.2)
 - Fix the use of credentials in NFSv4.1 stateful operations, and
   add support for NFSv4.1 state protection.
 
 Bugfix highlights:
 - Fix another NFSv4 open state recovery race
 - Fix an NFSv4.1 back channel session regression
 - Various rpc_pipefs races
 - Fix another issue with NFSv3 auth negotiation
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJR2vsSAAoJEGcL54qWCgDyWBIP/AqlpBBAblxbNQ1Bl/0m1Pdb
 iKH961qgM4U1BzK0svGtHTZqkovpm4o/VbkbKBT5mQ4g6SbbsJ/AsS1plCyfnIZi
 bdnKNJyj6zg0NsAkJ3vKWqd4BTaP+icdSfEIlRKQxAPESewN7b5B3OWgY4KdYmnk
 q5BP25anC1ryxVycSY67ux8S2IKXVSRZeCZv+RO21rvZ2G0bV5y7t8Om28ztxEnU
 RKrHgQHgaaktR7i8QVO0sbiWq3iqLa3GPkUvFLwWGr8PQJtTkYY0QwYSrsV3N4rY
 hYpMRUZFHpZ8UG5YvBT6xyOy/XaGwMGKSfZjB9/YG4QVju+tTy50U1JbTil5PEWY
 GHWYF68aurIeUkXrhSv8AVnOnhir0mISx5ou/SV7p0QoAZ92V6kq+LkPrW520qlc
 z8ILh3j28pN3ZUCIEArcaZhYCt48uO2hwBi5TqevQyyGRsXFGbN1moD5jvHkllft
 Fi0XGuCBdvhrzFRZcsEl+PDq7fT8lXUK2BHe8oR5jz9PhUp+jpEl9m/eg3RsjJjN
 DuxsHye2U4chScdnRtLBQvpFtdINvWX/Gy8Bi7kdE5tsQySvOa+rdwuBc7h88PHC
 +4xI2iX3z4O1+GpsAe/T9+pjW689jEilS+eVDRVEGl6yHGn9q8PYOayjPjwbJHxS
 R2mLTRhKu1DKguTzO13f
 =wGjn
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Feature highlights include:
   - Add basic client support for NFSv4.2
   - Add basic client support for Labeled NFS (selinux for NFSv4.2)
   - Fix the use of credentials in NFSv4.1 stateful operations, and add
     support for NFSv4.1 state protection.

  Bugfix highlights:
   - Fix another NFSv4 open state recovery race
   - Fix an NFSv4.1 back channel session regression
   - Various rpc_pipefs races
   - Fix another issue with NFSv3 auth negotiation

  Please note that Labeled NFS does require some additional support from
  the security subsystem.  The relevant changesets have all been
  reviewed and acked by James Morris."

* tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (54 commits)
  NFS: Set NFS_CS_MIGRATION for NFSv4 mounts
  NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs
  nfs: have NFSv3 try server-specified auth flavors in turn
  nfs: have nfs_mount fake up a auth_flavs list when the server didn't provide it
  nfs: move server_authlist into nfs_try_mount_request
  nfs: refactor "need_mount" code out of nfs_try_mount
  SUNRPC: PipeFS MOUNT notification optimization for dying clients
  SUNRPC: split client creation routine into setup and registration
  SUNRPC: fix races on PipeFS UMOUNT notifications
  SUNRPC: fix races on PipeFS MOUNT notifications
  NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount
  NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount
  NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize
  NFS: Improve legacy idmapping fallback
  NFSv4.1 end back channel session draining
  NFS: Apply v4.1 capabilities to v4.2
  NFSv4.1: Clean up layout segment comparison helper names
  NFSv4.1: layout segment comparison helpers should take 'const' parameters
  NFSv4: Move the DNS resolver into the NFSv4 module
  rpc_pipefs: only set rpc_dentry_ops if d_op isn't already set
  ...
2013-07-09 12:09:43 -07:00
J. Bruce Fields d109148111 nfsd4: support minorversion 1 by default
We now have minimal minorversion 1 support; turn it on by default.

This can still be turned off with "echo -4.1 >/proc/fs/nfsd/versions".

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-08 19:48:10 -04:00
J. Bruce Fields f0f51f5cdd nfsd4: allow destroy_session over destroyed session
RFC 5661 allows a client to destroy a session using a compound
associated with the destroyed session, as long as the DESTROY_SESSION op
is the last op of the compound.

We attempt to allow this, but testing against a Solaris client (which
does destroy sessions in this way) showed that we were failing the
DESTROY_SESSION with NFS4ERR_DELAY, because we assumed the reference
count on the session (held by us) represented another rpc in progress
over this session.

Fix this by noting that in this case the expected reference count is 1,
not 0.

Also, note as long as the session holds a reference to the compound
we're destroying, we can't free it here--instead, delay the free till
the final put in nfs4svc_encode_compoundres.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-08 19:46:38 -04:00
J. Bruce Fields d08d32e6e5 nfsd4: return delegation immediately if lease fails
This case shouldn't happen--the administrator shouldn't really allow
other applications access to the export until clients have had the
chance to reclaim their state--but if it does then we should set the
"return this lease immediately" bit on the reply.  That still leaves
some small races, but it's the best the protocol allows us to do in the
case a lease is ripped out from under us....

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-01 17:32:07 -04:00
J. Bruce Fields 0a262ffb75 nfsd4: do not throw away 4.1 lock state on last unlock
This reverts commit eb2099f31b "nfsd4:
release lockowners on last unlock in 4.1 case".  Trond identified
language in rfc 5661 section 8.2.4 which forbids this behavior:

	Stateids associated with byte-range locks are an exception.
	They remain valid even if a LOCKU frees all remaining locks, so
	long as the open file with which they are associated remains
	open, unless the client frees the stateids via the FREE_STATEID
	operation.

And bakeathon 2013 testing found a 4.1 freebsd client was getting an
incorrect BAD_STATEID return from a FREE_STATEID in the above situation
and then failing.

The spec language honestly was probably a mistake but at this point with
implementations already following it we're probably stuck with that.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-01 17:32:06 -04:00
J. Bruce Fields 89f6c3362c nfsd4: delegation-based open reclaims should bypass permissions
We saw a v4.0 client's create fail as follows:

	- open create succeeds and gets a read delegation
	- client attempts to set mode on new file, gets DELAY while
	  server recalls delegation.
	- client attempts a CLAIM_DELEGATE_CUR open using the
	  delegation, gets error because of new file mode.

This probably can't happen on a recent kernel since we're no longer
giving out delegations on create opens.  Nevertheless, it's a
bug--reclaim opens should bypass permission checks.

Reported-by: Steve Dickson <steved@redhat.com>
Reported-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-01 17:32:05 -04:00
J. Bruce Fields 590b743143 nfsd4: minor read_buf cleanup
The code to step to the next page seems reasonably self-contained.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-01 17:32:03 -04:00
J. Bruce Fields 247500820e nfsd4: fix decoding of compounds across page boundaries
A freebsd NFSv4.0 client was getting rare IO errors expanding a tarball.
A network trace showed the server returning BAD_XDR on the final getattr
of a getattr+write+getattr compound.  The final getattr started on a
page boundary.

I believe the Linux client ignores errors on the post-write getattr, and
that that's why we haven't seen this before.

Cc: stable@vger.kernel.org
Reported-by: Rick Macklem <rmacklem@uoguelph.ca>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-01 17:29:40 -04:00
J. Bruce Fields 99c415156c nfsd4: clean up nfs4_open_delegation
The nfs4_open_delegation logic is unecessarily baroque.

Also stop pretending we support write delegations in several places.

Some day we will support write delegations, but when that happens adding
back in these flag parameters will be the easy part.  For now they're
just confusing.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-01 17:23:07 -04:00
Steve Dickson 9a0590aec3 NFSD: Don't give out read delegations on creates
When an exclusive create is done with the mode bits
set (aka open(testfile, O_CREAT | O_EXCL, 0777)) this
causes a OPEN op followed by a SETATTR op. When a
read delegation is given in the OPEN, it causes
the SETATTR to delay with EAGAIN until the
delegation is recalled.

This patch caused exclusive creates to give out
a write delegation (which turn into no delegation)
which allows the SETATTR seamlessly succeed.

Signed-off-by: Steve Dickson <steved@redhat.com>
[bfields: do this for any CREATE, not just exclusive; comment]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-01 17:23:07 -04:00
J. Bruce Fields 57569a7070 nfsd4: allow client to send no cb_sec flavors
In testing I notice that some of the pynfs tests forget to send any
cb_sec flavors, and that we haven't necessarily errored out in that case
before.

I'll fix pynfs, but am also inclined to default to trying AUTH_NONE in
that case in case this is something clients actually do.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-01 17:23:07 -04:00
J. Bruce Fields b78724b705 nfsd4: fail attempts to request gss on the backchannel
We don't support gss on the backchannel.  We should state that fact up
front rather than just letting things continue and later making the
client try to figure out why the backchannel isn't working.

Trond suggested instead returning NFS4ERR_NOENT.  I think it would be
tricky for the client to distinguish between the case "I don't support
gss on the backchannel" and "I can't find that in my cache, please
create another context and try that instead", and I'd prefer something
that currently doesn't have any other meaning for this operation, hence
the (somewhat arbitrary) NFS4ERR_ENCR_ALG_UNSUPP.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-01 17:23:06 -04:00
J. Bruce Fields 57266a6e91 nfsd4: implement minimal SP4_MACH_CRED
Do a minimal SP4_MACH_CRED implementation suggested by Trond, ignoring
the client-provided spo_must_* arrays and just enforcing credential
checks for the minimum required operations.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-01 17:23:06 -04:00
J. Bruce Fields 0dc1531aca svcrpc: store gss mech in svc_cred
Store a pointer to the gss mechanism used in the rq_cred and cl_cred.
This will make it easier to enforce SP4_MACH_CRED, which needs to
compare the mechanism used on the exchange_id with that used on
protected operations.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-01 17:23:06 -04:00
Jeff Layton 1c8c601a8c locks: protect most of the file_lock handling with i_lock
Having a global lock that protects all of this code is a clear
scalability problem. Instead of doing that, move most of the code to be
protected by the i_lock instead. The exceptions are the global lists
that the ->fl_link sits on, and the ->fl_block list.

->fl_link is what connects these structures to the
global lists, so we must ensure that we hold those locks when iterating
over or updating these lists.

Furthermore, sound deadlock detection requires that we hold the
blocked_list state steady while checking for loops. We also must ensure
that the search and update to the list are atomic.

For the checking and insertion side of the blocked_list, push the
acquisition of the global lock into __posix_lock_file and ensure that
checking and update of the  blocked_list is done without dropping the
lock in between.

On the removal side, when waking up blocked lock waiters, take the
global lock before walking the blocked list and dequeue the waiters from
the global list prior to removal from the fl_block list.

With this, deadlock detection should be race free while we minimize
excessive file_lock_lock thrashing.

Finally, in order to avoid a lock inversion problem when handling
/proc/locks output we must ensure that manipulations of the fl_block
list are also protected by the file_lock_lock.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:42 +04:00
Al Viro ac6614b764 [readdir] constify ->actor
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:05 +04:00
Al Viro bb6f619b3a [readdir] introduce ->iterate(), ctx->pos, dir_emit()
New method - ->iterate(file, ctx).  That's the replacement for ->readdir();
it takes callback from ctx->actor, uses ctx->pos instead of file->f_pos and
calls dir_emit(ctx, ...) instead of filldir(data, ...).  It does *not*
update file->f_pos (or look at it, for that matter); iterate_dir() does the
update.

Note that dir_emit() takes the offset from ctx->pos (and eventually
filldir_t will lose that argument).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:47 +04:00
Al Viro 5c0ba4e076 [readdir] introduce iterate_dir() and dir_context
iterate_dir(): new helper, replacing vfs_readdir().

struct dir_context: contains the readdir callback (and will get more stuff
in it), embedded into whatever data that callback wants to deal with;
eventually, we'll be passing it to ->readdir() replacement instead of
(data,filldir) pair.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:46 +04:00
Jim Rees 1a9357f443 nfsd: avoid undefined signed overflow
In C, signed integer overflow results in undefined behavior, but unsigned
overflow wraps around. So do the subtraction first, then cast to signed.

Reported-by: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-05-21 11:02:03 -04:00
J. Bruce Fields ba4e55bb67 nfsd4: fix compile in !CONFIG_NFSD_V4_SECURITY_LABEL case
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-05-15 10:35:18 -04:00
David Quigley 18032ca062 NFSD: Server implementation of MAC Labeling
Implement labeled NFS on the server: encoding and decoding, and writing
and reading, of file labels.

Enabled with CONFIG_NFSD_V4_SECURITY_LABEL.

Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-05-15 09:27:02 -04:00
Steve Dickson 4bdc33ed5b NFSDv4.2: Add NFS v4.2 support to the NFS server
This enables NFSv4.2 support for the server. To enable this
code do the following:
  echo "+4.2" >/proc/fs/nfsd/versions

after the nfsd kernel module is loaded.

On its own this does nothing except allow the server to respond to
compounds with minorversion set to 2.  All the new NFSv4.2 features are
optional, so this is perfectly legal.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-05-13 10:11:47 -04:00
J. Bruce Fields 4f540e29dc nfsd4: store correct client minorversion for >=4.2
This code assumes that any client using exchange_id is using NFSv4.1,
but with the introduction of 4.2 that will no longer true.

This main effect of this is that client callbacks will use the same
minorversion as that used on the exchange_id.

Note that clients are forbidden from mixing 4.1 and 4.2 compounds.  (See
rfc 5661, section 2.7, #13: "A client MUST NOT attempt to use a stateid,
filehandle, or similar returned object from the COMPOUND procedure with
minor version X for another COMPOUND procedure with minor version Y,
where X != Y.")  However, we do not currently attempt to enforce this
except in the case of mixing zero minor version with non-zero minor
versions.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-05-13 10:11:45 -04:00
Zhao Hongjiang eb48241bb4 nfsd: get rid of the unused functions in vfs
The fh_lock_parent(), nfsd_truncate(), nfsd_notify_change() and
nfsd_sync_dir() fuctions are neither implemented nor used, just remove
them.

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-05-13 10:11:44 -04:00
Steve Dickson 4488cc96c5 NFS: Add NFSv4.2 protocol constants
Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-05-13 10:05:43 -04:00
Linus Torvalds 2dbd3cac87 Merge branch 'for-3.10' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
 "Small fixes for two bugs and two warnings"

* 'for-3.10' of git://linux-nfs.org/~bfields/linux:
  nfsd: fix oops when legacy_recdir_name_error is passed a -ENOENT error
  SUNRPC: fix decoding of optional gss-proxy xdr fields
  SUNRPC: Refactor gssx_dec_option_array() to kill uninitialized warning
  nfsd4: don't allow owner override on 4.1 CLAIM_FH opens
2013-05-10 09:28:55 -07:00
Jeff Layton 7255e716b1 nfsd: fix oops when legacy_recdir_name_error is passed a -ENOENT error
Toralf reported the following oops to the linux-nfs mailing list:

    -----------------[snip]------------------
    NFSD: unable to generate recoverydir name (-2).
    NFSD: disabling legacy clientid tracking. Reboot recovery will not function correctly!
    BUG: unable to handle kernel NULL pointer dereference at 000003c8
    IP: [<f90a3d91>] nfsd4_client_tracking_exit+0x11/0x50 [nfsd]
    *pdpt = 000000002ba33001 *pde = 0000000000000000
    Oops: 0000 [#1] SMP
    Modules linked in: loop nfsd auth_rpcgss ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack nf_conntrack_ftp xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc tun arc4 iwldvm mac80211 coretemp kvm_intel uvcvideo sdhci_pci sdhci mmc_core videobuf2_vmalloc videobuf2_memops usblp videobuf2_core i915 iwlwifi psmouse videodev cfg80211 kvm fbcon bitblit cfbfillrect acpi_cpufreq mperf evdev softcursor font cfbimgblt i2c_algo_bit cfbcopyarea intel_agp intel_gtt drm_kms_helper snd_hda_codec_conexant drm agpgart fb fbdev tpm_tis thinkpad_acpi tpm nvram e1000e rfkill thermal ptp wmi pps_core tpm_bios 8250_pci processor 8250 ac snd_hda_intel snd_hda_codec snd_pcm battery video i2c_i801 snd_page_alloc snd_timer button serial_core i2c_core snd soundcore thermal_sys hwmon aesni_intel ablk_helper cryp
td lrw aes_i586 xts gf128mul cbc fuse nfs lockd sunrpc dm_crypt dm_mod hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid hid sr_mod cdrom sg [last unloaded: microcode]
    Pid: 6374, comm: nfsd Not tainted 3.9.1 #6 LENOVO 4180F65/4180F65
    EIP: 0060:[<f90a3d91>] EFLAGS: 00010202 CPU: 0
    EIP is at nfsd4_client_tracking_exit+0x11/0x50 [nfsd]
    EAX: 00000000 EBX: fffffffe ECX: 00000007 EDX: 00000007
    ESI: eb9dcb00 EDI: eb2991c0 EBP: eb2bde38 ESP: eb2bde34
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    CR0: 80050033 CR2: 000003c8 CR3: 2ba80000 CR4: 000407f0
    DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    DR6: ffff0ff0 DR7: 00000400
    Process nfsd (pid: 6374, ti=eb2bc000 task=eb2711c0 task.ti=eb2bc000)
    Stack:
    fffffffe eb2bde4c f90a3e0c f90a7754 fffffffe eb0a9c00 eb2bdea0 f90a41ed
    eb2991c0 1b270000 eb2991c0 eb2bde7c f9099ce9 eb2bde98 0129a020 eb29a020
    eb2bdecc eb2991c0 eb2bdea8 f9099da5 00000000 eb9dcb00 00000001 67822f08
    Call Trace:
    [<f90a3e0c>] legacy_recdir_name_error+0x3c/0x40 [nfsd]
    [<f90a41ed>] nfsd4_create_clid_dir+0x15d/0x1c0 [nfsd]
    [<f9099ce9>] ? nfsd4_lookup_stateid+0x99/0xd0 [nfsd]
    [<f9099da5>] ? nfs4_preprocess_seqid_op+0x85/0x100 [nfsd]
    [<f90a4287>] nfsd4_client_record_create+0x37/0x50 [nfsd]
    [<f909d6ce>] nfsd4_open_confirm+0xfe/0x130 [nfsd]
    [<f90980b1>] ? nfsd4_encode_operation+0x61/0x90 [nfsd]
    [<f909d5d0>] ? nfsd4_free_stateid+0xc0/0xc0 [nfsd]
    [<f908fd0b>] nfsd4_proc_compound+0x41b/0x530 [nfsd]
    [<f9081b7b>] nfsd_dispatch+0x8b/0x1a0 [nfsd]
    [<f857b85d>] svc_process+0x3dd/0x640 [sunrpc]
    [<f908165d>] nfsd+0xad/0x110 [nfsd]
    [<f90815b0>] ? nfsd_destroy+0x70/0x70 [nfsd]
    [<c1054824>] kthread+0x94/0xa0
    [<c1486937>] ret_from_kernel_thread+0x1b/0x28
    [<c1054790>] ? flush_kthread_work+0xd0/0xd0
    Code: 86 b0 00 00 00 90 c5 0a f9 c7 04 24 70 76 0a f9 e8 74 a9 3d c8 eb ba 8d 76 00 55 89 e5 53 66 66 66 66 90 8b 15 68 c7 0a f9 85 d2 <8b> 88 c8 03 00 00 74 2c 3b 11 77 28 8b 5c 91 08 85 db 74 22 8b
    EIP: [<f90a3d91>] nfsd4_client_tracking_exit+0x11/0x50 [nfsd] SS:ESP 0068:eb2bde34
    CR2: 00000000000003c8
    ---[ end trace 09e54015d145c9c6 ]---

The problem appears to be a regression that was introduced in commit
9a9c6478 "nfsd: make NFSv4 recovery client tracking options per net".
Prior to that commit, it was safe to pass a NULL net pointer to
nfsd4_client_tracking_exit in the legacy recdir case, and
legacy_recdir_name_error did so. After that comit, the net pointer must
be valid.

This patch just fixes legacy_recdir_name_error to pass in a valid net
pointer to that function.

Cc: <stable@vger.kernel.org> # v3.8+
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Reported-and-tested-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-05-09 12:58:36 -04:00
J. Bruce Fields 9f415eb255 nfsd4: don't allow owner override on 4.1 CLAIM_FH opens
The Linux client is using CLAIM_FH to implement regular opens, not just
recovery cases, so it depends on the server to check permissions
correctly.

Therefore the owner override, which may make sense in the delegation
recovery case, isn't right in the CLAIM_FH case.

Symptoms: on a client with 49f9a0fafd
"NFSv4.1: Enable open-by-filehandle", Bryan noticed this:

	touch test.txt
	chmod 000 test.txt
	echo test > test.txt

succeeding.

Cc: stable@kernel.org
Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-05-03 16:36:14 -04:00
Linus Torvalds 1db772216f Merge branch 'for-3.10' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from J Bruce Fields:
 "Highlights include:

   - Some more DRC cleanup and performance work from Jeff Layton

   - A gss-proxy upcall from Simo Sorce: currently krb5 mounts to the
     server using credentials from Active Directory often fail due to
     limitations of the svcgssd upcall interface.  This replacement
     lifts those limitations.  The existing upcall is still supported
     for backwards compatibility.

   - More NFSv4.1 support: at this point, if a user with a current
     client who upgrades from 4.0 to 4.1 should see no regressions.  In
     theory we do everything a 4.1 server is required to do.  Patches
     for a couple minor exceptions are ready for 3.11, and with those
     and some more testing I'd like to turn 4.1 on by default in 3.11."

Fix up semantic conflict as per Stephen Rothwell and linux-next:

Commit 030d794bf4 ("SUNRPC: Use gssproxy upcall for server RPCGSS
authentication") adds two new users of "PDE(inode)->data", but we're
supposed to use "PDE_DATA(inode)" instead since commit d9dda78bad
("procfs: new helper - PDE_DATA(inode)").

The old PDE() macro is no longer available since commit c30480b92c
("proc: Make the PROC_I() and PDE() macros internal to procfs")

* 'for-3.10' of git://linux-nfs.org/~bfields/linux: (60 commits)
  NFSD: SECINFO doesn't handle unsupported pseudoflavors correctly
  NFSD: Simplify GSS flavor encoding in nfsd4_do_encode_secinfo()
  nfsd: make symbol nfsd_reply_cache_shrinker static
  svcauth_gss: fix error return code in rsc_parse()
  nfsd4: don't remap EISDIR errors in rename
  svcrpc: fix gss-proxy to respect user namespaces
  SUNRPC: gssp_procedures[] can be static
  SUNRPC: define {create,destroy}_use_gss_proxy_proc_entry in !PROC case
  nfsd4: better error return to indicate SSV non-support
  nfsd: fix EXDEV checking in rename
  SUNRPC: Use gssproxy upcall for server RPCGSS authentication.
  SUNRPC: Add RPC based upcall mechanism for RPCGSS auth
  SUNRPC: conditionally return endtime from import_sec_context
  SUNRPC: allow disabling idle timeout
  SUNRPC: attempt AF_LOCAL connect on setup
  nfsd: Decode and send 64bit time values
  nfsd4: put_client_renew_locked can be static
  nfsd4: remove unused macro
  nfsd4: remove some useless code
  nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED
  ...
2013-05-03 10:59:39 -07:00
Linus Torvalds 20b4fb4852 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
  don't bother with deferred freeing of fdtables
  proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
  proc: Make the PROC_I() and PDE() macros internal to procfs
  proc: Supply a function to remove a proc entry by PDE
  take cgroup_open() and cpuset_open() to fs/proc/base.c
  ppc: Clean up scanlog
  ppc: Clean up rtas_flash driver somewhat
  hostap: proc: Use remove_proc_subtree()
  drm: proc: Use remove_proc_subtree()
  drm: proc: Use minor->index to label things, not PDE->name
  drm: Constify drm_proc_list[]
  zoran: Don't print proc_dir_entry data in debug
  reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
  proc: Supply an accessor for getting the data from a PDE's parent
  airo: Use remove_proc_subtree()
  rtl8192u: Don't need to save device proc dir PDE
  rtl8187se: Use a dir under /proc/net/r8180/
  proc: Add proc_mkdir_data()
  proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
  proc: Move PDE_NET() to fs/proc/proc_net.c
  ...
2013-05-01 17:51:54 -07:00
Chuck Lever 676e4ebd5f NFSD: SECINFO doesn't handle unsupported pseudoflavors correctly
If nfsd4_do_encode_secinfo() can't find GSS info that matches an
export security flavor, it assumes the flavor is not a GSS
pseudoflavor, and simply puts it on the wire.

However, if this XDR encoding logic is given a legitimate GSS
pseudoflavor but the RPC layer says it does not support that
pseudoflavor for some reason, then the server leaks GSS pseudoflavor
numbers onto the wire.

I confirmed this happens by blacklisting rpcsec_gss_krb5, then
attempted a client transition from the pseudo-fs to a Kerberos-only
share.  The client received a flavor list containing the Kerberos
pseudoflavor numbers, rather than GSS tuples.

The encoder logic can check that each pseudoflavor in flavs[] is
less than MAXFLAVOR before writing it into the buffer, to prevent
this.  But after "nflavs" is written into the XDR buffer, the
encoder can't skip writing flavor information into the buffer when
it discovers the RPC layer doesn't support that flavor.

So count the number of valid flavors as they are written into the
XDR buffer, then write that count into a placeholder in the XDR
buffer when all recognized flavors have been encoded.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-30 19:18:21 -04:00
Chuck Lever ed9411a004 NFSD: Simplify GSS flavor encoding in nfsd4_do_encode_secinfo()
Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-30 19:18:21 -04:00
Wei Yongjun c8c797f9fd nfsd: make symbol nfsd_reply_cache_shrinker static
symbol 'nfsd_reply_cache_shrinker' only used within this file. It should
be static.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-30 18:19:34 -04:00
J. Bruce Fields 2a6cf944c2 nfsd4: don't remap EISDIR errors in rename
We're going out of our way here to remap an error to make rfc 3530
happy--but the rfc itself (nor rfc 1813, which has similar language)
gives no justification.  And disagrees with local filesystem behavior,
with Linux and posix man pages, and knfsd's implemented behavior for v2
and v3.

And the documented behavior seems better, in that it gives a little more
information--you could implement the 3530 behavior using the posix
behavior, but not the other way around.

Also, the Linux client makes no attempt to remap this error in the v4
case, so it can end up just returning EEXIST to the application in a
case where it should return EISDIR.

So honestly I think the rfc's are just buggy here--or in any case it
doesn't see worth the trouble to remap this error.

Reported-by: Frank S Filz <ffilz@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-30 15:44:20 -04:00
Linus Torvalds 8728f986fe NFS client bugfixes and cleanups for 3.10
- NLM: stable fix for NFSv2/v3 blocking locks
 - NFSv4.x: stable fixes for the delegation recall error handling code
 - NFSv4.x: Security flavour negotiation fixes and cleanups by Chuck Lever
 - SUNRPC: A number of RPCSEC_GSS fixes and cleanups also from Chuck
 - NFSv4.x assorted state management and reboot recovery bugfixes
 - NFSv4.1: In cases where we have already looked up a file, and hold a
   valid filehandle, use the new open-by-filehandle operation instead of
   opening by name.
 - Allow the NFSv4.1 callback thread to freeze
 - NFSv4.x: ensure that file unlock waits for readahead to complete
 - NFSv4.1: ensure that the RPC layer doesn't override the NFS session
   table size negotiation by limiting the number of slots.
 - NFSv4.x: Fix SETATTR spec compatibility issues
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRfu+cAAoJEGcL54qWCgDylxkP/24cXOLHMKMYnIab0cQYIW2m
 SQgADGE+MqgTVlWjGVWublVMDY1R51iINsksAjxMtXYt50FdBJEqV2uxIGi4VnbR
 nR9eppqQ6vk5e6r5+aZyVmWKoLFnJ4MBF6OpPUZB5mf8iH/fiixmSYLvseopPdDj
 bjHwCxg+xEgew5EhQF/xqkEfkAp2NN84xUksTWb9uDIW2c3SJweY/ZVR2Zsqpugm
 oqYVtrSLvNKqINQG8OP10s+mRWULwoqapF+kEHlxNbRy26C0zlbXPaneSgYzqHsY
 OyRkAT7uJJqStYlqdW7k+DhyNMB+T33WAGJpWQlfJGYk5d/n0rtBJDVo0hfhCSQr
 VkOXiO9J08NMbelCu4+0CJii7h5GCaqpuJEEmNL6AlF/TJVkIQJuRaG2+WDmEtO2
 oYd4UfXlAbUuts1SW7u/yyN/yrjVTm1tZYRBqn2VJdqh1s8dMxEWPct2Yn314mpS
 ODAnbDkEhtWlc9cloSRnwKec5WcxMZb19IJeK9ZvHm7PfIu/QHtj6Ren8s1//bZI
 OMQxC/Vf/wcjMdNtr7QdMNxWG1aK8DL9mYP5XwCrkZ560LIrtxmhyqYeoGAfgO5u
 +K/gKmQwjsaPhEa8jbP2/wI0II9yKPWj/fVwqhbhqaBUx5GA2iAKcdpPP6JAMAti
 +PXkLTtkyrIgSNwzl63S
 =Hgot
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes and cleanups from Trond Myklebust:

 - NLM: stable fix for NFSv2/v3 blocking locks

 - NFSv4.x: stable fixes for the delegation recall error handling code

 - NFSv4.x: Security flavour negotiation fixes and cleanups by Chuck
   Lever

 - SUNRPC: A number of RPCSEC_GSS fixes and cleanups also from Chuck

 - NFSv4.x assorted state management and reboot recovery bugfixes

 - NFSv4.1: In cases where we have already looked up a file, and hold a
   valid filehandle, use the new open-by-filehandle operation instead of
   opening by name.

 - Allow the NFSv4.1 callback thread to freeze

 - NFSv4.x: ensure that file unlock waits for readahead to complete

 - NFSv4.1: ensure that the RPC layer doesn't override the NFS session
   table size negotiation by limiting the number of slots.

 - NFSv4.x: Fix SETATTR spec compatibility issues

* tag 'nfs-for-3.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits)
  NFSv4: Warn once about servers that incorrectly apply open mode to setattr
  NFSv4: Servers should only check SETATTR stateid open mode on size change
  NFSv4: Don't recheck permissions on open in case of recovery cached open
  NFSv4.1: Don't do a delegated open for NFS4_OPEN_CLAIM_DELEG_CUR_FH modes
  NFSv4.1: Use the more efficient open_noattr call for open-by-filehandle
  NFS: Retry SETCLIENTID with AUTH_SYS instead of AUTH_NONE
  NFSv4: Ensure that we clear the NFS_OPEN_STATE flag when appropriate
  LOCKD: Ensure that nlmclnt_block resets block->b_status after a server reboot
  NFSv4: Ensure the LOCK call cannot use the delegation stateid
  NFSv4: Use the open stateid if the delegation has the wrong mode
  nfs: Send atime and mtime as a 64bit value
  NFSv4: Record the OPEN create mode used in the nfs4_opendata structure
  NFSv4.1: Set the RPC_CLNT_CREATE_INFINITE_SLOTS flag for NFSv4.1 transports
  SUNRPC: Allow rpc_create() to request that TCP slots be unlimited
  SUNRPC: Fix a livelock problem in the xprt->backlog queue
  NFSv4: Fix handling of revoked delegations by setattr
  NFSv4 release the sequence id in the return on close case
  nfs: remove unnecessary check for NULL inode->i_flock from nfs_delegation_claim_locks
  NFS: Ensure that NFS file unlock waits for readahead to complete
  NFS: Add functionality to allow waiting on all outstanding reads to complete
  ...
2013-04-30 11:28:08 -07:00
Jeff Layton 398c33aaa4 nfsd: convert nfs4_alloc_stid() to use idr_alloc_cyclic()
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:41 -07:00
J. Bruce Fields b1df763723 Merge branch 'nfs-for-next' of git://linux-nfs.org/~trondmy/nfs-2.6 into for-3.10
Note conflict: Chuck's patches modified (and made static)
gss_mech_get_by_OID, which is still needed by gss-proxy patches.

The conflict resolution is a bit minimal; we may want some more cleanup.
2013-04-29 16:23:34 -04:00
J. Bruce Fields dd30333cf5 nfsd4: better error return to indicate SSV non-support
As 4.1 becomes less experimental and SSV still isn't implemented, we
have to admit it's not going to be, and return some sensible error
rather than just saying "our server's broken".  Discussion in the ietf
group hasn't turned up any objections to using NFS4ERR_ENC_ALG_UNSUPP
for that purpose.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-26 16:18:15 -04:00
J. Bruce Fields aa387d6ce1 nfsd: fix EXDEV checking in rename
We again check for the EXDEV a little later on, so the first check is
redundant.  This check is also slightly racier, since a badly timed
eviction from the export cache could leave us with the two fh_export
pointers pointing to two different cache entries which each refer to the
same underlying export.

It's better to compare vfsmounts as the later check does, but that
leaves a minor security hole in the case where the two exports refer to
two different directories especially if (for example) they have
different root-squashing options.

So, compare ex_path.dentry too.

Reported-by: Joe Habermann <joe.habermann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-26 16:18:15 -04:00
Bryan Schumaker bf8d909705 nfsd: Decode and send 64bit time values
The seconds field of an nfstime4 structure is 64bit, but we are assuming
that the first 32bits are zero-filled.  So if the client tries to set
atime to a value before the epoch (touch -t 196001010101), then the
server will save the wrong value on disk.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-23 14:49:37 -04:00
Fengguang Wu ba138435d1 nfsd4: put_client_renew_locked can be static
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-16 22:15:00 -04:00
J. Bruce Fields 9aeb5aeeb0 nfsd4: remove unused macro
Cleanup a piece I forgot to remove in
9411b1d4c7 "nfsd4: cleanup handling of
nfsv4.0 closed stateid's".

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-16 21:51:55 -04:00
fanchaoting 53584f6652 nfsd4: remove some useless code
The "list_empty(&oo->oo_owner.so_stateids)" is aways true, so remove it.

Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-16 10:59:31 -04:00
J. Bruce Fields 3bd64a5ba1 nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED
A 4.1 server must notify a client that has had any state revoked using
the SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag.  The client can figure
out exactly which state is the problem using CHECK_STATEID and then free
it using FREE_STATEID.  The status flag will be unset once all such
revoked stateids are freed.

Our server's only recallable state is delegations.  So we keep with each
4.1 client a list of delegations that have timed out and been recalled,
but haven't yet been freed by FREE_STATEID.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-16 10:59:30 -04:00
J. Bruce Fields 23340032e6 nfsd4: clean up validate_stateid
The logic here is better expressed with a switch statement.

While we're here, CLOSED stateids (or stateids of an unkown type--which
would indicate a server bug) should probably return nfserr_bad_stateid,
though this behavior shouldn't affect any non-buggy client.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-09 17:42:28 -04:00
J. Bruce Fields 06b332a522 nfsd4: check backchannel attributes on create_session
Make sure the client gives us an adequate backchannel.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-09 16:53:56 -04:00
J. Bruce Fields 55c760cfc4 nfsd4: fix forechannel attribute negotiation
Negotiation of the 4.1 session forechannel attributes is a mess.  Fix:

	- Move it all into check_forechannel_attrs instead of spreading
	  it between that, alloc_session, and init_forechannel_attrs.
	- set a minimum "slotsize" so that our drc memory limits apply
	  even for small maxresponsesize_cached.  This also fixes some
	  bugs when slotsize becomes <= 0.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-09 16:43:44 -04:00
J. Bruce Fields 373cd4098a nfsd4: cleanup check_forechannel_attrs
Pass this struct by reference, not by value, and return an error instead
of a boolean to allow for future additions.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-09 15:49:50 -04:00
Al Viro 75ef9de126 constify a bunch of struct file_operations instances
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:16:20 -04:00
J. Bruce Fields 0c7c3e67ab nfsd4: don't close read-write opens too soon
Don't actually close any opens until we don't need them at all.

This means being left with write access when it's not really necessary,
but that's better than putting a file that might still have posix locks
held on it, as we have been.

Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-09 09:08:57 -04:00
J. Bruce Fields eb2099f31b nfsd4: release lockowners on last unlock in 4.1 case
In the 4.1 case we're supposed to release lockowners as soon as they're
no longer used.

It would probably be more efficient to reference count them, but that's
slightly fiddly due to the need to have callbacks from locks.c to take
into account lock merging and splitting.

For most cases just scanning the inode's lock list on unlock for
matching locks will be sufficient.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-09 09:08:56 -04:00
J. Bruce Fields bbc9c36c31 nfsd4: more sessions/open-owner-replay cleanup
More logic that's unnecessary in the 4.1 case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-09 09:08:56 -04:00
J. Bruce Fields 3d74e6a5b6 nfsd4: no need for replay_owner in sessions case
The replay_owner will never be used in the sessions case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-09 09:08:55 -04:00
J. Bruce Fields c383747ef6 nfsd4: remove some redundant comments
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-09 09:08:54 -04:00
Wei Yongjun 2c44a23471 nfsd: use kmem_cache_free() instead of kfree()
memory allocated by kmem_cache_alloc() should be freed using
kmem_cache_free(), not kfree().

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-09 09:08:47 -04:00
J. Bruce Fields 9411b1d4c7 nfsd4: cleanup handling of nfsv4.0 closed stateid's
Closed stateid's are kept around a little while to handle close replays
in the 4.0 case.  So we stash them in the last-used stateid in the
oo_last_closed_stateid field of the open owner.  We can free that in
encode_seqid_op_tail once the seqid on the open owner is next
incremented.  But we don't want to do that on the close itself; so we
set NFS4_OO_PURGE_CLOSE flag set on the open owner, skip freeing it the
first time through encode_seqid_op_tail, then when we see that flag set
next time we free it.

This is unnecessarily baroque.

Instead, just move the logic that increments the seqid out of the xdr
code and into the operation code itself.

The justification given for the current placement is that we need to
wait till the last minute to be sure we know whether the status is a
sequence-id-mutating error or not, but examination of the code shows
that can't actually happen.

Reported-by: Yanchuan Nian <ycnian@gmail.com>
Tested-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-08 09:55:32 -04:00
J. Bruce Fields 41d22663cb nfsd4: remove unused nfs4_check_deleg argument
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-04 13:25:17 -04:00
J. Bruce Fields e8c69d17d1 nfsd4: make del_recall_lru per-network-namespace
If nothing else this simplifies the nfs4_state_shutdown_net logic a tad.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-04 13:25:16 -04:00
J. Bruce Fields 68a3396178 nfsd4: shut down more of delegation earlier
Once we've unhashed the delegation, it's only hanging around for the
benefit of an oustanding recall, which only needs the encoded
filehandle, stateid, and dl_retries counter.  No point keeping the file
around any longer, or keeping it hashed.

This also fixes a race: calls to idr_remove should really be serialized
by the caller, but the nfs4_put_delegation call from the callback code
isn't taking the state lock.

(Better might be to cancel the callback before destroying the
delegation, and remove any need for reference counting--but I don't see
an easy way to cancel an rpc call.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-04 13:25:15 -04:00
J. Bruce Fields 8be2d2344c nfsd4: minor cb_recall simplification
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-04 13:25:14 -04:00
fanchaoting ff7c4b3693 nfsd: remove /proc/fs/nfs when create /proc/fs/nfs/exports error
when create /proc/fs/nfs/exports error, we should remove /proc/fs/nfs,
if don't do it, it maybe cause Memory leak.

 Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
 Reviewed-by: chendt.fnst <chendt.fnst@cn.fujitsu.com>

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 15:30:07 -04:00
fanchaoting b022032e19 nfsd: don't run get_file if nfs4_preprocess_stateid_op return error
we should return error status directly when nfs4_preprocess_stateid_op
return error.

Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 15:19:06 -04:00
Jeff Layton 89876f8c0d nfsd: convert the file_hashtbl to a hlist
We only ever traverse the hash chains in the forward direction, so a
double pointer list head isn't really necessary.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 15:11:04 -04:00
J. Bruce Fields 66b2b9b2b0 nfsd4: don't destroy in-use session
This changes session destruction to be similar to client destruction in
that attempts to destroy a session while in use (which should be rare
corner cases) result in DELAY.  This simplifies things somewhat and
helps meet a coming 4.2 requirement.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:48:40 -04:00
J. Bruce Fields 221a687669 nfsd4: don't destroy in-use clients
When a setclientid_confirm or create_session confirms a client after a
client reboot, it also destroys any previous state held by that client.

The shutdown of that previous state must be careful not to free the
client out from under threads processing other requests that refer to
the client.

This is a particular problem in the NFSv4.1 case when we hold a
reference to a session (hence a client) throughout compound processing.

The server attempts to handle this by unhashing the client at the time
it's destroyed, then delaying the final free to the end.  But this still
leaves some races in the current code.

I believe it's simpler just to fail the attempt to destroy the client by
returning NFS4ERR_DELAY.  This is a case that should never happen
anyway.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:48:39 -04:00
J. Bruce Fields 4f6e6c1773 nfsd4: simplify bind_conn_to_session locking
The locking here is very fiddly, and there's no reason for us to be
setting cstate->session, since this is the only op in the compound.
Let's just take the state lock and drop the reference counting.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:48:39 -04:00
J. Bruce Fields abcdff09a0 nfsd4: fix destroy_session race
destroy_session uses the session and client without continuously holding
any reference or locks.

Put the whole thing under the state lock for now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:48:38 -04:00
J. Bruce Fields bfa85e83a8 nfsd4: clientid lookup cleanup
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:48:37 -04:00
J. Bruce Fields c0293b0131 nfsd4: destroy_clientid simplification
I'm not sure what the check for clientid expiry was meant to do here.

The check for a matching session is redundant given the previous check
for state: a client without state is, in particular, a client without
sessions.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:48:36 -04:00
J. Bruce Fields 1ca507920d nfsd4: remove some dprintk's
E.g. printk's that just report the return value from an op are
uninteresting as we already do that in the main proc_compound loop.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:48:36 -04:00
J. Bruce Fields 0eb6f20aa5 nfsd4: STALE_STATEID cleanup
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:48:35 -04:00
J. Bruce Fields 78389046f7 nfsd4: warn on odd create_session state
This should never happen.

(Note: the comparable case in setclientid_confirm *can* happen, since
updating a client record can result in both confirmed and unconfirmed
records with the same clientid.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:48:34 -04:00
ycnian@gmail.com 491402a787 nfsd: fix bug on nfs4 stateid deallocation
NFS4_OO_PURGE_CLOSE is not handled properly. To avoid memory leak, nfs4
stateid which is pointed by oo_last_closed_stid is freed in nfsd4_close(),
but NFS4_OO_PURGE_CLOSE isn't cleared meanwhile. So the stateid released in
THIS close procedure may be freed immediately in the coming encoding function.
Sorry that Signed-off-by was forgotten in last version.

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:48:34 -04:00
Yanchuan Nian 9c6bdbb8dd nfsd: remove unused macro in nfsv4
lk_rflags is never used anywhere, and rflags is not defined in struct
nfsd4_lock.

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:48:33 -04:00
J. Bruce Fields 2e4b7239a6 nfsd4: fix use-after-free of 4.1 client on connection loss
Once we drop the lock here there's nothing keeping the client around:
the only lock still held is the xpt_lock on this socket, but this socket
no longer has any connection with the client so there's no way for other
code to know we're still using the client.

The solution is simple: all nfsd4_probe_callback does is set a few
variables and queue some work, so there's no reason we can't just keep
it under the lock.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:48:32 -04:00
J. Bruce Fields b0a9d3ab57 nfsd4: fix race on client shutdown
Dropping the session's reference count after the client's means we leave
a window where the session's se_client pointer is NULL.  An xpt_user
callback that encounters such a session may then crash:

[  303.956011] BUG: unable to handle kernel NULL pointer dereference at 0000000000000318
[  303.959061] IP: [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[  303.959061] PGD 37811067 PUD 3d498067 PMD 0
[  303.959061] Oops: 0002 [#8] PREEMPT SMP
[  303.959061] Modules linked in: md5 nfsd auth_rpcgss nfs_acl snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc microcode psmouse snd_timer serio_raw pcspkr evdev snd soundcore i2c_piix4 i2c_core intel_agp intel_gtt processor button nfs lockd sunrpc fscache ata_generic pata_acpi ata_piix uhci_hcd libata btrfs usbcore usb_common crc32c scsi_mod libcrc32c zlib_deflate floppy virtio_balloon virtio_net virtio_pci virtio_blk virtio_ring virtio
[  303.959061] CPU 0
[  303.959061] Pid: 264, comm: nfsd Tainted: G      D      3.8.0-ARCH+ #156 Bochs Bochs
[  303.959061] RIP: 0010:[<ffffffff81481a8e>]  [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[  303.959061] RSP: 0018:ffff880037877dd8  EFLAGS: 00010202
[  303.959061] RAX: 0000000000000100 RBX: ffff880037a2b698 RCX: ffff88003d879278
[  303.959061] RDX: ffff88003d879278 RSI: dead000000100100 RDI: 0000000000000318
[  303.959061] RBP: ffff880037877dd8 R08: ffff88003c5a0f00 R09: 0000000000000002
[  303.959061] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[  303.959061] R13: 0000000000000318 R14: ffff880037a2b680 R15: ffff88003c1cbe00
[  303.959061] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[  303.959061] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  303.959061] CR2: 0000000000000318 CR3: 000000003d49c000 CR4: 00000000000006f0
[  303.959061] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  303.959061] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  303.959061] Process nfsd (pid: 264, threadinfo ffff880037876000, task ffff88003c1fd0a0)
[  303.959061] Stack:
[  303.959061]  ffff880037877e08 ffffffffa03772ec ffff88003d879000 ffff88003d879278
[  303.959061]  ffff88003d879080 0000000000000000 ffff880037877e38 ffffffffa0222a1f
[  303.959061]  0000000000107ac0 ffff88003c22e000 ffff88003d879000 ffff88003c1cbe00
[  303.959061] Call Trace:
[  303.959061]  [<ffffffffa03772ec>] nfsd4_conn_lost+0x3c/0xa0 [nfsd]
[  303.959061]  [<ffffffffa0222a1f>] svc_delete_xprt+0x10f/0x180 [sunrpc]
[  303.959061]  [<ffffffffa0223d96>] svc_recv+0xe6/0x580 [sunrpc]
[  303.959061]  [<ffffffffa03587c5>] nfsd+0xb5/0x140 [nfsd]
[  303.959061]  [<ffffffffa0358710>] ? nfsd_destroy+0x90/0x90 [nfsd]
[  303.959061]  [<ffffffff8107ae00>] kthread+0xc0/0xd0
[  303.959061]  [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pte_at+0x50/0x100
[  303.959061]  [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
[  303.959061]  [<ffffffff814898ec>] ret_from_fork+0x7c/0xb0
[  303.959061]  [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
[  303.959061] Code: ff ff 5d c3 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 65 48 8b 04 25 f0 c6 00 00 48 89 e5 83 80 44 e0 ff ff 01 b8 00 01 00 00 <3e> 66 0f c1 07 0f b6 d4 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f
[  303.959061] RIP  [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[  303.959061]  RSP <ffff880037877dd8>
[  303.959061] CR2: 0000000000000318
[  304.001218] ---[ end trace 2d809cd4a7931f5a ]---
[  304.001903] note: nfsd[264] exited with preempt_count 2

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:48:31 -04:00
J. Bruce Fields 9d313b17db nfsd4: handle seqid-mutating open errors from xdr decoding
If a client sets an owner (or group_owner or acl) attribute on open for
create, and the mapping of that owner to an id fails, then we return
BAD_OWNER.  But BAD_OWNER is a seqid-mutating error, so we can't
shortcut the open processing that case: we have to at least look up the
owner so we can find the seqid to bump.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:47:53 -04:00
J. Bruce Fields b600de7ab9 nfsd4: remove BUG_ON
This BUG_ON just crashes the thread a little earlier than it would
otherwise--it doesn't seem useful.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:47:45 -04:00
Jeff Layton 0733c7ba1e nfsd: scale up the number of DRC hash buckets with cache size
We've now increased the size of the duplicate reply cache by quite a
bit, but the number of hash buckets has not changed. So, we've gone from
an average hash chain length of 16 in the old code to 4096 when the
cache is its largest. Change the code to scale out the number of buckets
with the max size of the cache.

At the same time, we also need to fix the hash function since the
existing one isn't really suitable when there are more than 256 buckets.
Move instead to use the stock hash_32 function for this. Testing on a
machine that had 2048 buckets showed that this gave a smaller
longest:average ratio than the existing hash function:

The formula here is longest hash bucket searched divided by average
number of entries per bucket at the time that we saw that longest
bucket:

    old hash: 68/(39258/2048) == 3.547404
    hash_32:  45/(33773/2048) == 2.728807

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:47:25 -04:00
Jeff Layton 98d821bda1 nfsd: keep stats on worst hash balancing seen so far
The typical case with the DRC is a cache miss, so if we keep track of
the max number of entries that we've ever walked over in a search, then
we should have a reasonable estimate of the longest hash chain that
we've ever seen.

With that, we'll also keep track of the total size of the cache when we
see the longest chain. In the case of a tie, we prefer to track the
smallest total cache size in order to properly gauge the worst-case
ratio of max vs. avg chain length.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:47:25 -04:00
Jeff Layton a2f999a37e nfsd: add new reply_cache_stats file in nfsdfs
For presenting statistics relating to duplicate reply cache.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:47:24 -04:00
Jeff Layton 6c6910cd4d nfsd: track memory utilization by the DRC
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:47:23 -04:00
Jeff Layton 9dc56143c2 nfsd: break out comparator into separate function
Break out the function that compares the rqstp and checksum against a
reply cache entry. While we're at it, track the efficacy of the checksum
over the NFS data by tracking the cases where we would have incorrectly
matched a DRC entry if we had not tracked it or the length.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:47:22 -04:00
Jeff Layton 0b9ea37f24 nfsd: eliminate one of the DRC cache searches
The most common case is to do a search of the cache, followed by an
insert. In the case where we have to allocate an entry off the slab,
then we end up having to redo the search, which is wasteful.

Better optimize the code for the common case by eliminating the initial
search of the cache and always preallocating an entry. In the case of a
cache hit, we'll end up just freeing that entry but that's preferable to
an extra search.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-03 11:47:22 -04:00
Chuck Lever a77c806fb9 SUNRPC: Refactor nfsd4_do_encode_secinfo()
Clean up.  This matches a similar API for the client side, and
keeps ULP fingers out the of the GSS mech switch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-29 15:43:33 -04:00
J. Bruce Fields 64a817cfbd nfsd4: reject "negative" acl lengths
Since we only enforce an upper bound, not a lower bound, a "negative"
length can get through here.

The symptom seen was a warning when we attempt to a kmalloc with an
excessive size.

Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-03-26 16:18:27 -04:00
Kent Overstreet e49dbbf3e7 nfsd: fix bad offset use
vfs_writev() updates the offset argument - but the code then passes the
offset to vfs_fsync_range(). Since offset now points to the offset after
what was just written, this is probably not what was intended

Introduced by face15025f "nfsd: use
vfs_fsync_range(), not O_SYNC, for stable writes".

Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: stable@vger.kernel.org
Reviewed-by: Zach Brown <zab@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-03-22 16:55:15 -04:00
Jeff Layton ac534ff2d5 nfsd: fix startup order in nfsd_reply_cache_init
If we end up doing "goto out_nomem" in this function, we'll call
nfsd_reply_cache_shutdown. That will attempt to walk the LRU list and
free entries, but that list may not be initialized yet if the server is
starting up for the first time. It's also possible for the shrinker to
kick in before we've initialized the LRU list.

Rearrange the initialization so that the LRU list_head and cache size
are initialized before doing any of the allocations that might fail.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-03-18 17:21:30 -04:00
Jeff Layton a517b608fa nfsd: only unhash DRC entries that are in the hashtable
It's not safe to call hlist_del() on a newly initialized hlist_node.
That leads to a NULL pointer dereference. Only do that if the entry
is hashed.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-03-18 14:58:32 -04:00
Tejun Heo ebd6c70714 nfsd: convert to idr_alloc()
idr_get_new*() and friends are about to be deprecated.  Convert to the
new idr_alloc() interface.

Only compile-tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-13 15:21:45 -07:00
Tejun Heo 801cb2d62d nfsd: remove unused get_new_stid()
get_new_stid() is no longer used since commit 3abdb60712 ("nfsd4:
simplify idr allocation").  Remove it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-13 15:21:45 -07:00
Eric W. Biederman 7f78e03513 fs: Limit sys_mount to only request filesystem modules.
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives.  Allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work.  While writing this patch I saw a handful of such
cases.  The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module.  The common pattern in the kernel is to call request_module()
without regards to the users permissions.  In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted.  In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-03 19:36:31 -08:00
Linus Torvalds b6669737d3 Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from J Bruce Fields:
 "Miscellaneous bugfixes, plus:

   - An overhaul of the DRC cache by Jeff Layton.  The main effect is
     just to make it larger.  This decreases the chances of intermittent
     errors especially in the UDP case.  But we'll need to watch for any
     reports of performance regressions.

   - Containerized nfsd: with some limitations, we now support
     per-container nfs-service, thanks to extensive work from Stanislav
     Kinsbursky over the last year."

Some notes about conflicts, since there were *two* non-data semantic
conflicts here:

 - idr_remove_all() had been added by a memory leak fix, but has since
   become deprecated since idr_destroy() does it for us now.

 - xs_local_connect() had been added by this branch to make AF_LOCAL
   connections be synchronous, but in the meantime Trond had changed the
   calling convention in order to avoid a RCU dereference.

There were a couple of more obvious actual source-level conflicts due to
the hlist traversal changes and one just due to code changes next to
each other, but those were trivial.

* 'for-3.9' of git://linux-nfs.org/~bfields/linux: (49 commits)
  SUNRPC: make AF_LOCAL connect synchronous
  nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum
  svcrpc: fix rpc server shutdown races
  svcrpc: make svc_age_temp_xprts enqueue under sv_lock
  lockd: nlmclnt_reclaim(): avoid stack overflow
  nfsd: enable NFSv4 state in containers
  nfsd: disable usermode helper client tracker in container
  nfsd: use proper net while reading "exports" file
  nfsd: containerize NFSd filesystem
  nfsd: fix comments on nfsd_cache_lookup
  SUNRPC: move cache_detail->cache_request callback call to cache_read()
  SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function
  SUNRPC: rework cache upcall logic
  SUNRPC: introduce cache_detail->cache_request callback
  NFS: simplify and clean cache library
  NFS: use SUNRPC cache creation and destruction helper for DNS cache
  nfsd4: free_stid can be static
  nfsd: keep a checksum of the first 256 bytes of request
  sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer
  sunrpc: fix comment in struct xdr_buf definition
  ...
2013-02-28 18:02:55 -08:00
Sasha Levin b67bfe0d42 hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:24 -08:00
Linus Torvalds d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00
J. Bruce Fields 4f4a4fadde nfsd: handle vfs_getattr errors in acl protocol
We're currently ignoring errors from vfs_getattr.

The correct thing to do is to do the stat in the main service procedure
not in the response encoding.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:09 -05:00
Al Viro 3dadecce20 switch vfs_getattr() to struct path
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:08 -05:00
Linus Torvalds 94f2f14234 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace and namespace infrastructure changes from Eric W Biederman:
 "This set of changes starts with a few small enhnacements to the user
  namespace.  reboot support, allowing more arbitrary mappings, and
  support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
  user namespace root.

  I do my best to document that if you care about limiting your
  unprivileged users that when you have the user namespace support
  enabled you will need to enable memory control groups.

  There is a minor bug fix to prevent overflowing the stack if someone
  creates way too many user namespaces.

  The bulk of the changes are a continuation of the kuid/kgid push down
  work through the filesystems.  These changes make using uids and gids
  typesafe which ensures that these filesystems are safe to use when
  multiple user namespaces are in use.  The filesystems converted for
  3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs.  The
  changes for these filesystems were a little more involved so I split
  the changes into smaller hopefully obviously correct changes.

  XFS is the only filesystem that remains.  I was hoping I could get
  that in this release so that user namespace support would be enabled
  with an allyesconfig or an allmodconfig but it looks like the xfs
  changes need another couple of days before it they are ready."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
  cifs: Enable building with user namespaces enabled.
  cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
  cifs: Convert struct cifs_sb_info to use kuids and kgids
  cifs: Modify struct smb_vol to use kuids and kgids
  cifs: Convert struct cifsFileInfo to use a kuid
  cifs: Convert struct cifs_fattr to use kuid and kgids
  cifs: Convert struct tcon_link to use a kuid.
  cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
  cifs: Convert from a kuid before printing current_fsuid
  cifs: Use kuids and kgids SID to uid/gid mapping
  cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
  cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
  cifs: Override unmappable incoming uids and gids
  nfsd: Enable building with user namespaces enabled.
  nfsd: Properly compare and initialize kuids and kgids
  nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
  nfsd: Modify nfsd4_cb_sec to use kuids and kgids
  nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
  nfsd: Convert nfsxdr to use kuids and kgids
  nfsd: Convert nfs3xdr to use kuids and kgids
  ...
2013-02-25 16:00:49 -08:00
Zhang Yanfei 697ce9be7d fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used
The three variables are calculated from nr_free_buffer_pages so change
their types to unsigned long in case of overflow.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:22 -08:00
Al Viro 496ad9aa8e new helper: file_inode(file)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:31 -05:00
Jeff Layton 56edc86b5a nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum
kbuild test robot says:

tree:   git://linux-nfs.org/~bfields/linux.git for-3.9
head:   deb4534f4f
commit: 01a7decf75 [32/44] nfsd: keep a checksum of the first 256 bytes of request
config: i386-randconfig-x088 (attached as .config)

All warnings:

   fs/nfsd/nfscache.c: In function 'nfsd_cache_csum':
>> fs/nfsd/nfscache.c:266:9: warning: comparison of distinct pointer types lacks a cast [enabled by default]

vim +266 fs/nfsd/nfscache.c

   250		__wsum csum;
   251		struct xdr_buf *buf = &rqstp->rq_arg;
   252		const unsigned char *p = buf->head[0].iov_base;
   253		size_t csum_len = min_t(size_t, buf->head[0].iov_len + buf->page_len,
   254					RC_CSUMLEN);
   255		size_t len = min(buf->head[0].iov_len, csum_len);
   256
   257		/* rq_arg.head first */
   258		csum = csum_partial(p, len, 0);
   259		csum_len -= len;
   260
   261		/* Continue into page array */
   262		idx = buf->page_base / PAGE_SIZE;
   263		base = buf->page_base & ~PAGE_MASK;
   264		while (csum_len) {
   265			p = page_address(buf->pages[idx]) + base;
 > 266			len = min(PAGE_SIZE - base, csum_len);
   267			csum = csum_partial(p, len, csum);
   268			csum_len -= len;
   269			base = 0;
   270			++idx;
   271		}
   272		return csum;
   273	}
   274

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-17 11:02:23 -05:00
Stanislav Kinsbursky deb4534f4f nfsd: enable NFSv4 state in containers
Currently, NFSd is ready to operate in network namespace based containers.
So let's drop check for "init_net" and make it able to fly.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-15 11:21:02 -05:00
Stanislav Kinsbursky 71a5030693 nfsd: disable usermode helper client tracker in container
This tracker uses khelper kthread to execute binaries.
Execution itself is done from kthread context - i.e. global root is used.
This is not suitable for containers with own root.
So, disable this tracker for a while.

Note: one of possible solutions can be pass "init" callback to khelper, which
will swap root to desired one.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-15 11:21:01 -05:00
Stanislav Kinsbursky 96d851c4d2 nfsd: use proper net while reading "exports" file
Functuon "exports_open" is used for both "/proc/fs/nfs/exports" and
"/proc/fs/nfsd/exports" files.
Now NFSd filesystem is containerised, so proper net can be taken from
superblock for "/proc/fs/nfsd/exports" reader.
But for "/proc/fs/nfsd/exports" only current->nsproxy->net_ns can be used.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-15 11:21:01 -05:00
Stanislav Kinsbursky 11f779421a nfsd: containerize NFSd filesystem
This patch makes NFSD file system superblock to be created per net.
This makes possible to get proper network namespace from superblock instead of
using hard-coded "init_net".

Note: NFSd fs super-block holds network namespace. This garantees, that
network namespace won't disappear from underneath of it.
This, obviously, means, that in case of kill of a container's "init" (which is not a mount
namespace, but network namespace creator) netowrk namespace won't be
destroyed.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-15 11:21:00 -05:00
Jeff Layton 1ac8362977 nfsd: fix comments on nfsd_cache_lookup
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-15 10:43:48 -05:00
Stanislav Kinsbursky 2d4383383b SUNRPC: rework cache upcall logic
For most of SUNRPC caches (except NFS DNS cache) cache_detail->cache_upcall is
redundant since all that it's implementations are doing is calling
sunrpc_cache_pipe_upcall() with proper function address argument.
Cache request function address is now stored on cache_detail structure and
thus all the code can be simplified.
Now, for those cache details, which doesn't have cache_upcall callback (the
only one, which still has is nfs_dns_resolve_template)
sunrpc_cache_pipe_upcall will be called instead.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-15 10:43:46 -05:00
Stanislav Kinsbursky 73fb847a44 SUNRPC: introduce cache_detail->cache_request callback
This callback will allow to simplify upcalls in further patches in this
series.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-15 10:43:45 -05:00
Eric W. Biederman 6fab877900 nfsd: Properly compare and initialize kuids and kgids
Use uid_eq(uid, GLOBAL_ROOT_UID) instead of !uid.
Use gid_eq(gid, GLOBAL_ROOT_GID) instead of !gid.
Use uid_eq(uid, INVALID_UID) instead of uid == -1
Use gid_eq(uid, INVALID_GID) instead of gid == -1
Use uid = GLOBAL_ROOT_UID instead of uid = 0;
Use gid = GLOBAL_ROOT_GID instead of gid = 0;
Use !uid_eq(uid1, uid2) instead of uid1 != uid2.
Use !gid_eq(gid1, gid2) instead of gid1 != gid2.
Use uid_eq(uid1, uid2) instead of uid1 == uid2.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:16:09 -08:00
Eric W. Biederman 4c1e1b34d5 nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:16:08 -08:00
Eric W. Biederman 03bc6d1cc1 nfsd: Modify nfsd4_cb_sec to use kuids and kgids
Change uid and gid in struct nfsd4_cb_sec to be of type kuid_t and
kgid_t.

In nfsd4_decode_cb_sec when reading uids and gids off the wire convert
them to kuids and kgids, and if they don't convert to valid kuids or
valid kuids ignore RPC_AUTH_UNIX and don't fill in any of the fields.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:16:07 -08:00
Eric W. Biederman ab8e4aee0a nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
In struct nfs4_ace remove the member who and replace it with an
anonymous union holding who_uid and who_gid.  Allowing typesafe
storage uids and gids.

Add a helper pace_gt for sorting posix_acl_entries.

In struct posix_user_ace_state to replace uid with a union
of kuid_t uid and kgid_t gid.

Remove all initializations of the deprecated posic_acl_entry
e_id field.  Which is not present when user namespaces are enabled.

Split find_uid into two functions find_uid and find_gid that work
in a typesafe manner.

In nfs4xdr update nfsd4_encode_fattr to deal with the changes
in struct nfs4_ace.

Rewrite nfsd4_encode_name to take a kuid_t and a kgid_t instead
of a generic id and flag if it is a group or a uid.  Replace
the group flag with a test for a valid gid.

Modify nfsd4_encode_user to take a kuid_t and call the modifed
nfsd4_encode_name.

Modify nfsd4_encode_group to take a kgid_t and call the modified
nfsd4_encode_name.

Modify nfsd4_encode_aclname to take an ace instead of taking the
fields of an ace broken out.  This allows it to detect if the ace is
for a user or a group and to pass the appropriate value while still
being typesafe.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:16:06 -08:00
Eric W. Biederman 7c19723e99 nfsd: Convert nfsxdr to use kuids and kgids
When reading uids and gids off the wire convert them to
kuids and kgids.  If the conversion results in an invalid
result don't set the ATTR_UID or ATTR_GID.

When putting kuids and kgids onto the wire first convert
them to uids and gids the other side will understand.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:16:05 -08:00
Eric W. Biederman 458878a705 nfsd: Convert nfs3xdr to use kuids and kgids
When reading uids and gids off the wire convert them to kuids and
kgids.

When putting kuids and kgids onto the wire first convert them to uids
and gids the other side will understand.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:16:04 -08:00
Eric W. Biederman e097258f2e nfsd: Remove nfsd_luid, nfsd_lgid, nfsd_ruid and nfsd_rgid
These trivial macros that don't currently do anything are the last
vestiages of an old attempt at uid mapping that was removed from the
kernel in September of 2002.  Remove them to make it clear what the
code is currently doing.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:51 -08:00
Eric W. Biederman 65e10f6d0a nfsd: Convert idmap to use kuids and kgids
Convert nfsd_map_name_to_uid to return a kuid_t value.
Convert nfsd_map_name_to_gid to return a kgid_t value.
Convert nfsd_map_uid_to_name to take a kuid_t parameter.
Convert nfsd_map_gid_to_name to take a kgid_t paramater.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:49 -08:00
Eric W. Biederman b5663898ec nfsd: idmap use u32 not uid_t as the intermediate type
u32 and uid_t have the same size and semantics so this change
should have no operational effect.  This just removes the WTF
factor when looking at variables that hold both uids and gids
whos type is uid_t.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:37 -08:00
Eric W. Biederman 6c1810e040 nfsd: Remove declaration of nonexistent nfs4_acl_permisison
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:35 -08:00
Fengguang Wu e56a316214 nfsd4: free_stid can be static
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
2013-02-11 16:22:50 -05:00
Jeff Layton 01a7decf75 nfsd: keep a checksum of the first 256 bytes of request
Now that we're allowing more DRC entries, it becomes a lot easier to hit
problems with XID collisions. In order to mitigate those, calculate a
checksum of up to the first 256 bytes of each request coming in and store
that in the cache entry, along with the total length of the request.

This initially used crc32, but Chuck Lever and Jim Rees pointed out that
crc32 is probably more heavyweight than we really need for generating
these checksums, and recommended looking at using the same routines that
are used to generate checksums for IP packets.

On an x86_64 KVM guest measurements with ftrace showed ~800ns to use
csum_partial vs ~1750ns for crc32.  The difference probably isn't
terribly significant, but for now we may as well use csum_partial.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Stones-thrown-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-08 16:02:26 -05:00
Jeff Layton 5976687a2b sunrpc: move address copy/cmp/convert routines and prototypes from clnt.h to addr.h
These routines are used by server and client code, so having them in a
separate header would be best.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-05 09:41:14 -05:00
J. Bruce Fields 3abdb60712 nfsd4: simplify idr allocation
We don't really need to preallocate at all; just allocate and initialize
everything at once, but leave the sc_type field initially 0 to prevent
finding the stateid till it's fully initialized.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-05 09:41:12 -05:00
majianpeng 2d32b29a1c nfsd: Fix memleak
When free nfs-client, it must free the ->cl_stateids.

Cc: stable@kernel.org
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-05 09:40:47 -05:00
Jeff Layton b4e7f2c945 nfsd: register a shrinker for DRC cache entries
Since we dynamically allocate them now, allow the system to call us up
to release them if it gets low on memory. Since these entries aren't
replaceable, only free ones that are expired or that are over the cap.
The the seeks value is set to '1' however to indicate that freeing the
these entries is low-cost.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 17:19:13 -05:00
Jeff Layton aca8a23de6 nfsd: add recurring workqueue job to clean the cache
It's not sufficient to only clean the cache when requests come in. What
if we have a flurry of activity and then the server goes idle? Add a
workqueue job that will clean the cache every RC_EXPIRE period.

Care is taken to only run this when we expect to have entries expiring.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 17:19:12 -05:00
Jeff Layton 2c6b691c05 nfsd: when updating an entry with RC_NOCACHE, just free it
There's no need to keep entries around that we're declaring RC_NOCACHE.
Ditto if there's a problem with the entry.

With this change too, there's no need to test for RC_UNUSED in the
search function. If the entry's in the hash table then it's either
INPROG or DONE.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 17:19:11 -05:00
Jeff Layton 13cc8a78e8 nfsd: remove the cache_disabled flag
With the change to dynamically allocate entries, the cache is never
disabled on the fly. Remove this flag.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 17:19:11 -05:00
Jeff Layton 0338dd1572 nfsd: dynamically allocate DRC entries
The existing code keeps a fixed-size cache of 1024 entries. This is much
too small for a busy server, and wastes memory on an idle one.  This
patch changes the code to dynamically allocate and free these cache
entries.

A cap on the number of entries is retained, but it's much larger than
the existing value and now scales with the amount of low memory in the
machine.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 17:19:10 -05:00
Jeff Layton 0ee0bf7ee5 nfsd: track the number of DRC entries in the cache
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 17:19:09 -05:00
Jeff Layton 56c2548b2d nfsd: always move DRC entries to the end of LRU list when updating timestamp
...otherwise, we end up with the list ordering wrong. Currently, it's
not a problem since we skip RC_INPROG entries, but keeping the ordering
strict will be necessary for a later patch that adds a cache cleaner.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 17:19:09 -05:00
Jeff Layton 2eeb9b2abc nfsd: initialize the exp->ex_uuid field in svc_export_init
commit 885c91f746 in Bruce's tree was causing oopses for me:

general protection fault: 0000 [#1] SMP
Modules linked in: nfsd(OF) nfs_acl(OF) auth_rpcgss(OF) lockd(OF) sunrpc(OF) kvm_amd kvm microcode i2c_piix4 virtio_net virtio_balloon cirrus drm_kms_helper ttm drm virtio_blk i2c_core
CPU 0
Pid: 564, comm: exportfs Tainted: GF          O 3.8.0-0.rc5.git2.1.fc19.x86_64 #1 Bochs Bochs
RIP: 0010:[<ffffffff811b1509>]  [<ffffffff811b1509>] kfree+0x49/0x280
RSP: 0018:ffff88007a3d7c50  EFLAGS: 00010203
RAX: 01adaf8dadadad80 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000001
RDX: ffffffff7fffffff RSI: 0000000000000000 RDI: 6b6b6b6b6b6b6b6b
RBP: ffff88007a3d7c80 R08: 6b6b6b6b6b6b6b6b R09: 0000000000000000
R10: 0000000000000018 R11: 0000000000000000 R12: ffff88006a117b50
R13: ffffffffa01a589c R14: ffff8800631b0f50 R15: 01ad998dadadad80
FS:  00007fcaa3616740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f5d84b6fdd8 CR3: 0000000064db4000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process exportfs (pid: 564, threadinfo ffff88007a3d6000, task ffff88006af28000)
Stack:
 ffff88007a3d7c80 ffff88006a117b68 ffff88006a117b50 0000000000000000
 ffff8800631b0f50 ffff88006a117b50 ffff88007a3d7ca0 ffffffffa01a589c
 ffff880036be1148 ffff88007a3d7cf8 ffff88007a3d7e28 ffffffffa01a6a98
Call Trace:
 [<ffffffffa01a589c>] svc_export_put+0x5c/0x70 [nfsd]
 [<ffffffffa01a6a98>] svc_export_parse+0x328/0x7e0 [nfsd]
 [<ffffffffa016f1c7>] cache_do_downcall+0x57/0x70 [sunrpc]
 [<ffffffffa016f25e>] cache_downcall+0x7e/0x100 [sunrpc]
 [<ffffffffa016f338>] cache_write_procfs+0x58/0x90 [sunrpc]
 [<ffffffffa016f2e0>] ? cache_downcall+0x100/0x100 [sunrpc]
 [<ffffffff8123b0e5>] proc_reg_write+0x75/0xb0
 [<ffffffff811ccecf>] vfs_write+0x9f/0x170
 [<ffffffff811cd089>] sys_write+0x49/0xa0
 [<ffffffff816e0919>] system_call_fastpath+0x16/0x1b
Code: 66 66 66 90 48 83 fb 10 0f 86 c3 00 00 00 48 89 df 49 bf 00 00 00 00 00 ea ff ff e8 f2 12 ea ff 48 c1 e8 0c 48 c1 e0 06 49 01 c7 <49> 8b 07 f6 c4 80 0f 85 1d 02 00 00 49 8b 07 a8 80 0f 84 ee 01
RIP  [<ffffffff811b1509>] kfree+0x49/0x280
 RSP <ffff88007a3d7c50>

I think Majianpeng's patch is correct, but incomplete. In order for it
to be safe to free the ex_uuid unconditionally in svc_export_put, we
need to make sure it's initialized to NULL in the init routine.

Cc: majianpeng <majianpeng@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 09:16:24 -05:00
Jeff Layton a4a3ec3291 nfsd: break out hashtable search into separate function
Later, we'll need more than one call site for this, so break it out
into a new function.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 09:16:24 -05:00
Jeff Layton d1a0774de6 nfsd: clean up and clarify the cache expiration code
Add a preprocessor constant for the expiry time of cache entries, and
move the test for an expired entry into a function. Note that the current
code does not test for RC_INPROG. It just assumes that it won't take more
than 2 minutes to fill out an in-progress entry.

I'm not sure how valid that assumption is though, so let's just ensure
that we never consider an RC_INPROG entry to be expired.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 09:16:23 -05:00
Jeff Layton 25e6b8b0e1 nfsd: remove redundant test from nfsd_reply_cache_free
Entries can only get a c_type of RC_REPLBUFF iff they are
RC_DONE. Therefore the test for RC_DONE isn't necessary here.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 09:16:22 -05:00
Jeff Layton f09841fdfa nfsd: add alloc and free functions for DRC entries
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 09:16:22 -05:00
Jeff Layton 8a8bc40d9b nfsd: create a dedicated slabcache for DRC entries
Currently we use kmalloc() which wastes a little bit of memory on each
allocation since it's a power of 2 allocator. Since we're allocating a
1024 of these now, and may need even more later, let's create a new
slabcache for them.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 09:16:21 -05:00
Jeff Layton 09662d58d5 nfsd: get rid of RC_INTR
The reply cache code never returns this status.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 09:16:20 -05:00
Jeff Layton 6dc8889589 nfsd: remove unneeded spinlock in nfsd_cache_update
The locking rules for cache entries say that locking the cache_lock
isn't needed if you're just touching the current entry. Earlier
in this function we set rp->c_state to RC_UNUSED without any locking,
so I believe it's ok to do the same here.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 09:16:19 -05:00
Jeff Layton 7b9e8522a6 nfsd: fix IPv6 address handling in the DRC
Currently, it only stores the first 16 bytes of any address. struct
sockaddr_in6 is 28 bytes however, so we're currently ignoring the last
12 bytes of the address.

Expand the c_addr field to a sockaddr_in6, and cast it to a sockaddr_in
as necessary. Also fix the comparitor to use the existing RPC
helpers for this.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-04 09:16:19 -05:00
majianpeng 885c91f746 nfsd: Fix memleak in svc_export_put
In func svc_export_parse, the uuid which used kmemdup to alloc will be
changed in func export_update.So the later kfree don't free this memory.
And it can't be free in func svc_export_parse because other place still
used.So put this operation in func svc_export_put.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-01-29 16:50:03 -05:00
J. Bruce Fields ff89be87c7 nfsd4: require version 4 when enabling or disabling minorversion
The current code will allow silly things like:

	echo "+2 +3 +4 +7.1">/proc/fs/nfsd/versions

Reported-by: Fan Chaoting <fanchaoting@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-01-23 18:25:01 -05:00
Stanislav Kinsbursky bca0ec6511 nfsd: fix unused "nn" variable warning in free_client()
If CONFIG_LOCKDEP is disabled, then there would be a warning like this:

  CC [M]  fs/nfsd/nfs4state.o
fs/nfsd/nfs4state.c: In function ‘free_client’:
fs/nfsd/nfs4state.c:1051:19: warning: unused variable ‘nn’ [-Wunused-variable]

So, let's add "maybe_unused" tag to this variable.

Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-01-23 18:17:40 -05:00
Yanchuan Nian 266533c6df nfsd: Don't unlock the state while it's not locked
In the procedure of CREATE_SESSION, the state is locked after
alloc_conn_from_crses(). If the allocation fails, the function
goes to "out_free_session", and then "out" where there is an
unlock function.

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-01-23 18:17:37 -05:00
Yanchuan Nian 74b70dded3 nfsd: Pass correct slot number to nfsd4_put_drc_mem()
In alloc_session(), numslots is the correct slot number used by the session.
But the slot number passed to nfsd4_put_drc_mem() is the one from nfs client.

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-01-23 18:17:36 -05:00
J. Bruce Fields 84822d0b3b nfsd4: simplify nfsd4_encode_fattr interface slightly
It seems slightly simpler to make nfsd4_encode_fattr rather than its
callers responsible for advancing the write pointer on success.

(Also: the count == 0 check in the verify case looks superfluous.
Running out of buffer space is really the only reason fattr encoding
should fail with eresource.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-01-23 18:17:35 -05:00
Kees Cook f987c90257 fs/nfsd: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21 14:39:05 -08:00
J. Bruce Fields 10532b560b Revert "nfsd: warn on odd reply state in nfsd_vfs_read"
This reverts commit 79f77bf9a4.

This is obviously wrong, and I have no idea how I missed seeing the
warning in testing: I must just not have looked at the right logs.  The
caller bumps rq_resused/rq_next_page, so it will always be hit on a
large enough read.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-21 17:07:45 -08:00
J. Bruce Fields 24ffb93872 nfsd4: don't leave freed stateid hashed
Note the stateid is hashed early on in init_stid(), but isn't currently
being unhashed on error paths.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-17 22:00:28 -05:00
J. Bruce Fields a1dc695582 nfsd4: free_stateid can use the current stateid
Cc: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-17 22:00:27 -05:00
J. Bruce Fields afc59400d6 nfsd4: cleanup: replace rq_resused count by rq_next_page pointer
It may be a matter of personal taste, but I find this makes the code
clearer.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-17 22:00:16 -05:00
J. Bruce Fields 79f77bf9a4 nfsd: warn on odd reply state in nfsd_vfs_read
As far as I can tell this shouldn't currently happen--or if it does,
something is wrong and data is going to be corrupted.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-17 21:55:46 -05:00
J. Bruce Fields d5f50b0c29 nfsd4: fix oops on unusual readlike compound
If the argument and reply together exceed the maximum payload size, then
a reply with a read-like operation can overlow the rq_pages array.

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-17 21:55:21 -05:00
J. Bruce Fields 9b3234b922 nfsd4: disable zero-copy on non-final read ops
To ensure ordering of read data with any following operations, turn off
zero copy if the read is not the final operation in the compound.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-17 16:02:41 -05:00
Bryan Schumaker 18d9a2ca2e NFSD: Correct the size calculation in fault_inject_write
If len == 0 we end up with size = (0 - 1), which could cause bad things
to happen in copy_from_user().

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 18:24:22 -05:00
Bryan Schumaker 0a5c33e23c NFSD: Pass correct buffer size to rpc_ntop
I honestly have no idea where I got 129 from, but it's a much bigger
value than the actual buffer size (INET6_ADDRSTRLEN).

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 18:24:21 -05:00
Stanislav Kinsbursky 88c4766617 nfsd: pass proper net to nfsd_destroy() from NFSd kthreads
Since NFSd service is per-net now, we have to pass proper network
context in nfsd_shutdown() from NFSd kthreads.

The simplest way I found is to get proper net from one of transports
with permanent sockets.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:42 -05:00
Stanislav Kinsbursky 541e864f00 nfsd: simplify service shutdown
Function nfsd_shutdown is called from two places: nfsd_last_thread (when last
kernel thread is exiting) and nfsd_svc (in case of kthreads starting error).
When calling from nfsd_svc(), we can be sure that per-net resources are
allocated, so we don't need to check per-net nfsd_net_up boolean flag.
This allows us to remove nfsd_shutdown function at all and move check for
per-net nfsd_net_up boolean flag to nfsd_last_thread.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:42 -05:00
Stanislav Kinsbursky 4539f14981 nfsd: replace boolean nfsd_up flag by users counter
Since we have generic NFSd resurces, we have to introduce some way how to
allocate and destroy those resources on first per-net NFSd start and on
last per-net NFSd stop respectively.
This patch replaces global boolean nfsd_up flag (which is unused now) by users
counter and use it to determine either we need to allocate generic resources
or destroy them.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:41 -05:00
Stanislav Kinsbursky 903d9bf0ed nfsd: simplify NFSv4 state init and shutdown
This patch moves nfsd_startup_generic() and nfsd_shutdown_generic()
calls to nfsd_startup_net() and nfsd_shutdown_net() respectively, which
allows us to call nfsd_startup_net() instead of nfsd_startup() and makes
the code look clearer.  It also modifies nfsd_svc() and nfsd_shutdown()
to check nn->nfsd_net_up instead of global nfsd_up.  The latter is now
used only for generic resources shutdown and is currently useless.  It
will replaced by NFSd users counter later in this series.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:40 -05:00
Stanislav Kinsbursky bda9cac1db nfsd: introduce helpers for generic resources init and shutdown
NFSd have per-net resources and resources, used globally.
Let's move generic resources init and shutdown to separated functions since
they are going to be allocated on first NFSd service start and destroyed after
last NFSd service shutdown.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:39 -05:00
Stanislav Kinsbursky 9dd9845f08 nfsd: make NFSd service structure allocated per net
This patch makes main step in NFSd containerisation.

There could be different approaches to how to make NFSd able to handle
incoming RPC request from different network namespaces.  The two main
options are:

1) Share NFSd kthreads betwween all network namespaces.
2) Create separated pool of threads for each namespace.

While first approach looks more flexible, second one is simpler and
non-racy.  This patch implements the second option.

To make it possible to allocate separate pools of threads, we have to
make it possible to allocate separate NFSd service structures per net.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:39 -05:00
Stanislav Kinsbursky b9c0ef8571 nfsd: make NFSd service boot time per-net
This is simple: an NFSd service can be started at different times in
different network environments. So, its "boot time" has to be assigned
per net.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:38 -05:00
Stanislav Kinsbursky 2c2fe2909e nfsd: per-net NFSd up flag introduced
This patch introduces introduces per-net "nfsd_net_up" boolean flag, which has
the same purpose as general "nfsd_up" flag - skip init or shutdown of per-net
resources in case of they are inited on shutted down respectively.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:37 -05:00
Stanislav Kinsbursky 6ff50b3dea nfsd: move per-net startup code to separated function
NFSd resources are partially per-net and partially globally used.
This patch splits resources init and shutdown and moves per-net code to
separated functions.
Generic and per-net init and shutdown are called sequentially for a while.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:36 -05:00
Stanislav Kinsbursky 081603520b nfsd: pass net to __write_ports() and down
Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:36 -05:00
Stanislav Kinsbursky 3938a0d5eb nfsd: pass net to nfsd_set_nrthreads()
Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:35 -05:00
Stanislav Kinsbursky d41a9417cd nfsd: pass net to nfsd_svc()
Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:34 -05:00
Stanislav Kinsbursky 6777436b0f nfsd: pass net to nfsd_create_serv()
Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:34 -05:00
Stanislav Kinsbursky db42d1a76a nfsd: pass net to nfsd_startup() and nfsd_shutdown()
Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:33 -05:00
Stanislav Kinsbursky db6e182c17 nfsd: pass net to nfsd_init_socks()
Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:32 -05:00
Stanislav Kinsbursky f7fb86c6e6 nfsd: use "init_net" for portmapper
There could be a situation, when NFSd was started in one network namespace, but
stopped in another one.
This will trigger kernel panic, because RPCBIND client is stored on per-net
NFSd data, and will be NULL on NFSd shutdown.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:32 -05:00
Neil Brown 7007c90fb9 nfsd: avoid permission checks on EXCLUSIVE_CREATE replay
With NFSv4, if we create a file then open it we explicit avoid checking
the permissions on the file during the open because the fact that we
created it ensures we should be allow to open it (the create and the
open should appear to be a single operation).

However if the reply to an EXCLUSIVE create gets lots and the client
resends the create, the current code will perform the permission check -
because it doesn't realise that it did the open already..

This patch should fix this.

Note that I haven't actually seen this cause a problem.  I was just
looking at the code trying to figure out a different EXCLUSIVE open
related issue, and this looked wrong.

(Fix confirmed with pynfs 4.0 test OPEN4--bfields)

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
[bfields: use OWNER_OVERRIDE and update for 4.1]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:31 -05:00
Stanislav Kinsbursky 9a9c6478a8 nfsd: make NFSv4 recovery client tracking options per net
Pointer to client tracking operations - client_tracking_ops - have to be
containerized, because different environment can support different trackers
(for example, legacy tracker currently is not suported in container).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:30 -05:00
J. Bruce Fields 9b2ef62b15 nfsd4: lockt, release_lockowner should renew clients
Fix nfsd4_lockt and release_lockowner to lookup the referenced client,
so that it can renew it, or correctly return "expired", as appropriate.

Also share some code while we're here.

Reported-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-04 07:51:12 -05:00
Bryan Schumaker 6c1e82a4b7 NFSD: Forget state for a specific client
Write the client's ip address to any state file and all appropriate
state for that client will be forgotten.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:59:03 -05:00
Bryan Schumaker d7cc431edd NFSD: Add a custom file operations structure for fault injection
Controlling the read and write functions allows me to add in "forget
client w.x.y.z", since we won't be limited to reading and writing only
u64 values.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:59:02 -05:00
Bryan Schumaker 184c18471f NFSD: Reading a fault injection file prints a state count
I also log basic information that I can figure out about the type of
state (such as number of locks for each client IP address).  This can be
useful for checking that state was actually dropped and later for
checking if the client was able to recover.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:59:01 -05:00
Bryan Schumaker 8ce54e0d82 NFSD: Fault injection operations take a per-client forget function
The eventual goal is to forget state based on ip address, so it makes
sense to call this function in a for-each-client loop until the correct
amount of state is forgotten.  I also use this patch as an opportunity
to rename the forget function from "func()" to "forget()".

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:59:00 -05:00
Bryan Schumaker 269de30f10 NFSD: Clean up forgetting and recalling delegations
Once I have a client, I can easily use its delegation list rather than
searching the file hash table for delegations to remove.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:58:59 -05:00
Bryan Schumaker 4dbdbda84f NFSD: Clean up forgetting openowners
Using "forget_n_state()" forces me to implement the code needed to
forget a specific client's openowners.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:58:58 -05:00
Bryan Schumaker fc29171f5b NFSD: Clean up forgetting locks
I use the new "forget_n_state()" function to iterate through each client
first when searching for locks.  This may slow down forgetting locks a
little bit, but it implements most of the code needed to forget a
specified client's locks.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:58:56 -05:00
Bryan Schumaker 44e34da60b NFSD: Clean up forgetting clients
I added in a generic for-each loop that takes a pass over the client_lru
list for the current net namespace and calls some function.  The next few
patches will update other operations to use this function as well.  A value
of 0 still means "forget everything that is found".

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:58:55 -05:00
Bryan Schumaker 043958395a NFSD: Lock state before calling fault injection function
Each function touches state in some way, so getting the lock earlier
can help simplify code.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:58:54 -05:00
J. Bruce Fields e5f9570319 nfsd4: discard some unused nfsd4_verify xdr code
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:43:51 -05:00
Bryan Schumaker f3c7521fe5 NFSD: Fold fault_inject.h into state.h
There were only a small number of functions in this file and since they
all affect stored state I think it makes sense to put them in state.h
instead.  I also dropped most static inline declarations since there are
no callers when fault injection is not enabled.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 13:01:02 -05:00
Stanislav Kinsbursky 5284b44e43 nfsd: make NFSv4 grace time per net
Grace time is a part of NFSv4 state engine, which is constructed per network
namespace.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:39:47 -05:00
Stanislav Kinsbursky 3d7337115d nfsd: make NFSv4 lease time per net
Lease time is a part of NFSv4 state engine, which is constructed per network
namespace.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:39:46 -05:00
Stanislav Kinsbursky 864aee5c6f nfsd: remove redundant declarations
This is a cleanup patch. Functions nfsd_pool_stats_open() and
nfsd_pool_stats_release() are declared in fs/nfsd/nfsd.h.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:55 -05:00
Stanislav Kinsbursky f141f79d70 nfsd: recovery - make in_grace per net
Flag in_grace is a part of client tracking state, which is network namesapce
aware. So let'a replace global static variable with per-net one.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:54 -05:00
Stanislav Kinsbursky 3a0733692f nfsd: recovery - make rec_file per net
Opening and closing of this file is done in client tracking init and exit
operations.
Client tracking is done in network namespace context already. So let's make
this file opened and closed per network context - this will simlify it's
management.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:53 -05:00
Stanislav Kinsbursky f252bc6806 nfsd: call state init and shutdown twice
Split NFSv4 state init and shutdown into two different calls: per-net one and
generic one.
Per-net cwinit/shutdown pair have to be called for any namespace, generic pair
- only once on NSFd kthreads start and shutdown respectively.

Refresh of diff-nfsd-call-state-init-twice

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:53 -05:00
Stanislav Kinsbursky d85ed44305 nfsd: cleanup NFSd state start a bit
This patch renames nfs4_state_start_net() into nfs4_state_create_net(), where
get_net() now performed.
Also it introduces new nfs4_state_start_net(), which is now responsible for
state creation and initializing all per-net data and which is now called from
nfs4_state_start().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:52 -05:00
Stanislav Kinsbursky 4dce0ac906 nfsd: cleanup NFSd state shutdown a bit
This patch renames __nfs4_state_shutdown_net() into nfs4_state_shutdown_net(),
__nfs4_state_shutdown() into nfs4_state_shutdown_net() and moves all network
related shutdown operations to nfs4_state_shutdown_net().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:51 -05:00
Stanislav Kinsbursky 4e37a7c207 nfsd: make delegations shutdown network namespace aware
NFSv4 delegations are stored in global list. But they are nfs4_client
dependent, which is network namespace aware already.
State shutdown and laundromat are done per network namespace as well.
So, delegations unhash have to be done in network namespace context.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:50 -05:00
Stanislav Kinsbursky c9a4962881 nfsd: make client_lock per net
This lock protects the client lru list and session hash table, which are
allocated per network namespace already.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:50 -05:00
Stanislav Kinsbursky ec28e02ca5 nfsd4: remove state lock from nfs4_state_shutdown
Protection of __nfs4_state_shutdown() with nfs4_lock_state() looks redundant.

This function is called by the last NFSd thread on it's exit and state lock
protects actually two functions (del_recall_lru is protected by recall_lock):
1) nfsd4_client_tracking_exit
2) __nfs4_state_shutdown_net

"nfsd4_client_tracking_exit" doesn't require state lock protection, because it's
state can be modified only by tracker callbacks.
Here a re they:
1) create: is called only from nfsd4_proc_compound.
2) remove: is called from either nfsd4_proc_compound or nfs4_laundromat.
3) check: is called only from nfsd4_proc_compound.
4) grace_done; called only from nfs4_laundromat.

nfsd4_proc_compound is called onll by NFSd kthread, which is exiting right
now.
nfs4_laundromat is called by laundry_wq. But laundromat_work was canceled
already.

"__nfs4_state_shutdown_net" also doesn't require state lock protection,
because all NFSd kthreads are dead, and no race can happen with NFSd start,
because "nfsd_up" flag is still set.
Moreover, all Nfsd shutdown is protected with global nfsd_mutex.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:49 -05:00
J. Bruce Fields dba88ba55a nfsd4: remove state lock from nfsd4_load_reboot_recovery_data
That function is only called under nfsd_mutex: we know that because the
only caller is nfsd_svc, via

        nfsd_svc
          nfsd_startup
            nfs4_state_start
              nfsd4_client_tracking_init
                client_tracking_ops->init == nfsd4_load_reboot_recovery_data

The shared state accessed here includes:

        - user_recovery_dirname: used here, modified only by
          nfs4_reset_recoverydir, which can be verified to only be
          called under nfsd_mutex.
        - filesystem state, protected by i_mutex (handwaving slightly
	  here)
        - rec_file, reclaim_str_hashtbl, reclaim_str_hashtbl_size: other
          than here, used only from code called from nfsd or laundromat
          threads, both of which should be started only after this runs
          (see nfsd_svc) and stopped before this could run again (see
          nfsd_shutdown, called from nfsd_last_thread).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:48 -05:00
J. Bruce Fields a36b1725b3 nfsd4: return badname, not inval, on "." or "..", or "/"
The spec requires badname, not inval, in these cases.

Some callers want us to return enoent, but I can see no justification
for that.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-27 16:41:48 -05:00
J. Bruce Fields 063b0fb9fa nfsd4: downgrade some fs/nfsd/nfs4state.c BUG's
Linus has pointed out that indiscriminate use of BUG's can make it
harder to diagnose bugs because they can bring a machine down, often
before we manage to get any useful debugging information to the logs.
(Consider, for example, a BUG() that fires in a workqueue, or while
holding a spinlock).

Most of these BUG's won't do much more than kill an nfsd thread, but it
would still probably be safer to get out the warning without dying.

There's still more of this to do in nfsd/.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-26 09:08:16 -05:00
J. Bruce Fields ffe1137ba7 nfsd4: delay filling in write iovec array till after xdr decoding
Our server rejects compounds containing more than one write operation.
It's unclear whether this is really permitted by the spec; with 4.0,
it's possibly OK, with 4.1 (which has clearer limits on compound
parameters), it's probably not OK.  No client that we're aware of has
ever done this, but in theory it could be useful.

The source of the limitation: we need an array of iovecs to pass to the
write operation.  In the worst case that array of iovecs could have
hundreds of elements (the maximum rwsize divided by the page size), so
it's too big to put on the stack, or in each compound op.  So we instead
keep a single such array in the compound argument.

We fill in that array at the time we decode the xdr operation.

But we decode every op in the compound before executing any of them.  So
once we've used that array we can't decode another write.

If we instead delay filling in that array till the time we actually
perform the write, we can reuse it.

Another option might be to switch to decoding compound ops one at a
time.  I considered doing that, but it has a number of other side
effects, and I'd rather fix just this one problem for now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-26 09:08:15 -05:00
J. Bruce Fields 70cc7f75b1 nfsd4: move more write parameters into xdr argument
In preparation for moving some of this elsewhere.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-26 09:08:14 -05:00
J. Bruce Fields 5a80a54d21 nfsd4: reorganize write decoding
In preparation for moving some of it elsewhere.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-26 09:08:14 -05:00
J. Bruce Fields 8a61b18c9b nfsd4: simplify reading of opnum
The comment here is totally bogus:
	- OP_WRITE + 1 is RELEASE_LOCKOWNER.  Maybe there was some older
	  version of the spec in which that served as a sort of
	  OP_ILLEGAL?  No idea, but it's clearly wrong now.
	- In any case, I can't see that the spec says anything about
	  what to do if the client sends us less ops than promised.
	  It's clearly nutty client behavior, and we should do
	  whatever's easiest: returning an xdr error (even though it
	  won't be consistent with the error on the last op returned)
	  seems fine to me.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-26 09:08:13 -05:00
J. Bruce Fields 447bfcc936 nfsd4: no, we're not going to check tags for utf8
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-26 09:08:12 -05:00
J. Bruce Fields 57d276d71a nfsd: fix v4 reply caching
Very embarassing: 1091006c5e "nfsd: turn
on reply cache for NFSv4" missed a line, effectively leaving the reply
cache off in the v4 case.  I thought I'd tested that, but I guess not.

This time, wrote a pynfs test to confirm it works.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-26 09:05:19 -05:00
Stanislav Kinsbursky 0912128149 nfsd: make laundromat network namespace aware
This patch moves laundromat_work to nfsd per-net context, thus allowing to run
multiple laundries.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:51 -05:00
Stanislav Kinsbursky 12760c6685 nfsd: pass nfsd_net instead of net to grace enders
Passing net context looks as overkill.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:50 -05:00
Stanislav Kinsbursky 3320fef19b nfsd: use service net instead of hard-coded init_net
This patch replaces init_net by SVC_NET(), where possible and also passes
proper context to nested functions where required.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:50 -05:00
Stanislav Kinsbursky 73758fed71 nfsd: make close_lru list per net
This list holds nfs4 clients (open) stateowner queue for last close replay,
which are network namespace aware. So let's make this list per network
namespace too.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:49 -05:00
Stanislav Kinsbursky 5ed58bb243 nfsd: make client_lru list per net
This list holds nfs4 clients queue for lease renewal, which are network
namespace aware. So let's make this list per network namespace too.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:48 -05:00
Stanislav Kinsbursky 1872de0e81 nfsd: make sessionid_hashtbl allocated per net
This hash holds established sessions state and closely associated with
nfs4_clients info, which are network namespace aware. So let's make it
allocated per network namespace too.

Note: this hash can be allocated in per-net operations. But it looks
better to allocate it on nfsd state start and thus don't waste resources
if server is not running.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:47 -05:00
Stanislav Kinsbursky 20e9e2bc98 nfsd: make lockowner_ino_hashtbl allocated per net
This hash holds file lock owners and closely associated with nfs4_clients info,
which are network namespace aware. So let's make it allocated per network
namespace too.

Note: this hash can be allocated in per-net operations. But it looks
better to allocate it on nfsd state start and thus don't waste resources
if server is not running.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:47 -05:00
Stanislav Kinsbursky 9b53113740 nfsd: make ownerstr_hashtbl allocated per net
This hash holds open owner state and closely associated with nfs4_clients
info, which are network namespace aware. So let's make it allocated per
network namespace too.

Note: this hash can be allocated in per-net operations. But it looks
better to allocate it on nfsd state start and thus don't waste resources
if server is not running.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:46 -05:00
Stanislav Kinsbursky a99454aa4f nfsd: make unconf_name_tree per net
This hash holds nfs4_clients info, which are network namespace aware.
So let's make it allocated per network namespace.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:45 -05:00
Stanislav Kinsbursky 0a7ec37727 nfsd: make unconf_id_hashtbl allocated per net
This hash holds nfs4_clients info, which are network namespace aware.
So let's make it allocated per network namespace.

Note: this hash can be allocated in per-net operations. But it looks
better to allocate it on nfsd state start and thus don't waste resources
if server is not running.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:45 -05:00
Stanislav Kinsbursky 382a62e76c nfsd: make conf_name_tree per net
This tree holds nfs4_clients info, which are network namespace aware.
So let's make it per network namespace.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:44 -05:00
Stanislav Kinsbursky 8daae4dc0d nfsd: make conf_id_hashtbl allocated per net
This hash holds nfs4_clients info, which are network namespace aware.
So let's make it allocated per network namespace.

Note: this hash can be allocated in per-net operations. But it looks
better to allocate it on nfsd state start and thus don't waste resources
if server is not running.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:43 -05:00
Stanislav Kinsbursky 52e19c09a1 nfsd: make reclaim_str_hashtbl allocated per net
This hash holds nfs4_clients info, which are network namespace aware.
So let's make it allocated per network namespace.

Note: this hash is used only by legacy tracker. So let's allocate hash in
tracker init.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:43 -05:00
Stanislav Kinsbursky c212cecfa2 nfsd: make nfs4_client network namespace dependent
And use it's net where possible.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:42 -05:00
Stanislav Kinsbursky 7f2210fa6b nfsd: use service net instead of hard-coded net where possible
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:41 -05:00
Fengguang Wu 2b4cf668a7 nfsd4: get_backchannel_cred should be static
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-14 11:23:00 -05:00
Fengguang Wu 135ae8270d nfsd4: init_session should be declared static
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-14 11:23:00 -05:00
Jeff Layton 7e4f015d81 nfsd: release the legacy reclaimable clients list in grace_done
The current code holds on to this list until nfsd is shut down, but it's
never touched once the grace period ends. Release that memory back into
the wild when the grace period ends.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:12 -05:00
Jeff Layton 2216d449a9 nfsd: get rid of cl_recdir field
Remove the cl_recdir field from the nfs4_client struct. Instead, just
compute it on the fly when and if it's needed, which is now only when
the legacy client tracking code is in effect.

The error handling in the legacy client tracker is also changed to
handle the case where md5 is unavailable. In that case, we'll warn
the admin with a KERN_ERR message and disable the client tracking.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:11 -05:00
Jeff Layton ac55fdc408 nfsd: move the confirmed and unconfirmed hlists to a rbtree
The current code requires that we md5 hash the name in order to store
the client in the confirmed and unconfirmed trees. Change it instead
to store the clients in a pair of rbtrees, and simply compare the
cl_names directly instead of hashing them. This also necessitates that
we add a new flag to the clp->cl_flags field to indicate which tree
the client is currently in.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:11 -05:00
Jeff Layton 0ce0c2b5d2 nfsd: don't search for client by hash on legacy reboot recovery gracedone
When nfsd starts, the legacy reboot recovery code creates a tracking
struct for each directory in the v4recoverydir. When the grace period
ends, it basically does a "readdir" on the directory again, and matches
each dentry in there to an existing client id to see if it should be
removed or not. If the matching client doesn't exist, or hasn't
reclaimed its state then it will remove that dentry.

This is pretty inefficient since it involves doing a lot of hash-bucket
searching. It also means that we have to keep relying on being able to
search for a nfs4_client by md5 hashed cl_recdir name.

Instead, add a pointer to the nfs4_client that indicates the association
between the nfs4_client_reclaim and nfs4_client. When a reclaim operation
comes in, we set the pointer to make that association. On gracedone, the
legacy client tracker will keep the recdir around iff:

1/ there is a reclaim record for the directory

...and...

2/ there's an association between the reclaim record and a client record
-- that is, a create or check operation was performed on the client that
matches that directory.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:11 -05:00
Jeff Layton 772a9bbbb5 nfsd: make nfs4_client_to_reclaim return a pointer to the reclaim record
Later callers will need to make changes to the record.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:11 -05:00
Jeff Layton ce30e5392f nfsd: break out reclaim record removal into separate function
We'll need to be able to call this from nfs4recover.c eventually.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:11 -05:00
Jeff Layton 278c931cb0 nfsd: have nfsd4_find_reclaim_client take a char * argument
Currently, it takes a client pointer, but later we're going to need to
search for these records without knowing whether a matching client even
exists.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:11 -05:00
Jeff Layton 8b0554e9a2 nfsd: warn about impending removal of nfsdcld upcall
Let's shoot for removing the nfsdcld upcall in 3.10. Most likely,
no one is actually using it so I don't expect this warning to
fire often (except maybe on misconfigured systems).

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:10 -05:00
Jeff Layton f3aa7e24c9 nfsd: pass info about the legacy recoverydir in environment variables
The usermodehelper upcall program can then decide to use this info as
a (one-way) transition mechanism to the new scheme. When a "check"
upcall occurs and the client doesn't exist in the database, we can
look to see whether the directory exists. If it does, then we'd add
the client to the database, remove the legacy recdir, and return
success to the kernel to allow the recovery to proceed.

For gracedone, we simply pass the v4recovery "topdir" so that the
upcall can clean it out prior to returning to the kernel.

A module parm is also added to disable the legacy conversion if
the admin chooses.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:10 -05:00
Jeff Layton 2d77bf0a55 nfsd: change heuristic for selecting the client_tracking_ops
First, try to use the new usermodehelper upcall. It should succeed or
fail quickly, so there's little cost to doing so.

If it fails, and the legacy tracking dir exists, use that. If it
doesn't exist then fall back to using nfsdcld.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:10 -05:00
Jeff Layton 2873d2147e nfsd: add a usermodehelper upcall for NFSv4 client ID tracking
Add a new client tracker upcall type that uses call_usermodehelper to
call out to a program. This seems to be the preferred method of
calling out to usermode these days for seldom-called upcalls. It's
simple and doesn't require a running daemon, so it should "just work"
as long as the binary is installed.

The client tracking exit operation is also changed to check for a
NULL pointer before running. The UMH upcall doesn't need to do anything
at module teardown time.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:10 -05:00
Jeff Layton a0af710a65 nfsd: remove unused argument to nfs4_has_reclaimed_state
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-10 14:56:54 -05:00
Jeff Layton 698d8d875a nfsd: fix error handling in nfsd4_remove_clid_dir
If the credential save fails, then we'll leak our mnt_want_write_file
reference.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-10 14:52:03 -05:00
J. Bruce Fields 12fc3e92d4 nfsd4: backchannel should use client-provided security flavor
For now this only adds support for AUTH_NULL.  (Previously we assumed
AUTH_UNIX.)  We'll also need AUTH_GSS, which is trickier.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:40:05 -05:00
J. Bruce Fields 57725155dc nfsd4: common helper to initialize callback work
I've found it confusing having the only references to
nfsd4_do_callback_rpc() in a different file.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:40:04 -05:00