Commit 14f7dd63 ("Copy XFS readdir hack into nfsd code") introduced a
bug to generic code which had been extant for a long time in the XFS
version -- it started to call through into lookup_one_len() and hence
into the file systems' ->lookup() methods without i_mutex held on the
directory.
This patch fixes it by locking the directory's i_mutex again before
calling the filldir functions. The original deadlocks which commit
14f7dd63 was designed to avoid are still avoided, because they were due
to fs-internal locking, not i_mutex.
While we're at it, fix the return type of nfsd_buffered_readdir() which
should be a __be32 not an int -- it's an NFS errno, not a Linux errno.
And return nfserrno(-ENOMEM) when allocation fails, not just -ENOMEM.
Sparse would have caught that, if it wasn't so busy bitching about
__cold__.
Commit 05f4f678 ("nfsd4: don't do lookup within readdir in recovery
code") introduced a similar problem with calling lookup_one_len()
without i_mutex, which this patch also addresses. To fix that, it was
necessary to fix the called functions so that they expect i_mutex to be
held; that part was done by J. Bruce Fields.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Umm-I-can-live-with-that-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Tested-by: J. Bruce Fields <bfields@citi.umich.edu>
LKML-Reference: <8036.1237474444@jrobl>
Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
AFAICS, we have a subtle bug there: if we have crossed mountpoint
*and* it got mount --move'd away, we'll be holding only one
reference to fs containing dentry - exp->ex_path.mnt. IOW, we
ought to dput() before exp_put().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for-2.6.30' of git://linux-nfs.org/~bfields/linux: (81 commits)
nfsd41: define nfsd4_set_statp as noop for !CONFIG_NFSD_V4
nfsd41: define NFSD_DRC_SIZE_SHIFT in set_max_drc
nfsd41: Documentation/filesystems/nfs41-server.txt
nfsd41: CREATE_EXCLUSIVE4_1
nfsd41: SUPPATTR_EXCLCREAT attribute
nfsd41: support for 3-word long attribute bitmask
nfsd: dynamically skip encoded fattr bitmap in _nfsd4_verify
nfsd41: pass writable attrs mask to nfsd4_decode_fattr
nfsd41: provide support for minor version 1 at rpc level
nfsd41: control nfsv4.1 svc via /proc/fs/nfsd/versions
nfsd41: add OPEN4_SHARE_ACCESS_WANT nfs4_stateid bmap
nfsd41: access_valid
nfsd41: clientid handling
nfsd41: check encode size for sessions maxresponse cached
nfsd41: stateid handling
nfsd: pass nfsd4_compound_state* to nfs4_preprocess_{state,seq}id_op
nfsd41: destroy_session operation
nfsd41: non-page DRC for solo sequence responses
nfsd41: Add a create session replay cache
nfsd41: create_session operation
...
Fixes the following compiler error:
fs/nfsd/nfssvc.c: In function 'set_max_drc':
fs/nfsd/nfssvc.c:240: error: 'NFSD_DRC_SIZE_SHIFT' undeclared
CONFIG_NFSD_V4 is not set
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Implement the CREATE_EXCLUSIVE4_1 open mode conforming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26
This mode allows the client to atomically create a file
if it doesn't exist while setting some of its attributes.
It must be implemented if the server supports persistent
reply cache and/or pnfs.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Also, use client minorversion to generate supported attrs
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
_nfsd4_verify currently skips 3 words from the encoded buffer begining.
With support for 3-word attr bitmaps in nfsd41, nfsd4_encode_fattr
may encode 1, 2, or 3 words, and not always 2 as it used to be, hence
we need to find out where to skip using the encoded bitmap length.
Note: This patch may be applied over pre-nfsd41 nfsd.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Support enabling and disabling nfsv4.1 via /proc/fs/nfsd/versions
by writing the strings "+4.1" or "-4.1" correspondingly.
Use user mode nfs-utils (rpc.nfsd option) to enable.
This will allow us to get rid of CONFIG_NFSD_V4_1
[nfsd41: disable support for minorversion by default]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Separate the access bits from the want bits and enable __set_bit to
work correctly with st_access_bmap.
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
For nfs41, the open share flags are used also for
delegation "wants" and "signals". Check that they are valid.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Extract the clientid from sessionid to set the op_clientid on open.
Verify that the clid for other stateful ops is zero for minorversion != 0
Do all other checks for stateful ops without sessions.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
[fixed whitespace indent]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41 remove sl_session from nfsd4_open]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Calculate the space the compound response has taken after encoding the current
operation.
pad: add on 8 bytes for the next operation's op_code and status so that
there is room to cache a failure on the next operation.
Compare this length to the session se_fmaxresp_cached and return
nfserr_rep_too_big_to_cache if the length is too large.
Our se_fmaxresp_cached will always be a multiple of PAGE_SIZE, and so
will be at least a page and will therefore hold the xdr_buf head.
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfsd41: non-page DRC for solo sequence responses]
[fixed nfsd4_check_drc_limit cosmetics]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use cstate session in nfsd4_check_drc_limit]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
When sessions are used, stateful operation sequenceid and stateid handling
are not used. When sessions are used, on the first open set the seqid to 1,
mark state confirmed and skip seqid processing.
When sessionas are used the stateid generation number is ignored when it is zero
whereas without sessions bad_stateid or stale stateid is returned.
Add flags to propagate session use to all stateful ops and down to
check_stateid_generation.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfsd4_has_session should return a boolean, not u32]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: pass nfsd4_compoundres * to nfsd4_process_open1]
[nfsd41: calculate HAS_SESSION in nfs4_preprocess_stateid_op]
[nfsd41: calculate HAS_SESSION in nfs4_preprocess_seqid_op]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Currently we only use cstate->current_fh,
will also be used by nfsd41 code.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Implement the destory_session operation confoming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26
[use sessionid_lock spin lock]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
A session inactivity time compound (lease renewal) or a compound where the
sequence operation has sa_cachethis set to FALSE do not require any pages
to be held in the v4.1 DRC. This is because struct nfsd4_slot is already
caching the session information.
Add logic to the nfs41 server to not cache response pages for solo sequence
responses.
Return nfserr_replay_uncached_rep on the operation following the sequence
operation when sa_cachethis is FALSE.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use cstate session in nfsd4_replay_cache_entry]
[nfsd41: rename nfsd4_no_page_in_cache]
[nfsd41 rename nfsd4_enc_no_page_replay]
[nfsd41 nfsd4_is_solo_sequence]
[nfsd41 change nfsd4_not_cached return]
Signed-off-by: Andy Adamson <andros@netapp.com>
[changed return type to bool]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41 drop parens in nfsd4_is_solo_sequence call]
Signed-off-by: Andy Adamson <andros@netapp.com>
[changed "== 0" to "!"]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Replace the nfs4_client cl_seqid field with a single struct nfs41_slot used
for the create session replay cache.
The CREATE_SESSION slot sets the sl_session pointer to NULL. Otherwise, the
slot and it's replay cache are used just like the session slots.
Fix unconfirmed create_session replay response by initializing the
create_session slot sequence id to 0.
A future patch will set the CREATE_SESSION cache when a SEQUENCE operation
preceeds the CREATE_SESSION operation. This compound is currently only cached
in the session slot table.
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use bool inuse for slot state]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: revert portion of nfsd4_set_cache_entry]
Signed-off-by: Andy Adamson <andros@netpp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Implement the create_session operation confoming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26
Look up the client id (generated by the server on exchange_id,
given by the client on create_session).
If neither a confirmed or unconfirmed client is found
then the client id is stale
If a confirmed cilent is found (i.e. we already received
create_session for it) then compare the sequence id
to determine if it's a replay or possibly a mis-ordered rpc.
If the seqid is in order, update the confirmed client seqid
and procedd with updating the session parameters.
If an unconfirmed client_id is found then verify the creds
and seqid. If both match move the client id to confirmed state
and proceed with processing the create_session.
Currently, we do not support persistent sessions, and RDMA.
alloc_init_session generates a new sessionid and creates
a session structure.
NFSD_PAGES_PER_SLOT is used for the max response cached calculation, and for
the counting of DRC pages using the hard limits set in struct srv_serv.
A note on NFSD_PAGES_PER_SLOT:
Other patches in this series allow for NFSD_PAGES_PER_SLOT + 1 pages to be
cached in a DRC slot when the response size is less than NFSD_PAGES_PER_SLOT *
PAGE_SIZE but xdr_buf pages are used. e.g. a READDIR operation will encode a
small amount of data in the xdr_buf head, and then the READDIR in the xdr_buf
pages. So, the hard limit calculation use of pages by a session is
underestimated by the number of cached operations using the xdr_buf pages.
Yet another patch caches no pages for the solo sequence operation, or any
compound where cache_this is False. So the hard limit calculation use of
pages by a session is overestimated by the number of these operations in the
cache.
TODO: improve resource pre-allocation and negotiate session
parameters accordingly. Respect and possibly adjust
backchannel attributes.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
[nfsd41: remove headerpadsz from channel attributes]
Our client and server only support a headerpadsz of 0.
[nfsd41: use DRC limits in fore channel init]
[nfsd41: do not change CREATE_SESSION back channel attrs]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[use sessionid_lock spin lock]
[nfsd41: use bool inuse for slot state]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41 remove sl_session from alloc_init_session]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[simplify nfsd4_encode_create_session error handling]
[nfsd41: fix comment style in init_forechannel_attrs]
[nfsd41: allocate struct nfsd4_session and slot table in one piece]
[nfsd41: no need to INIT_LIST_HEAD in alloc_init_session just prior to list_add]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Replay a request in nfsd4_sequence.
Add a minorversion to struct nfsd4_compound_state.
Pass the current slot to nfs4svc_encode_compound res via struct
nfsd4_compoundres to set an NFSv4.1 DRC entry.
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use bool inuse for slot state]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use cstate session in nfs4svc_encode_compoundres]
[nfsd41 replace nfsd4_set_cache_entry]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Use no more than 1/128th of the number of free pages at nfsd startup for the
v4.1 DRC.
This is an arbitrary default which should probably end up under the control
of an administrator.
Signed-off-by: Andy Adamson <andros@netapp.com>
[moved added fields in struct svc_serv under CONFIG_NFSD_V4_1]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[fix set_max_drc calculation of sv_drc_max_pages]
[moved NFSD_DRC_SIZE_SHIFT's declaration up in header file]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cache all the result pages, including the rpc header in rq_respages[0],
for a request in the slot table cache entry.
Cache the statp pointer from nfsd_dispatch which points into rq_respages[0]
just past the rpc header. When setting a cache entry, calculate and save the
length of the nfs data minus the rpc header for rq_respages[0].
When replaying a cache entry, replace the cached rpc header with the
replayed request rpc result header, unless there is not enough room in the
cached results first page. In that case, use the cached rpc header.
The sessions fore channel maxresponse size cached is set to NFSD_PAGES_PER_SLOT
* PAGE_SIZE. For compounds we are cacheing with operations such as READDIR
that use the xdr_buf->pages to hold data, we choose to cache the extra page of
data rather than copying data from xdr_buf->pages into the xdr_buf->head page.
[nfsd41: limit cache to maxresponsesize_cached]
[nfsd41: mv nfsd4_set_statp under CONFIG_NFSD_V4_1]
[nfsd41: rename nfsd4_move_pages]
[nfsd41: rename page_no variable]
[nfsd41: rename nfsd4_set_cache_entry]
[nfsd41: fix nfsd41_copy_replay_data comment]
[nfsd41: add to nfsd4_set_cache_entry]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andy Adamson<andros@netapp.com>
[nfsd41: do not verify nfserr_sequence_pos for minorversion 0]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Implement the sequence operation conforming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26
Check for stale clientid (as derived from the sessionid).
Enforce slotid range and exactly-once semantics using
the slotid and seqid.
If everything went well renew the client lease and
mark the slot INPROGRESS.
Add a struct nfsd4_slot pointer to struct nfsd4_compound_state.
To be used for sessions DRC replay.
[nfsd41: rename sequence catchthis to cachethis]
Signed-off-by: Andy Adamson<andros@netapp.com>
[pulled some code to set cstate->slot from "nfsd DRC logic"]
[use sessionid_lock spin lock]
[nfsd41: use bool inuse for slot state]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd: add a struct nfsd4_slot pointer to struct nfsd4_compound_state]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: add nfsd4_session pointer to nfsd4_compound_state]
[nfsd41: set cstate session]
[nfsd41: use cstate session in nfsd4_sequence]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[simplify nfsd4_encode_sequence error handling]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
We need to distinguish between client names provided by NFSv4.0 clients
SETCLIENTID and those provided by NFSv4.1 via EXCHANGE_ID when looking
up the clientid by string.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfsd41: use boolean values for use_exchange_id argument]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: simplify match_clientid_establishment logic]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Implement the exchange_id operation confoming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-28
Based on the client provided name, hash a client id.
If a confirmed one is found, compare the op's creds and
verifier. If the creds match and the verifier is different
then expire the old client (client re-incarnated), otherwise,
if both match, assume it's a replay and ignore it.
If an unconfirmed client is found, then copy the new creds
and verifer if need update, otherwise assume replay.
The client is moved to a confirmed state on create_session.
In the nfs41 branch set the exchange_id flags to
EXCHGID4_FLAG_USE_NON_PNFS | EXCHGID4_FLAG_SUPP_MOVED_REFER
(pNFS is not supported, Referrals are supported,
Migration is not.).
Address various scenarios from section 18.35 of the spec:
1. Check for EXCHGID4_FLAG_UPD_CONFIRMED_REC_A and set
EXCHGID4_FLAG_CONFIRMED_R as appropriate.
2. Return error codes per 18.35.4 scenarios.
3. Update client records or generate new client ids depending on
scenario.
Note: 18.35.4 case 3 probably still needs revisiting. The handling
seems not quite right.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Andy Adamosn <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use utsname for major_id (and copy to server_scope)]
[nfsd41: fix handling of various exchange id scenarios]
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: reverse use of EXCHGID4_INVAL_FLAG_MASK_A]
[simplify nfsd4_encode_exchange_id error handling]
[nfsd41: embed an xdr_netobj in nfsd4_exchange_id]
[nfsd41: return nfserr_serverfault for spa_how == SP4_MACH_CRED]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Define nfsd41_dec_ops vector and add it to nfsd4_minorversion for
minorversion 1.
Note: nfsd4_enc_ops vector is shared for v4.0 and v4.1
since we don't need to filter out obsolete ops as this is
done in the decoding phase.
exchange_id, create_session, destroy_session, and sequence ops are
implemented as stubs returning nfserr_opnotsupp at this stage.
[was nfsd41: xdr stubs]
[get rid of CONFIG_NFSD_V4_1]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Simple sessionid hashing using its monotonically increasing sequence number.
Locking considerations:
sessionid_hashtbl access is controlled by the sessionid_lock spin lock.
It must be taken for insert, delete, and lookup.
nfsd4_sequence looks up the session id and if the session is found,
it calls nfsd4_get_session (still under the sessionid_lock).
nfsd4_destroy_session calls nfsd4_put_session after unhashing
it, so when the session's kref reaches zero it's going to get freed.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[we don't use a prime for sessionid hash table size]
[use sessionid_lock spin lock]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch provides basic data structures representing the nfs41
sessions and slots, plus helpers for keeping a reference count
on the session and freeing it.
Note that our server only support a headerpadsz of 0 and
it ignores backchannel attributes at the moment.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: remove headerpadsz from channel attributes]
[nfsd41: embed nfsd4_channel in nfsd4_session]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use bool inuse for slot state]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41 remove sl_session from nfsd4_slot]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
On an NFSv4.1 server cache miss that causes an upcall, NFS4ERR_DELAY will be
returned. It is up to the NFSv4.1 client to resend only the operations that
have not been processed.
Initialize rq_usedeferral to 1 in svc_process(). It sill be turned off in
nfsd4_proc_compound() only when NFSv4.1 Sessions are used.
Note: this isn't an adequate solution on its own. It's acceptable as a way
to get some minimal 4.1 up and working, but we're going to have to find a
way to avoid returning DELAY in all common cases before 4.1 can really be
considered ready.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: reverse rq_nodeferral negative logic]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[sunrpc: initialize rq_usedeferral]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
Remove two unneeded exports and make two symbols static in fs/mpage.c
Cleanup after commit 585d3bc06f
Trim includes of fdtable.h
Don't crap into descriptor table in binfmt_som
Trim includes in binfmt_elf
Don't mess with descriptor table in load_elf_binary()
Get rid of indirect include of fs_struct.h
New helper - current_umask()
check_unsafe_exec() doesn't care about signal handlers sharing
New locking/refcounting for fs_struct
Take fs_struct handling to new file (fs/fs_struct.c)
Get rid of bumping fs_struct refcount in pivot_root(2)
Kill unsharing fs_struct in __set_personality()
Pure code move; two new helper functions for nfsd and daemonize
(unshare_fs_struct() and daemonize_fs_struct() resp.; for now -
the same code as used to be in callers). unshare_fs_struct()
exported (for nfsd, as copy_fs_struct()/exit_fs() used to be),
copy_fs_struct() and exit_fs() don't need exports anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Remove the allocation of struct nfsd4_compound_state.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Since an RPC service listener's protocol family is specified now via
svc_create_xprt(), it no longer needs to be passed to svc_create() or
svc_create_pooled(). Remove that argument from the synopsis of those
functions, and remove the sv_family field from the svc_serv struct.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The sv_family field is going away. Pass a protocol family argument to
svc_create_xprt() instead of extracting the family from the passed-in
svc_serv struct.
Again, as this is a listener socket and not an address, we make this
new argument an "int" protocol family, instead of an "sa_family_t."
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make sure port value read from user space by write_ports is valid before
passing it to svc_find_xprt(). If it wasn't, the writer would get ENOENT
instead of EINVAL.
Noticed-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6: (27 commits)
ext2: Zero our b_size in ext2_quota_read()
trivial: fix typos/grammar errors in fs/Kconfig
quota: Coding style fixes
quota: Remove superfluous inlines
quota: Remove uppercase aliases for quota functions.
nfsd: Use lowercase names of quota functions
jfs: Use lowercase names of quota functions
udf: Use lowercase names of quota functions
ufs: Use lowercase names of quota functions
reiserfs: Use lowercase names of quota functions
ext4: Use lowercase names of quota functions
ext3: Use lowercase names of quota functions
ext2: Use lowercase names of quota functions
ramfs: Remove quota call
vfs: Use lowercase names of quota functions
quota: Remove dqbuf_t and other cleanups
quota: Remove NODQUOT macro
quota: Make global quota locks cacheline aligned
quota: Move quota files into separate directory
ext4: quota reservation for delayed allocation
...
* 'bkl-removal' of git://git.lwn.net/linux-2.6:
Rationalize fasync return values
Move FASYNC bit handling to f_op->fasync()
Use f_lock to protect f_flags
Rename struct file->f_ep_lock
Use lowercase names of quota functions instead of old uppercase ones.
CC: bfields@fieldses.org
CC: neilb@suse.de
Signed-off-by: Jan Kara <jack@suse.cz>
There is an inconsistency seen in the behaviour of nfs compared to other local
filesystems on linux when changing owner or group of a directory. If the
directory has SUID/SGID flags set, on changing owner or group on the directory,
the flags are stripped off on nfs. These flags are maintained on other
filesystems such as ext3.
To reproduce on a nfs share or local filesystem, run the following commands
mkdir test; chmod +s+g test; chown user1 test; ls -ld test
On the nfs share, the flags are stripped and the output seen is
drwxr-xr-x 2 user1 root 4096 Feb 23 2009 test
On other local filesystems(ex: ext3), the flags are not stripped and the output
seen is
drwsr-sr-x 2 user1 root 4096 Feb 23 13:57 test
chown_common() called from sys_chown() will only strip the flags if the inode is
not a directory.
static int chown_common(struct dentry * dentry, uid_t user, gid_t group)
{
..
if (!S_ISDIR(inode->i_mode))
newattrs.ia_valid |=
ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
..
}
See: http://www.opengroup.org/onlinepubs/7990989775/xsh/chown.html
"If the path argument refers to a regular file, the set-user-ID (S_ISUID) and
set-group-ID (S_ISGID) bits of the file mode are cleared upon successful return
from chown(), unless the call is made by a process with appropriate privileges,
in which case it is implementation-dependent whether these bits are altered. If
chown() is successfully invoked on a file that is not a regular file, these
bits may be cleared. These bits are defined in <sys/stat.h>."
The behaviour as it stands does not appear to violate POSIX. However the
actions performed are inconsistent when comparing ext3 and nfs.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The spec allows clients to change ip address, so we shouldn't be
requiring that setclientid always come from the same address. For
example, a client could reboot and get a new dhcpd address, but still
present the same clientid to the server. In that case the server should
revoke the client's previous state and allow it to continue, instead of
(as it currently does) returning a CLID_INUSE error.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Add /proc/fs/nfsd/pool_stats to export to userspace various
statistics about the operation of rpc server thread pools.
This patch is based on a forward-ported version of
knfsd-add-pool-thread-stats which has been shipping in the SGI
"Enhanced NFS" product since 2006 and which was previously
posted:
http://article.gmane.org/gmane.linux.nfs/10375
It has also been updated thus:
* moved EXPORT_SYMBOL() to near the function it exports
* made the new struct struct seq_operations const
* used SEQ_START_TOKEN instead of ((void *)1)
* merged fix from SGI PV 990526 "sunrpc: use dprintk instead of
printk in svc_pool_stats_*()" by Harshula Jayasuriya.
* merged fix from SGI PV 964001 "Crash reading pool_stats before
nfsds are started".
Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: Harshula Jayasuriya <harshula@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Stop gathering the data that feeds the 'th' line in /proc/net/rpc/nfsd
because the questionable data provided is not worth the scalability
impact of calculating it. Instead, always report zeroes. The current
approach suffers from three major issues:
1. update_thread_usage() increments buckets by call service
time or call arrival time...in jiffies. On lightly loaded
machines, call service times are usually < 1 jiffy; on
heavily loaded machines call arrival times will be << 1 jiffy.
So a large portion of the updates to the buckets are rounded
down to zero, and the histogram is undercounting.
2. As seen previously on the nfs mailing list, the format in which
the histogram is presented is cryptic, difficult to explain,
and difficult to use.
3. Updating the histogram requires taking a global spinlock and
dirtying the global variables nfsd_last_call, nfsd_busy, and
nfsdstats *twice* on every RPC call, which is a significant
scaling limitation.
Testing on a 4 CPU 4 NIC Altix using 4 IRIX clients each doing
1K streaming reads at full line rate, shows the stats update code
(inlined into nfsd()) takes about 1.7% of each CPU. This patch drops
the contribution from nfsd() into the profile noise.
This patch is a forward-ported version of knfsd-remove-nfsd-threadstats
which has been shipping in the SGI "Enhanced NFS" product since 2006.
In that time, exactly one customer has noticed that the threadstats
were missing. It has been previously posted:
http://article.gmane.org/gmane.linux.nfs/10376
and more recently requested to be posted again.
Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The main nfsd code was recently modified to no longer do lookups from
withing the readdir callback, to avoid locking problems on certain
filesystems.
This (rather hacky, and overdue for replacement) NFSv4 recovery code has
the same problem. Fix it to build up a list of names (instead of
dentries) and do the lookups afterwards.
Reported symptoms were a deadlock in the xfs code (called from
nfsd4_recdir_load), with /var/lib/nfs on xfs.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reported-by: David Warren <warren@atmos.washington.edu>
Currently putpubfh returns NFSERR_OPNOTSUPP, which isn't actually
allowed for v4. The right error is probably NFSERR_NOTSUPP.
But let's just implement it; though rarely seen, it can be used by
Solaris (with a special mount option), is mandated by the rfc, and is
trivial for us to support.
Thanks to Yang Hongyang for pointing out the original problem, and to
Mike Eisler, Tom Talpey, Trond Myklebust, and Dave Noveck for further
argument....
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
If a filesystem being written to via NFS returns a short write count
(as opposed to an error) to nfsd, nfsd treats that as a success for
the entire write, rather than the short count that actually succeeded.
For example, given a 8192 byte write, if the underlying filesystem
only writes 4096 bytes, nfsd will ack back to the nfs client that all
8192 bytes were written. The nfs client does have retry logic for
short writes, but this is never called as the client is told the
complete write succeeded.
There are probably other ways it could happen, but in my case it
happened with a fuse (filesystem in userspace) filesystem which can
rather easily have a partial write.
Here is a patch to properly return the short write count to the
client.
Signed-off-by: David Shaw <dshaw@jabberwocky.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Thanks for Bill Baker at sun.com for catching this
at Connectathon 2009.
This bug was introduced in 2.6.27
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The errors returned aren't used. Just return 0 and make them available
to a dprintk(). Also, consistently use -ERRNO errors instead of nfs
errors.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reviewed-by: Benny Halevy <bhalevy@panasas.com>
As part of reducing the scope of the client_mutex, and in order to
remove the need for mutexes from the callback code (so that callbacks
can be done as asynchronous rpc calls), move manipulations of the
file_hashtable under the recall_lock.
Update the relevant comments while we're here.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Alexandros Batsakis <batsakis@netapp.com>
Reviewed-by: Benny Halevy <bhalevy@panasas.com>
Since free_client() is guaranteed to only be called once, and to only
touch the client structure itself (not any common data structures), it
has no need for the state lock.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Alexandros Batsakis <batsakis@netapp.com>
Previous cleanup reveals an obvious (though harmless) bug: when
delegreturn gets a stateid that isn't for a delegation, it should return
an error rather than doing nothing.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Delegreturn is enough a special case for preprocess_stateid_op to
warrant just open-coding it in delegreturn.
There should be no change in behavior here; we're just reshuffling code.
Thanks to Yang Hongyang for catching a critical typo.
Reviewed-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
I can't recall ever seeing these printk's used to debug a problem. I'll
happily put them back if we see a case where they'd be useful. (Though
if we do that the find_XXX() errors would probably be better
reported in find_XXX() functions themselves.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Note that we exit this first big "if" with stp == NULL if and only if we
took the first branch; therefore, the second "if" is redundant, and we
can just combine the two, simplifying the logic.
Reviewed-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
sometimes HPUX nfs client sends a create request to linux nfs server(v2/v3).
the dump of the request is like:
obj_attributes
mode: value follows
set_it: value follows (1)
mode: 00
uid: no value
set_it: no value (0)
gid: value follows
set_it: value follows (1)
gid: 8030
size: value follows
set_it: value follows (1)
size: 0
atime: don't change
set_it: don't change (0)
mtime: don't change
set_it: don't change (0)
note that mode is 00(havs no rwx privilege even for the owner) and it requires
to set size to 0.
as current nfsd(v2/v3) implementation, the server does mainly 2 steps:
1) creates the file in mode specified by calling vfs_create().
2) sets attributes for the file by calling nfsd_setattr().
at step 2), it finally calls file system specific setattr() function which may
fail when checking permission because changing size needs WRITE privilege but
it has none since mode is 000.
for this case, a new file created, we may simply ignore the request of
setting size to 0, so that WRITE privilege is not needed and the open
succeeds.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
--
vfs.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
not having the state locked before putting the client/delegation causes a bug.
Also removed the comment from the function header about the state being already locked
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The use of |= is confusing--the bitmask is always initialized to zero in
this case, so we're effectively just doing an assignment here.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Enable NFSD only when FILE_LOCKING is enabled, since we don't want to
support NFSD without FILE_LOCKING.
Signed-off-by: Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
MSDOS_SUPER_MAGIC is defined in <linux/magic.h>,
so use MSDOS_SUPER_MAGIC directly.
Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The caller always knows specifically whether it's releasing a lockowner
or an openowner, and the code is simpler if we use separate functions
(and the apparent recursion is gone).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The flags here attempt to make the code more general, but I find it
actually just adds confusion.
I think it's clearer to separate the logic for the open and lock cases
entirely. And eventually we may want to separate the stateowner and
stateid types as well, as many of the fields aren't shared between the
lock and open cases.
Also move to eliminate forward references.
Start with the stateid's.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reviewed-by: Benny Halevy <bhalevy@panasas.com>
Although this operation is unsupported by our implementation
we still need to provide an encode routine for it to
merely encode its (error) status back in the compound reply.
Thanks for Bill Baker at sun.com for testing with the Sun
OpenSolaris' client, finding, and reporting this bug at
Connectathon 2009.
This bug was introduced in 2.6.27
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Traditionally, changes to struct file->f_flags have been done under BKL
protection, or with no protection at all. This patch causes all f_flags
changes after file open/creation time to be done under protection of
f_lock. This allows the removal of some BKL usage and fixes a number of
longstanding (if microscopic) races.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
nfsd4_lockt does a search for a lockstateowner when building the lock
struct to test. If one is found, it'll set fl_owner to it. Regardless of
whether that happens, it'll also set fl_lmops. Given that this lock is
basically a "lightweight" lock that's just used for checking conflicts,
setting fl_lmops is probably not appropriate for it.
This behavior exposed a bug in DLM's GETLK implementation where it
wasn't clearing out the fields in the file_lock before filling in
conflicting lock info. While we were able to fix this in DLM, it
still seems pointless and dangerous to set the fl_lmops this way
when we may have a NULL lockstateowner.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@pig.fieldses.org>
Since override_creds() took its own reference on new, we need to release
our own reference.
(Note the put_cred on the return value puts the *old* value of
current->creds, not the new passed-in value).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
refactor the nfs4 server lock code to use last_byte_offset
to compute the last byte covered by the lock. Check for overflow
so that the last byte is set to NFS4_MAX_UINT64 if offset + len
wraps around.
Also, use NFS4_MAX_UINT64 for ~(u64)0 where appropriate.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
There's no use for nfs4_cb_null_ops's declaration in fs/nfsd/nfs4callback.c
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
When determining the fsid_type in fh_compose(), the setting of the FID
via fsid= export option needs to take precedence over using the UUID
device id.
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
A number of nfsd operations depend on the i_mutex to cover more code
than just the fsync, so the approach of 4c728ef583 "add a vfs_fsync
helper" doesn't work for nfsd. Revert the parts of those patches that
touch nfsd.
Note: we can't, however, remove the logic from vfs_fsync that was needed
only for the special case of nfsd, because a vfs_fsync(NULL,...) call
can still result indirectly from a stackable filesystem that was called
by nfsd. (Thanks to Christoph Hellwig for pointing this out.)
Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Fix a regression in NFSD's permission checking introduced by the credentials
patches. There are two parts to the problem, both in nfsd_setuser():
(1) The return value of set_groups() is -ve if in error, not 0, and should be
checked appropriately. 0 indicates success.
(2) The UID to use for fs accesses is in new->fsuid, not new->uid (which is
0). This causes CAP_DAC_OVERRIDE to always be set, rather than being
cleared if the UID is anything other than 0 after squashing.
Reported-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Since nfsv4 allows LOCKT without an open, but the ->lock() method is a
file method, we fake up a struct file in the nfsv4 code with just the
fields we need initialized. But we forgot to initialize the file
operations, with the result that LOCKT never results in a call to the
filesystem's ->lock() method (if it exists).
We could just add that one more initialization. But this hack of faking
up a struct file with only some fields initialized seems the kind of
thing that might cause more problems in the future. We should either do
an open and get a real struct file, or make lock-testing an inode (not a
file) method.
This patch does the former.
Reported-by: Marc Eshel <eshel@almaden.ibm.com>
Tested-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Document the NFSD sysctl interface laid out in fs/nfsd/nfsctl.c.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: Instead of open-coding 2049, use the NFS_PORT macro.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: Rename recently-added failover functions to match the naming
convention in fs/nfsd/nfsctl.c.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
cksum.data is not freed up in one error case. Compile tested.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Minor cleanup/rewrite of find_stateid. Compile tested.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Fsync currently has a fdatawrite/fdatawait pair around the method call,
and a mutex_lock/unlock of the inode mutex. All callers of fsync have
to duplicate this, but we have a few and most of them don't quite get
it right. This patch adds a new vfs_fsync that takes care of this.
It's a little more complicated as usual as ->fsync might get a NULL file
pointer and just a dentry from nfsd, but otherwise gets afile and we
want to take the mapping and file operations from it when it is there.
Notes on the fsync callers:
- ecryptfs wasn't calling filemap_fdatawrite / filemap_fdatawait on the
lower file
- coda wasn't calling filemap_fdatawrite / filemap_fdatawait on the host
file, and returning 0 when ->fsync was missing
- shm wasn't calling either filemap_fdatawrite / filemap_fdatawait nor
taking i_mutex. Now given that shared memory doesn't have disk
backing not doing anything in fsync seems fine and I left it out of
the vfs_fsync conversion for now, but in that case we might just
not pass it through to the lower file at all but just call the no-op
simple_sync_file directly.
[and now actually export vfs_fsync]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We used to have rather schizophrenic set of checks for NULL ->i_op even
though it had been eliminated years ago. You'd need to go out of your
way to set it to NULL explicitly _and_ a bunch of code would die on
such inodes anyway. After killing two remaining places that still
did that bogosity, all that crap can go away.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
net: Allow dependancies of FDDI & Tokenring to be modular.
igb: Fix build warning when DCA is disabled.
net: Fix warning fallout from recent NAPI interface changes.
gro: Fix potential use after free
sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
sfc: When disabling the NIC, close the device rather than unregistering it
sfc: SFT9001: Add cable diagnostics
sfc: Add support for multiple PHY self-tests
sfc: Merge top-level functions for self-tests
sfc: Clean up PHY mode management in loopback self-test
sfc: Fix unreliable link detection in some loopback modes
sfc: Generate unique names for per-NIC workqueues
802.3ad: use standard ethhdr instead of ad_header
802.3ad: generalize out mac address initializer
802.3ad: initialize ports LACPDU from const initializer
802.3ad: remove typedef around ad_system
802.3ad: turn ports is_individual into a bool
802.3ad: turn ports is_enabled into a bool
802.3ad: make ntt bool
ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
...
Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
This patch adds server-side support for callbacks other than AUTH_SYS.
Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The rpc client needs to know the principal that the setclientid was done
as, so it can tell gssd who to authenticate to.
Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Two principals are involved in krb5 authentication: the target, who we
authenticate *to* (normally the name of the server, like
nfs/server.citi.umich.edu@CITI.UMICH.EDU), and the source, we we
authenticate *as* (normally a user, like bfields@UMICH.EDU)
In the case of NFSv4 callbacks, the target of the callback should be the
source of the client's setclientid call, and the source should be the
nfs server's own principal.
Therefore we allow svcgssd to pass down the name of the principal that
just authenticated, so that on setclientid we can store that principal
name with the new client, to be used later on callbacks.
Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Conflicts:
fs/nfsd/nfs4recover.c
Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().
Signed-off-by: James Morris <jmorris@namei.org>
If nfsd was shut down before the grace period ended, we could end up
with a freed object still on grace_list. Thanks to Jeff Moyer for
reporting the resulting list corruption warnings.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Tested-by: Jeff Moyer <jmoyer@redhat.com>
Conflicts:
security/keys/internal.h
security/keys/process_keys.c
security/keys/request_key.c
Fixed conflicts above by using the non 'tsk' versions.
Signed-off-by: James Morris <jmorris@namei.org>
Differentiate the objective and real subjective credentials from the effective
subjective credentials on a task by introducing a second credentials pointer
into the task_struct.
task_struct::real_cred then refers to the objective and apparent real
subjective credentials of a task, as perceived by the other tasks in the
system.
task_struct::cred then refers to the effective subjective credentials of a
task, as used by that task when it's actually running. These are not visible
to the other tasks in the system.
__task_cred(task) then refers to the objective/real credentials of the task in
question.
current_cred() refers to the effective subjective credentials of the current
task.
prepare_creds() uses the objective creds as a base and commit_creds() changes
both pointers in the task_struct (indeed commit_creds() requires them to be the
same).
override_creds() and revert_creds() change the subjective creds pointer only,
and the former returns the old subjective creds. These are used by NFSD,
faccessat() and do_coredump(), and will by used by CacheFiles.
In SELinux, current_has_perm() is provided as an alternative to
task_has_perm(). This uses the effective subjective context of current,
whereas task_has_perm() uses the objective/real context of the subject.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Inaugurate copy-on-write credentials management. This uses RCU to manage the
credentials pointer in the task_struct with respect to accesses by other tasks.
A process may only modify its own credentials, and so does not need locking to
access or modify its own credentials.
A mutex (cred_replace_mutex) is added to the task_struct to control the effect
of PTRACE_ATTACHED on credential calculations, particularly with respect to
execve().
With this patch, the contents of an active credentials struct may not be
changed directly; rather a new set of credentials must be prepared, modified
and committed using something like the following sequence of events:
struct cred *new = prepare_creds();
int ret = blah(new);
if (ret < 0) {
abort_creds(new);
return ret;
}
return commit_creds(new);
There are some exceptions to this rule: the keyrings pointed to by the active
credentials may be instantiated - keyrings violate the COW rule as managing
COW keyrings is tricky, given that it is possible for a task to directly alter
the keys in a keyring in use by another task.
To help enforce this, various pointers to sets of credentials, such as those in
the task_struct, are declared const. The purpose of this is compile-time
discouragement of altering credentials through those pointers. Once a set of
credentials has been made public through one of these pointers, it may not be
modified, except under special circumstances:
(1) Its reference count may incremented and decremented.
(2) The keyrings to which it points may be modified, but not replaced.
The only safe way to modify anything else is to create a replacement and commit
using the functions described in Documentation/credentials.txt (which will be
added by a later patch).
This patch and the preceding patches have been tested with the LTP SELinux
testsuite.
This patch makes several logical sets of alteration:
(1) execve().
This now prepares and commits credentials in various places in the
security code rather than altering the current creds directly.
(2) Temporary credential overrides.
do_coredump() and sys_faccessat() now prepare their own credentials and
temporarily override the ones currently on the acting thread, whilst
preventing interference from other threads by holding cred_replace_mutex
on the thread being dumped.
This will be replaced in a future patch by something that hands down the
credentials directly to the functions being called, rather than altering
the task's objective credentials.
(3) LSM interface.
A number of functions have been changed, added or removed:
(*) security_capset_check(), ->capset_check()
(*) security_capset_set(), ->capset_set()
Removed in favour of security_capset().
(*) security_capset(), ->capset()
New. This is passed a pointer to the new creds, a pointer to the old
creds and the proposed capability sets. It should fill in the new
creds or return an error. All pointers, barring the pointer to the
new creds, are now const.
(*) security_bprm_apply_creds(), ->bprm_apply_creds()
Changed; now returns a value, which will cause the process to be
killed if it's an error.
(*) security_task_alloc(), ->task_alloc_security()
Removed in favour of security_prepare_creds().
(*) security_cred_free(), ->cred_free()
New. Free security data attached to cred->security.
(*) security_prepare_creds(), ->cred_prepare()
New. Duplicate any security data attached to cred->security.
(*) security_commit_creds(), ->cred_commit()
New. Apply any security effects for the upcoming installation of new
security by commit_creds().
(*) security_task_post_setuid(), ->task_post_setuid()
Removed in favour of security_task_fix_setuid().
(*) security_task_fix_setuid(), ->task_fix_setuid()
Fix up the proposed new credentials for setuid(). This is used by
cap_set_fix_setuid() to implicitly adjust capabilities in line with
setuid() changes. Changes are made to the new credentials, rather
than the task itself as in security_task_post_setuid().
(*) security_task_reparent_to_init(), ->task_reparent_to_init()
Removed. Instead the task being reparented to init is referred
directly to init's credentials.
NOTE! This results in the loss of some state: SELinux's osid no
longer records the sid of the thread that forked it.
(*) security_key_alloc(), ->key_alloc()
(*) security_key_permission(), ->key_permission()
Changed. These now take cred pointers rather than task pointers to
refer to the security context.
(4) sys_capset().
This has been simplified and uses less locking. The LSM functions it
calls have been merged.
(5) reparent_to_kthreadd().
This gives the current thread the same credentials as init by simply using
commit_thread() to point that way.
(6) __sigqueue_alloc() and switch_uid()
__sigqueue_alloc() can't stop the target task from changing its creds
beneath it, so this function gets a reference to the currently applicable
user_struct which it then passes into the sigqueue struct it returns if
successful.
switch_uid() is now called from commit_creds(), and possibly should be
folded into that. commit_creds() should take care of protecting
__sigqueue_alloc().
(7) [sg]et[ug]id() and co and [sg]et_current_groups.
The set functions now all use prepare_creds(), commit_creds() and
abort_creds() to build and check a new set of credentials before applying
it.
security_task_set[ug]id() is called inside the prepared section. This
guarantees that nothing else will affect the creds until we've finished.
The calling of set_dumpable() has been moved into commit_creds().
Much of the functionality of set_user() has been moved into
commit_creds().
The get functions all simply access the data directly.
(8) security_task_prctl() and cap_task_prctl().
security_task_prctl() has been modified to return -ENOSYS if it doesn't
want to handle a function, or otherwise return the return value directly
rather than through an argument.
Additionally, cap_task_prctl() now prepares a new set of credentials, even
if it doesn't end up using it.
(9) Keyrings.
A number of changes have been made to the keyrings code:
(a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have
all been dropped and built in to the credentials functions directly.
They may want separating out again later.
(b) key_alloc() and search_process_keyrings() now take a cred pointer
rather than a task pointer to specify the security context.
(c) copy_creds() gives a new thread within the same thread group a new
thread keyring if its parent had one, otherwise it discards the thread
keyring.
(d) The authorisation key now points directly to the credentials to extend
the search into rather pointing to the task that carries them.
(e) Installing thread, process or session keyrings causes a new set of
credentials to be created, even though it's not strictly necessary for
process or session keyrings (they're shared).
(10) Usermode helper.
The usermode helper code now carries a cred struct pointer in its
subprocess_info struct instead of a new session keyring pointer. This set
of credentials is derived from init_cred and installed on the new process
after it has been cloned.
call_usermodehelper_setup() allocates the new credentials and
call_usermodehelper_freeinfo() discards them if they haven't been used. A
special cred function (prepare_usermodeinfo_creds()) is provided
specifically for call_usermodehelper_setup() to call.
call_usermodehelper_setkeys() adjusts the credentials to sport the
supplied keyring as the new session keyring.
(11) SELinux.
SELinux has a number of changes, in addition to those to support the LSM
interface changes mentioned above:
(a) selinux_setprocattr() no longer does its check for whether the
current ptracer can access processes with the new SID inside the lock
that covers getting the ptracer's SID. Whilst this lock ensures that
the check is done with the ptracer pinned, the result is only valid
until the lock is released, so there's no point doing it inside the
lock.
(12) is_single_threaded().
This function has been extracted from selinux_setprocattr() and put into
a file of its own in the lib/ directory as join_session_keyring() now
wants to use it too.
The code in SELinux just checked to see whether a task shared mm_structs
with other tasks (CLONE_VM), but that isn't good enough. We really want
to know if they're part of the same thread group (CLONE_THREAD).
(13) nfsd.
The NFS server daemon now has to use the COW credentials to set the
credentials it is going to use. It really needs to pass the credentials
down to the functions it calls, but it can't do that until other patches
in this series have been applied.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>
Pass credentials through dentry_open() so that the COW creds patch can have
SELinux's flush_unauthorized_files() pass the appropriate creds back to itself
when it opens its null chardev.
The security_dentry_open() call also now takes a creds pointer, as does the
dentry_open hook in struct security_operations.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>
Separate the task security context from task_struct. At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.
Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.
With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.
Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
Commit 8d7c4203 "nfsd: fix failure to set eof in readdir in some
situations" introduced a bug: on a directory in an exported ext3
filesystem with dir_index unset, a READDIR will only return about 250
entries, even if the directory was larger.
Bisected it back to this commit; reverting it fixes the problem.
It turns out that in this case ext3 reads a block at a time, then
returns from readdir, which means we can end up with buf.full==0 but
with more entries in the directory still to be read. Before 8d7c4203
(but after c002a6c797 "Optimise NFS readdir hack slightly"), this would
cause us to return the READDIR result immediately, but with the eof bit
unset. That could cause a performance regression (because the client
would need more roundtrips to the server to read the whole directory),
but no loss in correctness, since the cleared eof bit caused the client
to send another readdir. After 8d7c4203, the setting of the eof bit
made this a correctness problem.
So, move nfserr_eof into the loop and remove the buf.full check so that
we loop until buf.used==0. The following seems to do the right thing
and reduces the network traffic since we don't return a READDIR result
until the buffer is full.
Tested on an empty directory & large directory; eof is properly sent and
there are no more short buffers.
Signed-off-by: Doug Nazar <nazard@dragoninc.ca>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before 14f7dd6320 "[PATCH] Copy XFS
readdir hack into nfsd code", readdir_cd->err was reset to eof before
each call to vfs_readdir; afterwards, it is set only once. Similarly,
c002a6c797 "[PATCH] Optimise NFS readdir
hack slightly", can cause us to exit without nfserr_eof set. Fix this.
This ensures the "eof" bit is set when needed in readdir replies. (The
particular case I saw was an nfsv4 readdir of an empty directory, which
returned with no entries (the protocol requires "." and ".." to be
filtered out), but with eof unset.)
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Avoid calling the underlying ->readdir() again when we reached the end
already; keep going round the loop only if we stopped due to our own
buffer being full.
[AV: tidy the things up a bit, while we are there]
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
It's not the final state, but it allows moving ->readdir() instances
to passing filldir return value to caller of vfs_readdir().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Some file systems with their own internal locking have problems with the
way that nfsd calls the ->lookup() method from within a filldir function
called from their ->readdir() method. The recursion back into the file
system code can cause deadlock.
XFS has a fairly hackish solution to this which involves doing the
readdir() into a locally-allocated buffer, then going back through it
calling the filldir function afterwards. It's not ideal, but it works.
It's particularly suboptimal because XFS does this for local file
systems too, where it's completely unnecessary.
Copy this hack into the NFS code where it can be used only for NFS
export. In response to feedback, use it unconditionally rather than only
for the affected file systems.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We might as well do all of these at the end. Fix up a couple minor
style nits while we're there.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Drop reference to export key on error. Compile tested.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Fix a memory leak in nfsd_getxattr. nfsd_getxattr should free up memory
that it allocated if vfs_getxattr fails.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The Linux NFS server can be started via a user-space write to
/proc/fs/nfs/threads or to /proc/fs/nfs/portlist. In the first case,
all default listeners are started (both UDP and TCP). In the second,
a listener is started only for one specified transport.
The NFS server has to make sure lockd stays up until the last listener
transport goes away. To support both start-up interfaces, it should
do one lockd_up() for each NFSD listener.
The nfsd_init_socks() function used to do one lockd_up() call for each
svc_create_xprt(). Recently commit
26a4140923 mistakenly changed
nfsd_init_socks() to do only one lockd_up() call even though it still
does two svc_create_xprt() calls.
The end result is a lockd_down() BUG during NFSD shutdown processing
because nfsd_last_threads() does a lockd_down() call for each entry
on the sv_permsocks list, but the start-up code doesn't do a matching
number of lockd_up() calls.
Add a second lockd_up() in nfsd_init_socks() to make sure the number
of lockd_up() calls matches the number of entries on the NFS servers's
sv_permsocks list.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: The svc_addsock() function no longer uses its "proto"
argument, so remove it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: Now that lockd_up() starts listeners for both transports, the
"proto" argument is no longer needed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Rewrite grace period code to unify management of grace period across
lockd and nfsd. The current code has lockd and nfsd cooperate to
compute a grace period which is satisfactory to them both, and then
individually enforce it. This creates a slight race condition, since
the enforcement is not coordinated. It's also more complicated than
necessary.
Here instead we have lockd and nfsd each inform common code when they
enter the grace period, and when they're ready to leave the grace
period, and allow normal locking only after both of them are ready to
leave.
We also expect the locks_start_grace()/locks_end_grace() interface here
to be simpler to build on for future cluster/high-availability work,
which may require (for example) putting individual filesystems into
grace, or enforcing grace periods across multiple cluster nodes.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
since commit ff7d9756b5
"nfsd: use static memory for callback program and stats"
do_probe_callback uses a static callback program
(NFS4_CALLBACK) rather than the one set in clp->cl_callback.cb_prog
as passed in by the client in setclientid (4.0)
or create_session (4.1).
This patches introduces rpc_create_args.prognumber that allows
overriding program->number when creating rpc_clnt.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Now that cb_stats are static (since commit
ff7d9756b5)
there's no need to clear them.
Initially I thought it might make sense to do
that every callback probing but since the stats
are per-program and they are shared between possibly
several client callback instances, zeroing them out
seems like the wrong thing to do.
Note that that commit also introduced a bug
since stats.program is also being cleared in the process
and it is not restored after the memset as it used to be.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
I had a report from someone building a large NFS server that they were
unable to start more than 585 nfsd threads. It was reported against an
older kernel using the slab allocator, and I tracked it down to the
large allocation in nfsd_racache_init failing.
It appears that the slub allocator handles large allocations better,
but large contiguous allocations can often be problematic. There
doesn't seem to be any reason that the racache has to be allocated as a
single large chunk. This patch breaks this up so that the racache is
built up from separate allocations.
(Thanks also to Takashi Iwai for a bugfix.)
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Takashi Iwai <tiwai@suse.de>
After using the encode_stateid helper the "p" pointer declared
by ENCODE_SEQID_OP_HEAD is warned as unused.
In the single site where it is still needed it can be declared
separately using the ENCODE_HEAD macro.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
nfsd4_encode_open first reservation is currently for 36 + sizeof(stateid_t)
while it writes after the stateid a cinfo (20 bytes) and 5 more 4-bytes
words, for a total of 40 + sizeof(stateid_t).
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
RFC 2623 section 2.3.2 permits the server to bypass gss authentication
checks for certain operations that a client may perform when mounting.
In the case of a client that doesn't have some form of credentials
available to it on boot, this allows it to perform the mount unattended.
(Presumably real file access won't be needed until a user with
credentials logs in.)
Being slightly more lenient allows lots of old clients to access
krb5-only exports, with the only loss being a small amount of
information leaked about the root directory of the export.
This affects only v2 and v3; v4 still requires authentication for all
access.
Thanks to Peter Staubach testing against a Solaris client, which
suggesting addition of v3 getattr, to the list, and to Trond for noting
that doing so exposes no additional information.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Peter Staubach <staubach@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Introduce and initialize an address family field in the svc_serv structure.
This field will determine what family to use for the service's listener
sockets and what families are advertised via the local rpcbind daemon.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The array we kmalloc() here is not large enough.
Thanks to Johann Dahm and David Richter for bug report and testing.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: David Richter <richterd@citi.umich.edu>
Tested-by: Johann Dahm <jdahm@umich.edu>
Move the cstate_alloc call so that if it fails, the response is setup to
encode the NFS error. The out label now means that the
nfsd4_compound_state has not been allocated.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
There doesn't seem to be a compelling reason why nfsd4_op_name() is
marked as "inline":
It's only used in a dprintk(), and as long as it has only one caller
non-ancient gcc versions anyway inline it automatically.
This patch fixes the following compile error with gcc 3.4:
...
CC fs/nfsd/nfs4proc.o
nfs4proc.c: In function `nfsd4_proc_compound':
nfs4proc.c:854: sorry, unimplemented: inlining failed in call to
nfs4proc.c:897: sorry, unimplemented: called from here
make[3]: *** [fs/nfsd/nfs4proc.o] Error 1
Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
[ Also made it "const char *" - Linus]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Once clp is assigned, it never becomes NULL, so we can make a label for it
in the error handling code. Because the call to path_lookup follows the
call to auth_domain_find, its error handling code should jump to this new
label.
The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r@
expression x,E;
statement S;
position p1,p2,p3;
@@
(
if ((x = auth_domain_find@p1(...)) == NULL || ...) S
|
x = auth_domain_find@p1(...)
... when != x
if (x == NULL || ...) S
)
<...
if@p3 (...) { ... when != auth_domain_put(x)
when != if (x) { ... auth_domain_put(x); ...}
return@p2 ...;
}
...>
(
return x;
|
return 0;
|
x = E
|
E = x
|
auth_domain_put(x)
)
@exists@
position r.p1,r.p2,r.p3;
expression x;
int ret != 0;
statement S;
@@
* x = auth_domain_find@p1(...)
<...
* if@p3 (...)
S
...>
* return@p2 \(NULL\|ret\);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
fs.h needs path.h, not namei.h; nfs_fs.h doesn't need it at all.
Several places in the tree needed direct include.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Remove the unused mode parameter from vfs_symlink and callers.
Thanks to Tetsuo Handa for noticing.
CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Fix nlm_fopen() to return NLM_FAILED (or NLM_LCK_DENIED_NOLOCKS) instead
of NLM_LCK_DENIED. The latter means the lock request failed because of a
conflicting lock (i.e. a temporary error), which is wrong in this case.
Also fix the client to return ENOLCK instead of EAGAIN if a blocking lock
request returns with NLM_LOCK_DENIED.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: David Teigland <teigland@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-2.6.27' of git://linux-nfs.org/~bfields/linux: (51 commits)
nfsd: nfs4xdr.c do-while is not a compound statement
nfsd: Use C99 initializers in fs/nfsd/nfs4xdr.c
lockd: Pass "struct sockaddr *" to new failover-by-IP function
lockd: get host reference in nlmsvc_create_block() instead of callers
lockd: minor svclock.c style fixes
lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_lock
lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_testlock
lockd: nlm_release_host() checks for NULL, caller needn't
file lock: reorder struct file_lock to save space on 64 bit builds
nfsd: take file and mnt write in nfs4_upgrade_open
nfsd: document open share bit tracking
nfsd: tabulate nfs4 xdr encoding functions
nfsd: dprint operation names
svcrdma: Change WR context get/put to use the kmem cache
svcrdma: Create a kmem cache for the WR contexts
svcrdma: Add flush_scheduled_work to module exit function
svcrdma: Limit ORD based on client's advertised IRD
svcrdma: Remove unused wait q from svcrdma_xprt structure
svcrdma: Remove unneeded spin locks from __svc_rdma_free
svcrdma: Add dma map count and WARN_ON
...
The WRITEMEM macro produces sparse warnings of the form:
fs/nfsd/nfs4xdr.c:2668:2: warning: do-while statement is not a compound statement
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Thanks to problem report and original patch from Harvey Harrison.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benny Halevy <bhalevy@panasas.com>
Pass a more generic socket address type to nlmsvc_unlock_all_by_ip() to
allow for future support of IPv6. Also provide additional sanity
checking in failover_unlock_ip() when constructing the server's IP
address.
As an added bonus, provide clean kerneldoc comments on related NLM
interfaces which were recently added.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The cl_chatty flag alows us to control whether a given rpc client leaves
"server X not responding, timed out"
messages in the syslog. Such messages make sense for ordinary nfs
clients (where an unresponsive server means applications on the
mountpoint are probably hanging), but not for the callback client (which
can fail more commonly, with the only result just of disabling some
optimizations).
Previously cl_chatty was removed, do to lack of users; reinstate it, and
use it for the nfsd's callback client.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
It's not immediately obvious from the code why we're doing this.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Benny Halevy <bhalevy@panasas.com>
In preparation for minorversion 1
All encoders now return an nfserr status (typically their
nfserr argument). Unsupported ops go through nfsd4_encode_operation
too, so use nfsd4_encode_noop to encode nothing for their reply body.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Have separate vectors of operation decoders for each minorversion.
Obsolete ops in newer minorversions have default implementation returning
nfserr_opnotsupp.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Check minorversion once before decoding any operation and reject with
nfserr_minor_vers_mismatch if != 0 (this still happens in nfsd4_proc_compound).
In this case return a zero length resultdata array as required by RFC3530.
minorversion 1 processing will have its own vector of decoders.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Multiple mnt_want_write() calls in the switch statement looks really
ugly.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
knfsd currently uses 2 signal masks when processing requests. A "loose"
mask (SHUTDOWN_SIGS) that it uses when receiving network requests, and
then a more "strict" mask (ALLOWED_SIGS, which is just SIGKILL) that it
allows when doing the actual operation on the local storage.
This is apparently unnecessarily complicated. The underlying filesystem
should be able to sanely handle a signal in the middle of an operation.
This patch removes the signal mask handling from knfsd altogether. When
knfsd is started as a kthread, all signals are ignored. It then allows
all of the signals in SHUTDOWN_SIGS. There's no need to set the mask
as well.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Thanks to Frank Van Maarseveen for the original problem report: "A
privileged process on an NFS client which drops privileges after using
them to change the current working directory, will experience incorrect
EACCES after an NFS server reboot. This problem can also occur after
memory pressure on the server, particularly when the client side is
quiet for some time."
This occurs because the filehandle points to a directory whose parents
are no longer in the dentry cache, and we're attempting to reconnect the
directory to its parents without adequate permissions to perform lookups
in the parent directories.
We can therefore fix the problem by acquiring the necessary capabilities
before attempting the reconnection. We do this only in the
no_subtree_check case, since the documented behavior of the
subtree_check export option requires the server to check that the user
has lookup permissions on all parents.
The subtree_check case still has a problem, since reconnect_path()
unnecessarily requires both read and lookup permissions on all parent
directories. However, a fix in that case would be more delicate, and
use of subtree_check is already discouraged for other reasons.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Rename nfsd_permission() specific MAY_* flags to NFSD_MAY_* to make it
clear, that these are not used outside nfsd, and to avoid name and
number space conflicts with the VFS.
[comment from hch: rename MAY_READ, MAY_WRITE and MAY_EXEC as well]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
OCFS2 can return -ERESTARTSYS from write requests (and possibly
elsewhere) if there is a signal pending.
If nfsd is shutdown (by sending a signal to each thread) while there
is still an IO load from the client, each thread could handle one last
request with a signal pending. This can result in -ERESTARTSYS
which is not understood by nfserrno() and so is reflected back to
the client as nfserr_io aka -EIO. This is wrong.
Instead, interpret ERESTARTSYS to mean "try again later" by returning
nfserr_jukebox. The client will resend and - if the server is
restarted - the write will (hopefully) be successful and everyone will
be happy.
The symptom that I narrowed down to this was:
copy a large file via NFS to an OCFS2 filesystem, and restart
the nfs server during the copy.
The 'cp' might get an -EIO, and the file will be corrupted -
presumably holes in the middle where writes appeared to fail.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
We need the nfsd_mutex before accessing nfsd_serv->sv_nrthreads or we
can't even guarantee nfsd_serv will still be there.
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Since we no longer make any distinction between shutdown signals with
nfsd, then it becomes easier to just standardize on a particular signal
to use to bring it down (SIGINT, in this case).
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch is rather large, but I couldn't figure out a way to break it
up that would remain bisectable. It does several things:
- change svc_thread_fn typedef to better match what kthread_create expects
- change svc_pool_map_set_cpumask to be more kthread friendly. Make it
take a task arg and and get rid of the "oldmask"
- have svc_set_num_threads call kthread_create directly
- eliminate __svc_create_thread
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The special handling for SIGHUP in knfsd is a holdover from much
earlier versions of Linux where reloading the export table was
more expensive. That facility is not really needed anymore and
to my knowledge, is seldom-used.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Several of the nfsd filesystem interfaces allow changes to parameters
that don't have any effect on a running nfsd service. They are only ever
checked when nfsd is started. This patch fixes it so that changes to
those procfiles return -EBUSY if nfsd is already running to make it
clear that changes on the fly don't work.
The patch should also close some relatively harmless races between
changing the info in those interfaces and starting nfsd, since these
variables are being moved under the protection of the nfsd_mutex.
Finally, the nfsv4recoverydir file always returns -EINVAL if read. This
patch fixes it to return the recoverydir path as expected.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This removes the BKL from the RPC service creation codepath. The BKL
really isn't adequate for this job since some of this info needs
protection across sleeps.
Also, add some comments to try and clarify how the locking should work
and to make it clear that the BKL isn't necessary as long as there is
adequate locking between tasks when touching the svc_serv fields.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
WRITEMEM zeroes the last word in the destination buffer
for padding purposes, but this must not be done if
no bytes are to be copied, as it would result
in zeroing of the word right before the array.
The current implementation works since it's always called
with non zero nbytes or it follows an encoding of the
string (or opaque) length which, if equal to zero,
can be overwritten with zero.
Nevertheless, it seems safer to check for this case.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
We already print each operation of the compound when debugging is turned
on; printing the result could also help with remote debugging.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
These bit operations don't need to be atomic. They're all done under a
single big mutex anyway.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use proc_create() to make sure that ->proc_fops be setup before gluing PDE to
main tree.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Presumably this is left over from earlier drafts of v4, which listed
TIME_METADATA as writeable. It's read-only in rfc 3530, and shouldn't
be modifiable anyway.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The file_lock structure is used both as a heavy-weight representation of
an active lock, with pointers to reference-counted structures, etc., and
as a simple container for parameters that describe a file lock.
The conflicting lock returned from __posix_lock_file is an example of
the latter; so don't call the filesystem or lock manager callbacks when
copying to it. This also saves the need for an unnecessary
locks_init_lock in the nfsv4 server.
Thanks to Trond for pointing out the error.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Add /proc/fs/nfsd/unlock_filesystem, which allows e.g.:
shell> echo /mnt/sfs1 > /proc/fs/nfsd/unlock_filesystem
so that a filesystem can be unmounted before allowing a peer nfsd to
take over nfs service for the filesystem.
Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Cc: Lon Hohberger <lhh@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
fs/lockd/svcsubs.c | 66 +++++++++++++++++++++++++++++++++++++++-----
fs/nfsd/nfsctl.c | 65 +++++++++++++++++++++++++++++++++++++++++++
include/linux/lockd/lockd.h | 7 ++++
3 files changed, 131 insertions(+), 7 deletions(-)
For high-availability NFS service, we generally need to be able to drop
file locks held on the exported filesystem before moving clients to a
new server. Currently the only way to do that is by shutting down lockd
entirely, which is often undesireable (for example, if you want to
continue exporting other filesystems).
This patch allows the administrator to release all locks held by clients
accessing the client through a given server ip address, by echoing that
address to a new file, /proc/fs/nfsd/unlock_ip, as in:
shell> echo 10.1.1.2 > /proc/fs/nfsd/unlock_ip
The expected sequence of events can be:
1. Tear down the IP address
2. Unexport the path
3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files
4. Signal peer to begin take-over.
For now we only support IPv4 addresses and NFSv2/v3 (NFSv4 locks are not
affected).
Also, if unmounting the filesystem is required, we assume at step 3 that
clients using the given server ip are the only clients holding locks on
the given filesystem; otherwise, an additional patch is required to
allow revoking all locks held by lockd on a given filesystem.
Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Cc: Lon Hohberger <lhh@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
fs/lockd/svcsubs.c | 66 +++++++++++++++++++++++++++++++++++++++-----
fs/nfsd/nfsctl.c | 65 +++++++++++++++++++++++++++++++++++++++++++
include/linux/lockd/lockd.h | 7 ++++
3 files changed, 131 insertions(+), 7 deletions(-)
Currently, knfsd only clears the setuid bit if the owner of a file is
changed on a SETATTR call, and only clears the setgid bit if the group
is changed. POSIX says this in the spec for chown():
"If the specified file is a regular file, one or more of the
S_IXUSR, S_IXGRP, or S_IXOTH bits of the file mode are set, and the
process does not have appropriate privileges, the set-user-ID
(S_ISUID) and set-group-ID (S_ISGID) bits of the file mode shall
be cleared upon successful return from chown()."
If I'm reading this correctly, then knfsd is doing this wrong. It should
be clearing both the setuid and setgid bit on any SETATTR that changes
the uid or gid. This wasn't really as noticable before, but now that the
ATTR_KILL_S*ID bits are a no-op for the NFS client, it's more evident.
This patch corrects the nfsd_setattr logic so that this occurs. It also
does a bit of cleanup to the function.
There is also one small behavioral change. If a SETATTR call comes in
that changes the uid/gid and the mode, then we now only clear the setgid
bit if the group execute bit isn't set. The setgid bit without a group
execute bit signifies mandatory locking and we likely don't want to
clear the bit in that case. Since there is no call in POSIX that should
generate a SETATTR call like this, then this should rarely happen, but
it's worth noting.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
There's no need to dynamically allocate this memory, and doing so may
create the possibility of races on shutdown of the rpc client. (We've
witnessed it only after adding rpcsec_gss support to the server, after
which the rpc code can send destroys calls that expect to still be able
to access the rpc_stats structure after it has been destroyed.)
Such races are in theory possible if the module containing this "static"
memory is removed very quickly after an rpc client is destroyed, but
we haven't seen that happen.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Move the code that actually parses the filehandle and looks up the
dentry and export to a separate function. This simplifies the reference
counting a little and moves fh_verify() a little closer to the kernel
ideal of small, minimally-indentended functions. Clean up a few other
minor style sins along the way.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
While lease is correctly checked by supplying the type argument to
vfs_setlease(), it's stored with fl_type uninitialized. This breaks the
logic when checking the type of the lease. The fix is to initialize
fl_type.
The old code still happened to function correctly since F_RDLCK is zero,
and we only implement read delegations currently (nor write
delegations). But that's no excuse for not fixing this.
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
fs/nfsd/vfs.c:991:27: warning: Using plain integer as NULL pointer
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Add extern to nfsd/nfsd.h
fs/nfsd/nfssvc.c:146:5: warning: symbol 'nfsd_nrthreads' was not declared. Should it be static?
fs/nfsd/nfssvc.c:261:5: warning: symbol 'nfsd_nrpools' was not declared. Should it be static?
fs/nfsd/nfssvc.c:269:5: warning: symbol 'nfsd_get_nrthreads' was not declared. Should it be static?
fs/nfsd/nfssvc.c:281:5: warning: symbol 'nfsd_set_nrthreads' was not declared. Should it be static?
fs/nfsd/export.c:1534:23: warning: symbol 'nfs_exports_op' was not declared. Should it be static?
Add include of auth.h
fs/nfsd/auth.c:27:5: warning: symbol 'nfsd_setuser' was not declared. Should it be static?
Make static, move forward declaration closer to where it's needed.
fs/nfsd/nfs4state.c:1877:1: warning: symbol 'laundromat_main' was not declared. Should it be static?
Make static, forward declaration was already marked static.
fs/nfsd/nfs4idmap.c:206:1: warning: symbol 'idtoname_parse' was not declared. Should it be static?
fs/nfsd/vfs.c:1156:1: warning: symbol 'nfsd_create_setattr' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch makes the needlessly global nfsd_create_setattr() static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Likewise, distros usually leave CONFIG_NFSD_TCP enabled.
TCP support in the Linux NFS server is stable enough that we can leave it
on always. CONFIG_NFSD_TCP adds about 10 lines of code, and defaults to
"Y" anyway.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The code here is difficult to understand; attempt to clarify somewhat by
pulling out one of the more mystifying conditionals into a separate
function.
While we're here, also add lease_time to the list of attributes that we
don't really need to cross a mountpoint to fetch.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Peter Staubach <staubach@redhat.com>
This adds IPv6 support to the interfaces that are used to express nfsd
exports. All addressed are stored internally as IPv6; backwards
compatibility is maintained using mapped addresses.
Thanks to Bruce Fields, Brian Haley, Neil Brown and Hideaki Joshifuji
for comments
Signed-off-by: Aurelien Charbon <aurelien.charbon@bull.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Brian Haley <brian.haley@hp.com>
Cc: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
If we depend on the inodes for writeability, we will not catch the r/o mounts
when implemented.
This patches uses __mnt_want_write(). It does not guarantee that the mount
will stay writeable after the check. But, this is OK for one of the checks
because it is just for a printk().
The other two are probably unnecessary and duplicate existing checks in the
VFS. This won't make them better checks than before, but it will make them
detect r/o mounts.
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This basically audits the callers of xattr_permission(), which calls
permission() and can perform writes to the filesystem.
[AV: add missing parts - removexattr() and nfsd posix acls, plug for a leak
spotted by Miklos]
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This also uses the little helper in the NFS code to make an if() a little bit
less ugly. We introduced the helper at the beginning of the series.
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This takes care of all of the direct callers of vfs_mknod().
Since a few of these cases also handle normal file creation
as well, this also covers some calls to vfs_create().
So that we don't have to make three mnt_want/drop_write()
calls inside of the switch statement, we move some of its
logic outside of the switch and into a helper function
suggested by Christoph.
This also encapsulates a fix for mknod(S_IFREG) that Miklos
found.
[AV: merged mkdir handling, added missing nfsd pieces]
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Elevate the write count during the vfs_rmdir() and vfs_unlink().
[AV: merged rmdir and unlink parts, added missing pieces in nfsd]
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If someone decides to demote a file from r/w to just
r/o, they can use this same code as __fput().
NFS does just that, and will use this in the next
patch.
AV: drop write access in __fput() only after we evict from file list.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Cc: Erez Zadok <ezk@cs.sunysb.edu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J Bruce Fields" <bfields@fieldses.org>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This bug was always here, but before my commit 6fa02839bf
("recheck for secure ports in fh_verify"), it could only be triggered by
failure of a kmalloc(). After that commit it could be triggered by a
client making a request from a non-reserved port for access to an export
marked "secure". (Exports are "secure" by default.)
The result is a struct svc_export with a reference count one too low,
resulting in likely oopses next time the export is accessed.
The reference counting here is not straightforward; a later patch will
clean up fh_verify().
Thanks to Lukas Hejtmanek for the bug report and followup.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Lukas Hejtmanek <xhejtman@ics.muni.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sorry for the noise, but here's the v3 of this compilation fix :)
There are some places, which declare the char buf[...] on the stack
to push it later into dprintk(). Since the dprintk sometimes (if the
CONFIG_SYSCTL=n) becomes an empty do { } while (0) stub, these buffers
cause gcc to produce appropriate warnings.
Wrap these buffers with RPC_IFDEBUG macro, as Trond proposed, to
compile them out when not needed.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
d_path() is used on a <dentry,vfsmount> pair. Lets use a struct path to
reflect this.
[akpm@linux-foundation.org: fix build in mm/memory.c]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Bryan Wu <bryan.wu@analog.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
seq_path() is always called with a dentry and a vfsmount from a struct path.
Make seq_path() take it directly as an argument.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I'm embedding struct path into struct svc_expkey.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add path_put() functions for releasing a reference to the dentry and
vfsmount of a struct path in the right order
* Switch from path_release(nd) to path_put(&nd->path)
* Rename dput_path() to path_put_conditional()
[akpm@linux-foundation.org: fix cifs]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the central patch of a cleanup series. In most cases there is no good
reason why someone would want to use a dentry for itself. This series reflects
that fact and embeds a struct path into nameidata.
Together with the other patches of this series
- it enforced the correct order of getting/releasing the reference count on
<dentry,vfsmount> pairs
- it prepares the VFS for stacking support since it is essential to have a
struct path in every place where the stack can be traversed
- it reduces the overall code size:
without patch series:
text data bss dec hex filename
5321639 858418 715768 6895825 6938d1 vmlinux
with patch series:
text data bss dec hex filename
5320026 858418 715768 6894212 693284 vmlinux
This patch:
Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix cifs]
[akpm@linux-foundation.org: fix smack]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The patch supports legacy (32-bit) capability userspace, and where possible
translates 32-bit capabilities to/from userspace and the VFS to 64-bit
kernel space capabilities. If a capability set cannot be compressed into
32-bits for consumption by user space, the system call fails, with -ERANGE.
FWIW libcap-2.00 supports this change (and earlier capability formats)
http://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/
[akpm@linux-foundation.org: coding-syle fixes]
[akpm@linux-foundation.org: use get_task_comm()]
[ezk@cs.sunysb.edu: build fix]
[akpm@linux-foundation.org: do not initialise statics to 0 or NULL]
[akpm@linux-foundation.org: unused var]
[serue@us.ibm.com: export __cap_ symbols]
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Neil Brown points out that we're checking buf[size-1] in a couple places
without first checking whether size is zero.
Actually, given the implementation of simple_transaction_get(), buf[-1]
is zero, so in both of these cases the subsequent check of the value of
buf[size-1] will catch this case.
But it seems fragile to depend on that, so add explicit checks for this
case.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: NeilBrown <neilb@suse.de>
Neither EPERM and ENOENT map to valid errors for PUTROOTFH according to
rfc 3530, and, if anything, ENOENT is likely to be slightly more
informative; so don't bother mapping ENOENT to EPERM. (Probably this
was originally done because one likely cause was that there is an fsid=0
export but that it isn't permitted to this particular client. Now that
we allow WRONGSEC returns, this is somewhat less likely.)
In the long term we should work to make this situation less likely,
perhaps by turning off nfsv4 service entirely in the absence of the
pseudofs root, or constructing a pseudofilesystem root ourselves in the
kernel as necessary.
Thanks to Benny Halevy <bhalevy@panasas.com> for pointing out this
problem.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Benny Halevy <bhalevy@panasas.com>
Create a transport independent version of the svc_sock_names function.
The toclose capability of the svc_sock_names service can be implemented
using the svc_xprt_find and svc_xprt_close services.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Update the write handler for the portlist file to allow creating new
listening endpoints on a transport. The general form of the string is:
<transport_name><space><port number>
For example:
echo "tcp 2049" > /proc/fs/nfsd/portlist
This is intended to support the creation of a listening endpoint for
RDMA transports without adding #ifdef code to the nfssvc.c file.
Transports can also be removed as follows:
'-'<transport_name><space><port number>
For example:
echo "-tcp 2049" > /proc/fs/nfsd/portlist
Attempting to add a listener with an invalid transport string results
in EPROTONOSUPPORT and a perror string of "Protocol not supported".
Attempting to remove an non-existent listener (.e.g. bad proto or port)
results in ENOTCONN and a perror string of
"Transport endpoint is not connected"
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Move sk_list and sk_ready to svc_xprt. This involves close because these
lists are walked by svcs when closing all their transports. So I combined
the moving of these lists to svc_xprt with making close transport independent.
The svc_force_sock_close has been changed to svc_close_all and takes a list
as an argument. This removes some svc internals knowledge from the svcs.
This code races with module removal and transport addition.
Thanks to Simon Holm Thøgersen for a compile fix.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Simon Holm Thøgersen <odie@cs.aau.dk>
Modify the various kernel RPC svcs to use the svc_create_xprt service.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Document these checks a little better and inline, as suggested by Neil
Brown (note both functions have two callers). Remove an obviously bogus
check while we're there (checking whether unsigned value is negative).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
The server silently ignores attempts to set the uid and gid on create.
Based on the comment, this appears to have been done to prevent some
overly-clever IRIX client from causing itself problems.
Perhaps we should remove that hack completely. For now, at least, it
makes sense to allow root (when no_root_squash is set) to set uid and
gid.
While we're there, since nfsd_create and nfsd_create_v3 share the same
logic, pull that out into a separate function. And spell out the
individual modifications of ia_valid instead of doing them both at once
inside a conditional.
Thanks to Roger Willcocks <roger@filmlight.ltd.uk> for the bug report
and original patch on which this is based.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch addresses a compatibility issue with a Linux NFS server and
AIX NFS client.
I have exported /export as fsid=0 with sec=krb5:krb5i
I have mount --bind /home onto /export/home
I have exported /export/home with sec=krb5i
The AIX client mounts / -o sec=krb5:krb5i onto /mnt
If I do an ls /mnt, the AIX client gets a permission error. Looking at
the network traceIwe see a READDIR looking for attributes
FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID. The response gives a
NFS4ERR_WRONGSEC which the AIX client is not expecting.
Since the AIX client is only asking for an attribute that is an
attribute of the parent file system (pseudo root in my example), it
seems reasonable that there should not be an error.
In discussing this issue with Bruce Fields, I initially proposed
ignoring the error in nfsd4_encode_dirent_fattr() if all that was being
asked for was FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID, however,
Bruce suggested that we avoid calling cross_mnt() if only these
attributes are requested.
The following patch implements bypassing cross_mnt() if only
FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID are called. Since there
is some complexity in the code in nfsd4_encode_fattr(), I didn't want to
duplicate code (and introduce a maintenance nightmare), so I added a
parameter to nfsd4_encode_fattr() that indicates whether it should
ignore cross mounts and simply fill in the attribute using the passed in
dentry as opposed to it's parent.
Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The failure to return a stateowner from nfs4_preprocess_seqid_op() means
in the case where a lock request is of a type incompatible with an open
(due to, e.g., an application attempting a write lock on a file open for
read), means that fs/nfsd/nfs4xdr.c:ENCODE_SEQID_OP_TAIL() never bumps
the seqid as it should. The client, attempting to close the file
afterwards, then gets an (incorrect) bad sequence id error. Worse, this
prevents the open file from ever being closed, so we leak state.
Thanks to Benny Halevy and Trond Myklebust for analysis, and to Steven
Wilton for the report and extensive data-gathering.
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Steven Wilton <steven.wilton@team.eftel.com.au>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
When the callback channel fails, we inform the client of that by
returning a cb_path_down error the next time it tries to renew its
lease.
If we wait most of a lease period before deciding that a callback has
failed and that the callback channel is down, then we decrease the
chances that the client will find out in time to do anything about it.
So, mark the channel down as soon as we recognize that an rpc has
failed. However, continue trying to recall delegations anyway, in hopes
it will come back up. This will prevent more delegations from being
given out, and ensure cb_path_down is returned to renew calls earlier,
while still making the best effort to deliver recalls of existing
delegations.
Also fix a couple comments and remove a dprink that doesn't seem likely
to be useful.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Declare this variable in the one function where it's used, and clean up
some minor style problems.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
We generate a unique cl_confirm for every new client; so if we've
already checked that this cl_confirm agrees with the cl_confirm of
unconf, then we already know that it does not agree with the cl_confirm
of conf.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Again, the only way conf and unconf can have the same clientid is if
they were created in the "probable callback update" case of setclientid,
in which case we already know that the cl_verifier fields must agree.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
If conf and unconf are both found in the lookup by cl_clientid, then
they share the same cl_clientid. We always create a unique new
cl_clientid field when creating a new client--the only exception is the
"probable callback update" case in setclientid, where we copy the old
cl_clientid from another clientid with the same name.
Therefore two clients with the same cl_client field also always share
the same cl_name field, and a couple of the checks here are redundant.
Thanks to Simon Holm Thøgersen for a compile fix.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Simon Holm Thøgersen <odie@cs.aau.dk>
Using a counter instead of the nanoseconds value seems more likely to
produce a unique cl_confirm.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
We're supposed to generate a different cl_confirm verifier for each new
client, so these to cl_confirm values should never be the same.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Most of these comments just summarize the code.
The matching of code to the cases described in the RFC may still be
useful, though; add specific section references to make that easier to
follow. Also update references to the outdated RFC 3010.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
While we're here, let's remove the redundant (and now wrong) pathname in
the comment, and the #ifdef __KERNEL__'s.
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This header is used only in a few places in fs/nfsd, so there seems to
be little point to having it in include/. (Thanks to Robert Day for
pointing this out.)
Cc: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Newer server features such as nfsv4 and gss depend on proc to work, so a
failure to initialize the proc files they need should be treated as
fatal.
Thanks to Andrew Morton for style fix and compile fix in case where
CONFIG_NFSD_V4 is undefined.
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
I assume the reason failure of creation was ignored here was just to
continue support embedded systems that want nfsd but not proc.
However, in cases where proc is supported it would be clearer to fail
entirely than to come up with some features disabled.
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
There's really nothing much the caller can do if cache unregistration
fails. And indeed, all any caller does in this case is print an error
and continue. So just return void and move the printk's inside
cache_unregister.
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
If the reply cache initialization fails due to a kmalloc failure,
currently we try to soldier on with a reduced (or nonexistant) reply
cache.
Better to just fail immediately: the failure is then much easier to
understand and debug, and it could save us complexity in some later
code. (But actually, it doesn't help currently because the cache is
also turned off in some odd failure cases; we should probably find a
better way to handle those failure cases some day.)
Fix some minor style problems while we're at it, and rename
nfsd_cache_init() to remove the need for a comment describing it.
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Handle the failure case here with something closer to the standard
kernel style.
Doesn't really matter for now, but I'd like to add a few more failure
cases, and then this'll help.
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
We forgot to shut down the nfs4 state and idmapping code in this case.
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The length "nbytes" passed into read_buf should never be negative, but
we check only for too-large values of "nbytes", not for too-small
values. Make nbytes unsigned, so it's clear that the former tests are
sufficient. (Despite this read_buf() currently correctly returns an xdr
error in the case of a negative length, thanks to an unsigned
comparison with size_of() and bounds-checking in kmalloc(). This seems
very fragile, though.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: path name lengths are unsigned on the wire, negative lengths
are not meaningful natively either.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: adjust the sign of the length argument of nfsd_lookup and
nfsd_lookup_dentry, for consistency with recent changes. NFSD version
4 callers already pass an unsigned file name length.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: file name lengths are unsigned on the wire, negative lengths
are not meaningful natively either.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Obviously at some point we thought "error" represented the length when
positive. This appears to be a long-standing typo.
Thanks to Prasad Potluri <pvp@us.ibm.com> for finding the problem and
proposing an earlier version of this patch.
Cc: Steve French <smfltc@us.ibm.com>
Cc: Prasad V Potluri <pvp@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Dereferenced pointer "dentry" without checking and assigned to inode
in the declaration.
(We could just delete the NULL checks that follow instead, as we never
get to the encode function in this particular case. But it takes a
little detective work to verify that fact, so it's probably safer to
leave the checks in place.)
Cc: Steve French <smfltc@us.ibm.com>
Signed-off-by: Prasad V Potluri <pvp@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The whole reason to move this callback-channel probe into a separate
thread was because (for now) we don't have an easy way to create the
rpc_client asynchronously. But I forgot to move the rpc_create() to the
spawned thread. Doh! Fix that.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Our callback code doesn't actually handle concurrent attempts to probe
the callback channel. Some rethinking of the locking may be required.
However, we can also just move the callback probing to this case. Since
this is the only time a client is "confirmed" (and since that can only
happen once in the lifetime of a client), this ensures we only probe
once.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
When RPCSEC/GSS and krb5i is used, requests are padded, typically to a multiple
of 8 bytes. This can make the request look slightly longer than it
really is.
As of
f34b95689d "The NFSv2/NFSv3 server does not handle zero
length WRITE request correctly",
the xdr decode routines for NFSv2 and NFSv3 reject requests that aren't
the right length, so krb5i (for example) WRITE requests can get lost.
This patch relaxes the appropriate test and enhances the related comment.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Peter Staubach <staubach@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As with commit 7fc90ec93a ("knfsd: nfsd:
call nfsd_setuser() on fh_compose(), fix nfsd4 permissions problem")
this is a case where we need to redo a security check in fh_verify()
even though the filehandle already has an associated dentry--if the
filehandle was created by fh_compose() in an earlier operation of the
nfsv4 compound, then we may not have done these checks yet.
Without this fix it is possible, for example, to traverse from an export
without the secure ports requirement to one with it in a single
compound, and bypass the secure port check on the new export.
While we're here, fix up some minor style problems and change a printk()
to a dprintk(), to make it harder for random unprivileged users to spam
the logs.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reviewed-By: NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The v2/v3 acl code in nfsd is translating any return from fh_verify() to
nfserr_inval. This is particularly unfortunate in the case of an
nfserr_dropit return, which is an internal error meant to indicate to
callers that this request has been deferred and should just be dropped
pending the results of an upcall to mountd.
Thanks to Roland <devzero@web.de> for bug report and data collection.
Cc: Roland <devzero@web.de>
Acked-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reviewed-By: NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Not architecture specific code should not #include <asm/scatterlist.h>.
This patch therefore either replaces them with
#include <linux/scatterlist.h> or simply removes them if they were
unused.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* 'sg' of git://git.kernel.dk/linux-2.6-block:
Add CONFIG_DEBUG_SG sg validation
Change table chaining layout
Update arch/ to use sg helpers
Update swiotlb to use sg helpers
Update net/ to use sg helpers
Update fs/ to use sg helpers
[SG] Update drivers to use sg helpers
[SG] Update crypto/ to sg helpers
[SG] Update block layer to use sg helpers
[SG] Add helpers for manipulating SG entries
Now that all filesystems are converted remove support for the old methods.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: <linux-ext4@vger.kernel.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: David Chinner <dgc@sgi.com>
Cc: Timothy Shimmin <tes@sgi.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset is a medium scale rewrite of the export operations interface.
The goal is to make the interface less complex, and easier to understand from
the filesystem side, aswell as preparing generic support for exporting of
64bit inode numbers.
This touches all nfs exporting filesystems, and I've done testing on all of
the filesystems I have here locally (xfs, ext2, ext3, reiserfs, jfs)
This patch:
Add a structured fid type so that we don't have to pass an array of u32 values
around everywhere. It's a union of possible layouts.
As a start there's only the u32 array and the traditional 32bit inode format,
but there will be more in one of my next patchset when I start to document the
various filehandle formats we have in lowlevel filesystems better.
Also add an enum that gives the various filehandle types human- readable
names.
Note: Some people might think the struct containing an anonymous union is
ugly, but I didn't want to pass around a raw union type.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: <linux-ext4@vger.kernel.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: David Chinner <dgc@sgi.com>
Cc: Timothy Shimmin <tes@sgi.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The task_struct->pid member is going to be deprecated, so start
using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
the kernel.
The first thing to start with is the pid, printed to dmesg - in
this case we may safely use task_pid_nr(). Besides, printks produce
more (much more) than a half of all the explicit pid usage.
[akpm@linux-foundation.org: git-drm went and changed lots of stuff]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's theoretically possible for a single SETATTR call to come in that sets the
mode and the uid/gid. In that case, don't set the ATTR_KILL_S*ID bits since
that would trip the BUG() in notify_change. Just fix up the mode to have the
same effect.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement file posix capabilities. This allows programs to be given a
subset of root's powers regardless of who runs them, without having to use
setuid and giving the binary all of root's powers.
This version works with Kaigai Kohei's userspace tools, found at
http://www.kaigai.gr.jp/index.php. For more information on how to use this
patch, Chris Friedhoff has posted a nice page at
http://www.friedhoff.org/fscaps.html.
Changelog:
Nov 27:
Incorporate fixes from Andrew Morton
(security-introduce-file-caps-tweaks and
security-introduce-file-caps-warning-fix)
Fix Kconfig dependency.
Fix change signaling behavior when file caps are not compiled in.
Nov 13:
Integrate comments from Alexey: Remove CONFIG_ ifdef from
capability.h, and use %zd for printing a size_t.
Nov 13:
Fix endianness warnings by sparse as suggested by Alexey
Dobriyan.
Nov 09:
Address warnings of unused variables at cap_bprm_set_security
when file capabilities are disabled, and simultaneously clean
up the code a little, by pulling the new code into a helper
function.
Nov 08:
For pointers to required userspace tools and how to use
them, see http://www.friedhoff.org/fscaps.html.
Nov 07:
Fix the calculation of the highest bit checked in
check_cap_sanity().
Nov 07:
Allow file caps to be enabled without CONFIG_SECURITY, since
capabilities are the default.
Hook cap_task_setscheduler when !CONFIG_SECURITY.
Move capable(TASK_KILL) to end of cap_task_kill to reduce
audit messages.
Nov 05:
Add secondary calls in selinux/hooks.c to task_setioprio and
task_setscheduler so that selinux and capabilities with file
cap support can be stacked.
Sep 05:
As Seth Arnold points out, uid checks are out of place
for capability code.
Sep 01:
Define task_setscheduler, task_setioprio, cap_task_kill, and
task_setnice to make sure a user cannot affect a process in which
they called a program with some fscaps.
One remaining question is the note under task_setscheduler: are we
ok with CAP_SYS_NICE being sufficient to confine a process to a
cpuset?
It is a semantic change, as without fsccaps, attach_task doesn't
allow CAP_SYS_NICE to override the uid equivalence check. But since
it uses security_task_setscheduler, which elsewhere is used where
CAP_SYS_NICE can be used to override the uid equivalence check,
fixing it might be tough.
task_setscheduler
note: this also controls cpuset:attach_task. Are we ok with
CAP_SYS_NICE being used to confine to a cpuset?
task_setioprio
task_setnice
sys_setpriority uses this (through set_one_prio) for another
process. Need same checks as setrlimit
Aug 21:
Updated secureexec implementation to reflect the fact that
euid and uid might be the same and nonzero, but the process
might still have elevated caps.
Aug 15:
Handle endianness of xattrs.
Enforce capability version match between kernel and disk.
Enforce that no bits beyond the known max capability are
set, else return -EPERM.
With this extra processing, it may be worth reconsidering
doing all the work at bprm_set_security rather than
d_instantiate.
Aug 10:
Always call getxattr at bprm_set_security, rather than
caching it at d_instantiate.
[morgan@kernel.org: file-caps clean up for linux/capability.h]
[bunk@kernel.org: unexport cap_inode_killpriv]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I'm going to be modifying nfsd_rename() shortly to support read-only bind
mounts. This #ifdef is around the area I'm patching, and it starts to get
really ugly if I just try to add my new code by itself. Using this little
helper makes things a lot cleaner to use.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes the following needlessly global functions static:
- exp_get_by_name()
- exp_parent()
- exp_find()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'locks' of git://linux-nfs.org/~bfields/linux:
nfsd: remove IS_ISMNDLCK macro
Rework /proc/locks via seq_files and seq_list helpers
fs/locks.c: use list_for_each_entry() instead of list_for_each()
NFS: clean up explicit check for mandatory locks
AFS: clean up explicit check for mandatory locks
9PFS: clean up explicit check for mandatory locks
GFS2: clean up explicit check for mandatory locks
Cleanup macros for distinguishing mandatory locks
Documentation: move locks.txt in filesystems/
locks: add warning about mandatory locking races
Documentation: move mandatory locking documentation to filesystems/
locks: Fix potential OOPS in generic_setlease()
Use list_first_entry in locks_wake_up_blocks
locks: fix flock_lock_file() comment
Memory shortage can result in inconsistent flocks state
locks: kill redundant local variable
locks: reverse order of posix_locks_conflict() arguments
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (131 commits)
NFSv4: Fix a typo in nfs_inode_reclaim_delegation
NFS: Add a boot parameter to disable 64 bit inode numbers
NFS: nfs_refresh_inode should clear cache_validity flags on success
NFS: Fix a connectathon regression in NFSv3 and NFSv4
NFS: Use nfs_refresh_inode() in ops that aren't expected to change the inode
SUNRPC: Don't call xprt_release in call refresh
SUNRPC: Don't call xprt_release() if call_allocate fails
SUNRPC: Fix buggy UDP transmission
[23/37] Clean up duplicate includes in
[2.6 patch] net/sunrpc/rpcb_clnt.c: make struct rpcb_program static
SUNRPC: Use correct type in buffer length calculations
SUNRPC: Fix default hostname created in rpc_create()
nfs: add server port to rpc_pipe info file
NFS: Get rid of some obsolete macros
NFS: Simplify filehandle revalidation
NFS: Ensure that nfs_link() returns a hashed dentry
NFS: Be strict about dentry revalidation when doing exclusive create
NFS: Don't zap the readdir caches upon error
NFS: Remove the redundant nfs_reval_fsid()
NFSv3: Always use directory post-op attributes in nfs3_proc_lookup
...
Fix up trivial conflict due to sock_owned_by_user() cleanup manually in
net/sunrpc/xprtsock.c
This macro is only used in one place; in this place it seems simpler to
put open-code it and move the comment to where it's used.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The combination of S_ISGID bit set and S_IXGRP bit unset is used to mark the
inode as "mandatory lockable" and there's a macro for this check called
MANDATORY_LOCK(inode). However, fs/locks.c and some filesystems still perform
the explicit i_mode checking. Besides, Andrew pointed out, that this macro is
buggy itself, as it dereferences the inode arg twice.
Convert this macro into static inline function and switch its users to it,
making the code shorter and more readable.
The __mandatory_lock() helper is to be used in places where the IS_MANDLOCK()
for superblock is already known to be true.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Without this we always return 2^32-1 as the the maximum namelength.
Thanks to Andreas Gruenbacher for bug report and testing.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Andreas Gruenbacher <agruen@suse.de>
It's not enough to take a reference on the delegation object itself; we
need to ensure that the rpc_client won't go away just as we're about to
make an rpc call.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
If a callback still holds a reference on the client, then it may be
about to perform an rpc call, so it isn't safe to call rpc_shutdown().
(Though rpc_shutdown() does wait for any outstanding rpc's, it can't
know if a new rpc is about to be issued with that client.)
So, wait to shutdown the rpc_client until the reference count on the
client has gone to zero.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Currently there's a race that can cause an oops in generic_setlease.
(In detail: nfsd, when it removes a lease, does so by calling
vfs_setlease() with F_UNLCK and a pointer to the fl_flock field, which
in turn points to nfsd's existing lease; but the first thing the
setlease code does is call time_out_leases(). If the lease happens to
already be beyond the lease break time, that will free the lease and (in
nfsd's release_private callback) set fl_flock to NULL, leading to a NULL
deference soon after in vfs_setlease().)
There are probably other things to fix here too, but it seems inherently
racy to allow either locks.c or nfsd to time out this lease. Instead
just set the fl_break_time to 0 (preventing locks.c from ever timing out
this lock) and leave it up to nfsd's laundromat thread to deal with it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Modify the NFS server code to support 64 bit ino's, as
appropriate for the system and the NFS protocol version.
The gist of the changes is to query the underlying file system
for attributes and not just to use the cached attributes in the
inode. For this specific purpose, the inode only contains an
ino field which unsigned long, which is large enough on 64 bit
platforms, but is not large enough on 32 bit platforms.
I haven't been able to find any reason why ->getattr can't be called
while i_mutex. The specification indicates that i_mutex is not
required to be held in order to invoke ->getattr, but it doesn't say
that i_mutex can't be held while invoking ->getattr.
I also haven't come to any conclusions regarding the value of
lease_get_mtime() and whether it should or should not be invoked
by fill_post_wcc() too. I chose not to change this because I
thought that it was safer to leave well enough alone. If we
decide to make a change, it can be done separately.
Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Each branch of this if-then-else has a bunch of duplicated code that we
could just put at the end.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
fs/nfsd/nfsctl.c: In function 'write_filehandle':
fs/nfsd/nfsctl.c:301: warning: 'maxsize' may be used uninitialized in this function
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
It doesn't make sense to make the callback with credentials that the
client made the setclientid with. Instead the spec requires that the
callback occur with the credentials the client authenticated *to*.
It probably doesn't matter what we use for auth_unix, and some more
infrastructure will be needed for auth_gss, so let's just remove the
cred lookup for now.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
We have some slabs that the nfs4 server uses to store state objects.
We're currently creating and destroying those slabs whenever the server
is brought up or down. That seems excessive; may as well just do that
in module initialization and exit.
Also add some minor header cleanup. (Thanks to Andrew Morton for that
and a compile fix.)
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
We want to allow gss on the callback channel, so people using krb5 can
still get the benefits of delegations.
But looking up the rpc credential can take some time in that case. And
we shouldn't delay the response to setclientid_confirm while we wait.
It may be inefficient, but for now the simplest solution is just to
spawn a new thread as necessary for the purpose.
(Thanks to Adrian Bunk for catching a missing static here.)
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: Adrian Bunk <bunk@kernel.org>
Note that qword_get() returns length or -1, not an -ERROR.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
To quote a recent mail from Andrew Morton:
Look: if there's a way in which an unprivileged user can trigger
a printk we fix it, end of story.
OK. I assume that goes double for printk()s that might be triggered by
random hosts on the internet. So, disable some printk()s that look like
they could be triggered by malfunctioning or malicious clients. For
now, just downgrade them to dprintk()s.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Benny Halevy suggested renaming cmp_* to same_* to make the meaning of
the return value clearer.
Fix some nearby style deviations while we're at it, including a small
swath of creative indentation in nfs4_preprocess_seqid_op().
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
I moved this check into map_new_errors, but forgot to delete the
original. Oops.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
The nfserr_dropit happens routinely on upcalls (so a kmalloc failure is
almost never the actual cause), but I occasionally get a complant from
some tester that's worried because they ran across this message after
turning on debugging to research some unrelated problem.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Due to recent edict to remove or replace printk's that can flood the system
log.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
fsid_source decided where to get the 'fsid' number to
return for a GETATTR based on the type of filehandle.
It can be from the device, from the fsid, or from the
UUID.
It is possible for the filehandle to be inconsistent
with the export information, so make sure the export information
actually has the info implied by the value returned by
fsid_source.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: "Luiz Fernando N. Capitulino" <lcapitulino@gmail.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent changes in NFSd cause a directory which is mounted-on
to not appear properly when the filesystem containing it is exported.
*exp_get* now returns -ENOENT rather than NULL and when
commit 5d3dbbeaf5
removed the NULL checks, it didn't add a check for -ENOENT.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A succesful downcall with a negative result (which indicates that the given
filesystem is not exported to the given user) should not return an error.
Currently mountd is depending on stdio to write these downcalls. With some
versions of libc this appears to cause subsequent writes to attempt to write
all accumulated data (for which writes previously failed) along with any new
data. This can prevent the kernel from seeing responses to later downcalls.
Symptoms will be that nfsd fails to respond to certain requests.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We shouldn't be using negative uid's and gid's in the idmap upcalls.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
RFC 3530 says:
If the server uses an attribute to store the exclusive create verifier, it
will signify which attribute by setting the appropriate bit in the attribute
mask that is returned in the results.
Linux uses the atime and mtime to store the verifier, but sends a zeroed out
bitmask back to the client. This patch makes sure that we set the correct
bits in the bitmask in this situation.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For display purposes, treat uid's and gid's as unsigned ints for now.
Also fix a typo.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Share a little common code, reverse the arguments for consistency, drop the
unnecessary "inline", and lowercase the name.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
EX_RDONLY is only called in one place; just put it there.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We can now assume that rqst_exp_get_by_name() does not return NULL; so clean
up some unnecessary checks.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I converted the various export-returning functions to return -ENOENT instead
of NULL, but missed a few cases.
This particular case could cause actual bugs in the case of a krb5 client that
doesn't match any ip-based client and that is trying to access a filesystem
not exported to krb5 clients.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The value of nperbucket calculated here is too small--we should be rounding up
instead of down--with the result that the index j in the following loop can
overflow the raparm_hash array. At least in my case, the next thing in memory
turns out to be export_table, so the symptoms I see are crashes caused by the
appearance of four zeroed-out export entries in the first bucket of the hash
table of exports (which were actually entries in the readahead cache, a
pointer to which had been written to the export table in this initialization
code).
It looks like the bug was probably introduced with commit
fce1456a19 ("knfsd: make the readahead params
cache SMP-friendly").
Cc: <stable@kernel.org>
Cc: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We've been using the convention that vfs_foo is the function that calls
a filesystem-specific foo method if it exists, or falls back on a
generic method if it doesn't; thus vfs_foo is what is called when some
other part of the kernel (normally lockd or nfsd) wants to get a lock,
whereas foo is what filesystems call to use the underlying local
functionality as part of their lock implementation.
So rename setlease to vfs_setlease (which will call a
filesystem-specific setlease after a later patch) and __setlease to
setlease.
Also, vfs_setlease need only be GPL-exported as long as it's only needed
by lockd and nfsd.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Allow root squashing to vary per-pseudoflavor, so that you can (for example)
allow root access only when sufficiently strong security is in use.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Our clients (like other clients, as far as I know) use only auth_sys for nlm,
even when using rpcsec_gss for the main nfs operations.
Administrators that want to deny non-kerberos-authenticated locking requests
will need to turn off NFS protocol versions less than 4....
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We could return some sort of error in the case where someone asks for secinfo
on an export without the secinfo= option set--that'd be no worse than what
we've been doing. But it's not really correct. So, hack up an approximate
secinfo response in that case--it may not be complete, but it'll tell the
client at least one acceptable security flavor.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement the secinfo operation.
(Thanks to Usha Ketineni wrote an earlier version of this support.)
Cc: Usha Ketineni <uketinen@us.ibm.com>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add secinfo information to the display in proc/net/sunrpc/nfsd.export/content.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Factor out some code to be shared by secinfo display code. Remove some
unnecessary conditional printing of commas where we know the condition is
true.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allow readonly access to vary depending on the pseudoflavor, using the flag
passed with each pseudoflavor in the export downcall. The rest of the flags
are ignored for now, though some day we might also allow id squashing to vary
based on the flavor.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make the first actual use of the secinfo information by using it to return
nfserr_wrongsec when an export is found that doesn't allow the flavor used on
this request.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Factor nfsd_lookup into nfsd_lookup_dentry, which finds the right dentry and
export, and a second part which composes the filehandle (and which will later
check the security flavor on the new export).
No change in behavior.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With this patch, we fall back on using the gss/pseudoflavor only if we fail to
find a matching auth_unix export that has a secinfo list.
As long as sec= options aren't used, there's still no change in behavior here
(except possibly for some additional auth_unix cache lookups, whose results
will be ignored).
The sec= option, however, is not actually enforced yet; later patches will add
the necessary checks.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We want it to be possible for users to restrict exports both by IP address and
by pseudoflavor. The pseudoflavor information has previously been passed
using special auth_domains stored in the rq_client field. After the preceding
patch that stored the pseudoflavor in rq_pflavor, that's now superfluous; so
now we use rq_client for the ip information, as auth_null and auth_unix do.
However, we keep around the special auth_domain in the rq_gssclient field for
backwards compatibility purposes, so we can still do upcalls using the old
"gss/pseudoflavor" auth_domain if upcalls using the unix domain to give us an
appropriate export. This allows us to continue supporting old mountd.
In fact, for this first patch, we always use the "gss/pseudoflavor"
auth_domain (and only it) if it is available; thus rq_client is ignored in the
auth_gss case, and this patch on its own makes no change in behavior; that
will be left to later patches.
Note on idmap: I'm almost tempted to just replace the auth_domain in the idmap
upcall by a dummy value--no version of idmapd has ever used it, and it's
unlikely anyone really wants to perform idmapping differently depending on the
where the client is (they may want to perform *credential* mapping
differently, but that's a different matter--the idmapper just handles id's
used in getattr and setattr). But I'm updating the idmapd code anyway, just
out of general backwards-compatibility paranoia.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Split the callers of exp_get_by_name(), exp_find(), and exp_parent() into
those that are processing requests and those that are doing other stuff (like
looking up filehandles for mountd).
No change in behavior, just a (fairly pointless, on its own) cleanup.
(Note this has the effect of making nfsd_cross_mnt() pass rqstp->rq_client
instead of exp->ex_client into exp_find_by_name(). However, the two should
have the same value at this point.)
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The "err" variable will only be used in the final return, which always happens
after either the preceding
err = fh_compose(...);
or after the following
err = nfserrno(host_err);
So the earlier assignment to err is ignored.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We're passing three arguments to exp_pseudoroot, two of which are just fields
of the svc_rqst. Soon we'll want to pass in a third field as well. So let's
just give up and pass in the whole struct svc_rqst.
Also sneak in some minor style cleanups while we're at it.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We add a list of pseudoflavors to each export downcall, which will be used
both as a list of security flavors allowed on that export, and (in the order
given) as the list of pseudoflavors to return on secinfo calls.
This patch parses the new downcall information and adds it to the export
structure, but doesn't use it for anything yet.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently exp_find(), exp_get_by_name(), and friends, return an export on
success, and on failure return:
errors -EAGAIN (drop this request pending an upcall) or
-ETIMEDOUT (an upcall has timed out), or
return NULL, which can mean either that there was a memory allocation
failure, or that an export was not found, or that a passed-in
export lacks an auth_domain.
Many callers seem to assume that NULL means that an export was not found,
which may lead to bugs in the case of a memory allocation failure.
Modify these functions to distinguish between the two NULL cases by returning
either -ENOENT or -ENOMEM. They now never return NULL. We get to simplify
some code in the process.
We return -ENOENT in the case of a missing auth_domain. This case should
probably be removed (or converted to a bug) after confirming that it can never
happen.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One more incremental delegation policy improvement: don't give out a
delegation on a file if conflicting access has previously required that a
delegation be revoked on that file. (In practice we'll forget about the
conflict when the struct nfs4_file is removed on close, so this is of limited
use for now, though it should at least solve a temporary problem with
self-conflicts on write opens from the same client.)
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Our original NFSv4 delegation policy was to give out a read delegation on any
open when it was possible to.
Since the lifetime of a delegation isn't limited to that of an open, a client
may quite reasonably hang on to a delegation as long as it has the inode
cached. This becomes an obvious problem the first time a client's inode cache
approaches the size of the server's total memory.
Our first quick solution was to add a hard-coded limit. This patch makes a
mild incremental improvement by varying that limit according to the server's
total memory size, allowing at most 4 delegations per megabyte of RAM.
My quick back-of-the-envelope calculation finds that in the worst case (where
every delegation is for a different inode), a delegation could take about
1.5K, which would make the worst case usage about 6% of memory. The new limit
works out to be about the same as the old on a 1-gig server.
[akpm@linux-foundation.org: Don't needlessly bloat vmlinux]
[akpm@linux-foundation.org: Make it right for highmem machines]
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It looks like Al Viro gutted this header file five years ago and it hasn't
been touched since.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
nfs4_acl_nfsv4_to_posix() returns an error and returns any posix acls
calculated in two caller-provided pointers. It was setting these pointers to
-errno in some error cases, resulting in nfsd4_set_nfs4_acl() calling
posix_acl_release() with a -errno as an argument.
Fix both the caller and the callee, by modifying nfsd4_set_nfs4_acl() to
stop relying on the passed-in-pointers being left as NULL in the error
case, and by modifying nfs4_acl_nfsv4_to_posix() to stop returning
garbage in those pointers.
Thanks to Alex Soule for reporting the bug.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: Alexander Soule <soule@umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
enc_stateid_sz should be given in u32 words units, not bytes, so we were
overestimating the buffer space needed here.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Silence a compiler warning in the ACL code, and add a comment making clear the
initialization serves no other purpose.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Both lockd and (in the nfsv4 case) nfsd enforce a "grace period" after reboot,
during which clients may reclaim locks from the previous server instance, but
may not acquire new locks.
Currently the lockd and nfsd enforce grace periods of different lengths. This
may cause problems when we reboot a server with both v2/v3 and v4 clients.
For example, if the lockd grace period is shorter (as is likely the case),
then a v3 client might acquire a new lock that conflicts with a lock already
held (but not yet reclaimed) by a v4 client.
This patch calculates a lease time that lockd and nfsd can both use.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gcc-4.3:
fs/nfsd/nfsctl.c: In function 'write_getfs':
fs/nfsd/nfsctl.c:248: warning: cast from pointer to integer of different size
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently NFSD calls directly into filesystems through the export_operations
structure. I plan to change this interface in various ways in later patches,
and want to avoid the export of the default operations to NFSD, so this patch
adds two simple exportfs_encode_fh/exportfs_decode_fh helpers for NFSD to call
instead of poking into exportfs guts.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
currently the export_operation structure and helpers related to it are in
fs.h. fs.h is already far too large and there are very few places needing the
export bits, so split them off into a separate header.
[akpm@linux-foundation.org: fix cifs build]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves. This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.
It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.
The patch causes all kernel threads to be nonfreezable by default (ie. to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE. It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear. Additionally, it updates documentation to
describe the freezing of tasks more accurately.
[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (122 commits)
sunrpc: drop BKL around wrap and unwrap
NFSv4: Make sure unlock is really an unlock when cancelling a lock
NLM: fix source address of callback to client
SUNRPC client: add interface for binding to a local address
SUNRPC server: record the destination address of a request
SUNRPC: cleanup transport creation argument passing
NFSv4: Make the NFS state model work with the nosharedcache mount option
NFS: Error when mounting the same filesystem with different options
NFS: Add the mount option "nosharecache"
NFS: Add support for mounting NFSv4 file systems with string options
NFS: Add final pieces to support in-kernel mount option parsing
NFS: Introduce generic mount client API
NFS: Add enums and match tables for mount option parsing
NFS: Improve debugging output in NFS in-kernel mount client
NFS: Clean up in-kernel NFS mount
NFS: Remake nfsroot_mount as a permanent part of NFS client
SUNRPC: Add a convenient default for the hostname when calling rpc_create()
SUNRPC: Rename rpcb_getport to be consistent with new rpcb_getport_sync name
SUNRPC: Rename rpcb_getport_external routine
SUNRPC: Allow rpcbind requests to be interrupted by a signal.
...
When nfsd was transitioned to use splice instead of sendfile() for data
transfers, a line setting the page index was lost. Restore it, so that
nfsd is functional when that path is used.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A couple of callers just use a stringified IP address for the rpc client's
hostname. Move the logic for constructing this into rpc_create(), so it can
be shared.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The name 'pin' was badly chosen, it doesn't pin a pipe buffer
in the most commonly used sense in the kernel. So change the
name to 'confirm', after debating this issue with Hugh
Dickins a bit.
A good return from ->confirm() means that the buffer is really
there, and that the contents are good.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
We need to move even more stuff into the header so that folks can use
the splice_to_pipe() implementation instead of open-coding a lot of
pipe knowledge (see relay implementation), so move to our own header
file finally.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.
This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
getting them indirectly
Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
they don't need sched.h
b) sched.h stops being dependency for significant number of files:
on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
after patch it's only 3744 (-8.3%).
Cross-compile tested on
all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
alpha alpha-up
arm
i386 i386-up i386-defconfig i386-allnoconfig
ia64 ia64-up
m68k
mips
parisc parisc-up
powerpc powerpc-up
s390 s390-up
sparc sparc-up
sparc64 sparc64-up
um-x86_64
x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
as well as my two usual configs.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a lookup request arrives, nfsd uses information provided by userspace
(mountd) to find the right filesystem.
It then assumes that the same filehandle type as the incoming filehandle can
be used to create an outgoing filehandle.
However if mountd is buggy, or maybe just being creative, the filesystem may
not support that filesystem type, and the kernel could oops, particularly if
'ex_uuid' is NULL but a FSID_UUID* filehandle type is used.
So add some proper checking that the fsid version/type from the incoming
filehandle is actually supportable, and ignore that information if it isn't
supportable.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1/ decode_sattr and decode_sattr3 never return NULL, so remove
several checks for that. ditto for xdr_decode_hyper.
2/ replace some open coded XDR_QUADLEN calls with calls to
XDR_QUADLEN
3/ in decode_writeargs, simply an 'if' to use a single
calculation.
.page_len is the length of that part of the packet that did
not fit in the first page (the head).
So the length of the data part is the remainder of the
head, plus page_len.
3/ other minor cleanups.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kbuild directly interprets <modulename>-y as objects to build into a module,
no need to assign it to the old foo-objs variable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to zero various parts of 'exp' before any 'goto out', otherwise when
we go to free the contents... we die.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the kernel calls svc_reserve to downsize the expected size of an RPC
reply, it fails to account for the possibility of a checksum at the end of
the packet. If a client mounts a NFSv2/3 with sec=krb5i/p, and does I/O
then you'll generally see messages similar to this in the server's ring
buffer:
RPC request reserved 164 but used 208
While I was never able to verify it, I suspect that this problem is also
the root cause of some oopses I've seen under these conditions:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227726
This is probably also a problem for other sec= types and for NFSv4. The
large reserved size for NFSv4 compound packets seems to generally paper
over the problem, however.
This patch adds a wrapper for svc_reserve that accounts for the possibility
of a checksum. It also fixes up the appropriate callers of svc_reserve to
call the wrapper. For now, it just uses a hardcoded value that I
determined via testing. That value may need to be revised upward as things
change, or we may want to eventually add a new auth_op that attempts to
calculate this somehow.
Unfortunately, there doesn't seem to be a good way to reliably determine
the expected checksum length prior to actually calculating it, particularly
with schemes like spkm3.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The NFSv2 and NFSv3 servers do not handle WRITE requests for 0 bytes
correctly. The specifications indicate that the server should accept the
request, but it should mostly turn into a no-op. Currently, the server
will return an XDR decode error, which it should not.
Attached is a patch which addresses this issue. It also adds some boundary
checking to ensure that the request contains as much data as was requested
to be written. It also correctly handles an NFSv3 request which requests
to write more data than the server has stated that it is prepared to
handle. Previously, there was some support which looked like it should
work, but wasn't quite right.
Signed-off-by: Peter Staubach <staubach@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
nfs4_acl_add_ace() can now be removed.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Neil Brown <neilb@cse.unsw.edu.au>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.
Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux:
gfs2: nfs lock support for gfs2
lockd: add code to handle deferred lock requests
lockd: always preallocate block in nlmsvc_lock()
lockd: handle test_lock deferrals
lockd: pass cookie in nlmsvc_testlock
lockd: handle fl_grant callbacks
lockd: save lock state on deferral
locks: add fl_grant callback for asynchronous lock return
nfsd4: Convert NFSv4 to new lock interface
locks: add lock cancel command
locks: allow {vfs,posix}_lock_file to return conflicting lock
locks: factor out generic/filesystem switch from setlock code
locks: factor out generic/filesystem switch from test_lock
locks: give posix_test_lock same interface as ->lock
locks: make ->lock release private data before returning in GETLK case
locks: create posix-to-flock helper functions
locks: trivial removal of unnecessary parentheses
Convert NFSv4 to the new lock interface. We don't define any callback for now,
so we're not taking advantage of the asynchronous feature--that's less critical
for the multi-threaded nfsd then it is for the single-threaded lockd. But this
does allow a cluster filesystems to export cluster-coherent locking to NFS.
Note that it's cluster filesystems that are the issue--of the filesystems that
define lock methods (nfs, cifs, etc.), most are not exportable by nfsd.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The nfsv4 protocol's lock operation, in the case of a conflict, returns
information about the conflicting lock.
It's unclear how clients can use this, so for now we're not going so far as to
add a filesystem method that can return a conflicting lock, but we may as well
return something in the local case when it's easy to.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
posix_test_lock() and ->lock() do the same job but have gratuitously
different interfaces. Modify posix_test_lock() so the two agree,
simplifying some code in the process.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
The RPC buffer size estimation logic in net/sunrpc/clnt.c always
significantly overestimates the requirements for the buffer size.
A little instrumentation demonstrated that in fact rpc_malloc was never
allocating the buffer from the mempool, but almost always called kmalloc.
To compute the size of the RPC buffer more precisely, split p_bufsiz into
two fields; one for the argument size, and one for the result size.
Then, compute the sum of the exact call and reply header sizes, and split
the RPC buffer precisely between the two. That should keep almost all RPC
buffers within the 2KiB buffer mempool limit.
And, we can finally be rid of RPC_SLACK_SPACE!
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This cancel_delayed_work call is called from a function that is only called
from a piece of code that immediate follows a cancel and destruction of the
workqueue, so it's clearly a mistake.
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The reused clientid here is a more of a problem for the client than the
server, and the client can report the problem itself if it's serious.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A regression introduced in the last set of acl patches removed the
INHERIT_ONLY flag from aces derived from the posix acl. Fix.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
->readdir passes lofft_t offsets (used as nfs cookies) to
nfs3svc_encode_entry{,_plus}, but when they pass it on to encode_entry it
becomes an 'off_t', which isn't good.
So filesystems that returned 64bit offsets would lose.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
not needed and actually breaks build on frv, while we are at it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Due to type confusion, when an nfsacl verison 2 'ACCESS' request
finishes and tries to clean up, it calls fh_put on entiredly the
wrong thing and this can cause an oops.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When setting an ACL that lacks inheritable ACEs on a directory, we should set
a default ACL of zero length, not a default ACL with all bits denied.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We're inserting deny's between some ACEs in order to enforce posix draft acl
semantics which prevent permissions from accumulating across entries in an
acl.
That's fine, but we're doing that by inserting a deny after *every* allow,
which is overkill. We shouldn't be adding them in places where they actually
make no difference.
Also replaced some helper functions for creating acl entries; I prefer just
assigning directly to the struct fields--it takes a few more lines, but the
field names provide some documentation that I think makes the result easier
understand.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Return just the effective permissions, and forget about the mask. It isn't
worth the complexity.
WARNING: This breaks backwards compatibility with overly-picky nfsv4->posix
acl translation, as may has been included in some patched versions of libacl.
To our knowledge no such version was every distributed by anyone outside citi.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We should be returning ATTRNOTSUPP, not NOTSUPP, when acls are unsupported.
Also fix a comment.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The wrong pointer is being kfree'd in savemem() when defer_free returns with
an error.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simplify the memory management and code a bit by representing acls with an
array instead of a linked list.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The code that splits an incoming nfsv4 ACL into inheritable and effective
parts can be combined with the the code that translates each to a posix acl,
resulting in simpler code that requires one less pass through the ACL.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The rfc allows us to be more permissive about the ACL inheritance bits we
accept:
"If the server supports a single "inherit ACE" flag that applies to
both files and directories, the server may reject the request
(i.e., requiring the client to set both the file and directory
inheritance flags). The server may also accept the request and
silently turn on the ACE4_DIRECTORY_INHERIT_ACE flag."
Let's take the latter option--the ACL is a complex attribute that could be
rejected for a wide variety of reasons, and the protocol gives us little
ability to explain the reason for the rejection, so erroring out is a
user-unfriendly last resort.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The server name is expected to be a null-terminated string, so we can't pass
in the raw client identifier.
What's more, the client identifier is just a binary, not necessarily
printable, blob. Let's just use the ip address instead. The server name
appears to exist just to help debugging by making some printk's more
informative.
Note that the string is copies into the rpc client structure, so the pointer
to the local variable does not outlive the function call.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there. Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.
To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.
Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm. I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add support for using a filesystem UUID to identify and export point in the
filehandle.
For NFSv2, this UUID is xor-ed down to 4 or 8 bytes so that it doesn't take up
too much room. For NFSv3+, we use the full 16 bytes, and possibly also a
64bit inode number for exports beneath the root of a filesystem.
When generating an fsid to return in 'stat' information, use the UUID (hashed
down to size) if it is available and a small 'fsid' was not specifically
provided.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we are using the same version/fsid as a current filehandle, then there is
no need to verify the the numbers are valid for this export, and they must be
(we used them to find this export).
This allows us to simplify the fsid selection code.
Also change "ref_fh_version" and "ref_fh_fsid_type" to "version" and
"fsid_type", as the important thing isn't that they are the version/type of
the reference filehandle, but they are the chosen type for the new filehandle.
And tidy up some indenting.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Most files in the 'nfsd' filesystem are transactional. When you write, a
reply is generated that can be read back only on the same 'file'.
If the reply has zero length, the 'write' will incorrectly return a value of
'0' instead of the length that was written. This causes 'rpc.nfsd' to give an
annoying warning.
This patch fixes the test.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Expand the rq_addr field to allow it to contain larger addresses.
Specifically, we replace a 'sockaddr_in' with a 'sockaddr_storage', then
everywhere the 'sockaddr_in' was referenced, we use instead an accessor
function (svc_addr_in) which safely casts the _storage to _in.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are loads of places where the RPC server assumes that the rq_addr fields
contains an IPv4 address. Top among these are error and debugging messages
that display the server's IP address.
Let's refactor the address printing into a separate function that's smart
enough to figure out the difference between IPv4 and IPv6 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sometimes we need to create an RPC service but not register it with the local
portmapper. NFSv4 delegation callback, for example.
Change the svc_makesock() API to allow optionally creating temporary or
permanent sockets, optionally registering with the local portmapper, and make
it return the ephemeral port of the new socket.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Also remove {NFSD,RPC}_PARANOIA as having the defines doesn't really add
anything.
The printks covered by RPC_PARANOIA were triggered by badly formatted
packets and so should be ratelimited.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
nfsd defines a type 'encode_dent_fn' which is much like 'filldir_t' except
that the first pointer is 'struct readdir_cd *' rather than 'void *'. It
then casts encode_dent_fn points to 'filldir_t' as needed. This hides any
other type mismatches between the two such as the fact that the 'ino' arg
recently changed from ino_t to u64.
So: get rid of 'encode_dent_fn', get rid of the cast of the function type,
change the first arg of various functions from 'struct readdir_cd *' to
'void *', and live with the fact that we have a little less type checking
on the calling of these functions now. Less internal (to nfsd) checking
offset by more external checking, which is more important.
Thanks to Gabriel Paubert <paubert@iram.es> for discovering this and
providing an initial patch.
Signed-off-by: Gabriel Paubert <paubert@iram.es>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NFS V3 (and V4) support exclusive create by passing a 'cookie' which can get
stored with the file. If the file exists but has exactly the right cookie
stored, then we assume this is a retransmit and the exclusive create was
successful.
The cookie is 64bits and is traditionally stored in the mtime and atime
fields. This causes a problem with Solaris7 as negative mtime or atime
confuse it. So we moved two bits into the mode word instead.
But inherited ACLs sometimes overwrite the mode word on create, so this is a
problem.
So we give up and just store 62 of the 64 bits and assume that is close
enough.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NFSd assumes that largest number of pages that will be needed for a
request+response is 2+N where N pages is the size of the largest permitted
read/write request. The '2' are 1 for the non-data part of the request, and 1
for the non-data part of the reply.
However, when a read request is not page-aligned, and we choose to use
->sendfile to send it directly from the page cache, we may need N+1 pages to
hold the whole reply. This can overflow and array and cause an Oops.
This patch increases size of the array for holding pages by one and makes sure
that entry is NULL when it is not in use.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Due to silly typos, if the nfs versions are explicitly set, no NFSACL versions
get enabled.
Also improve an error message that would have made this bug a little easier to
find.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The nfsservctl system call isn't used but recent nfs-utils releases for
exporting filesystems, and consequently the code that is uses - exp_export -
has suffered some bitrot.
Particular:
- some newly added fields in 'struct svc_export' are being initialised
properly.
- the return value is now always -ENOMEM ...
This patch fixes both these problems.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Kill another big "if" clause.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I'm not too fond of these big if conditions. Replace them by checks of a flag
in the operation descriptor. To my eye this makes the code a bit more
self-documenting, and makes the complicated part of the code (proc_compound) a
little more compact.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Define an op descriptor struct, use it to simplify nfsd4_proc_compound().
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make wrappers for verify and nverify, for consistency with other ops.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The inlining contributes to bloating the stack of nfsd4_compound, and I want
to change the compound op functions to function pointers anyway.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Tuck away the replay_owner in the cstate while we're at it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
OK, this is embarassing--I've even looked back at the history, and cannot for
the life of me figure out why I added this check.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass the saved and current filehandles together into all the nfsd4 compound
operations.
I want a unified interface to these operations so we can just call them by
pointer and throw out the huge switch statement.
Also I'll eventually want a structure like this--that holds the state used
during compound processing--for deferral.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To avoid tying up server threads when nfsd makes an upcall (to mountd, to get
export options, to idmapd, for nfsv4 name<->id mapping, etc.), we temporarily
"drop" the request and save enough information so that we can revisit it
later.
Certain failures during the deferral process can cause us to really drop the
request and never revisit it.
This is often less than ideal, and is unacceptable in the NFSv4 case--rfc 3530
forbids the server from dropping a request without also closing the
connection.
As a first step, we modify the deferral code to return -ETIMEDOUT (which is
translated to nfserr_jukebox in the v3 and v4 cases, and remains a drop in the
v2 case).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch on its own causes no change in behavior, since nfsd_cross_mnt()
only returns -EAGAIN; but in the future I'd like it to also be able to return
-ETIMEDOUT, so we may as well handle any possible error here.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Note there's no need for special handling of -EAGAIN here; nfserrno() does
what we want already. So this is a pure cleanup with no change in
functionality.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since exp_parent can fail by returning an error (-EAGAIN) in addition to by
returning NULL, we should check for that case in exp_rootfh.
(TODO: we should check that userland handles these errors too.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A comment here incorrectly states that "slack_space" is measured in words, not
bytes. Remove the comment, and adjust a variable name and a few comments to
clarify the situation.
This is pure cleanup; there should be no change in functionality.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This dprintk is printing the wrong error now, but it's probably an unnecessary
dprintk anyway; just remove it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Annotated, all places switched to keeping status net-endian.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace kmalloc+memset with kcalloc and simplify
Signed-off-by: Yan Burman <burman.yan@gmail.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NFS3: Calculate 'w' a bit later in nfs3svc_encode_getaclres()
This is a small performance optimization since we can return before
needing 'w'. It also saves a few bytes of .text :
Before:
text data bss dec hex filename
1632 140 0 1772 6ec fs/nfsd/nfs3acl.o
After:
text data bss dec hex filename
1624 140 0 1764 6e4 fs/nfsd/nfs3acl.o
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NFS2: Calculate 'w' a bit later in nfsaclsvc_encode_getaclres()
This is a small performance optimization since we can return before
needing 'w'. It also saves a few bytes of .text :
Before:
text data bss dec hex filename
2406 212 0 2618 a3a fs/nfsd/nfs2acl.o
After:
text data bss dec hex filename
2400 212 0 2612 a34 fs/nfsd/nfs2acl.o
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the nfs
server code.
Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes the unused NFSD_OPTIMIZE_SPACE.
Additionally, it does differently what NFSD_OPTIMIZE_SPACE was supposed to do:
Nowadays, gcc knows best when to inline code, and CONFIG_CC_OPTIMIZE_FOR_SIZE
even tells gcc globally whether to optimize for size or for speed. Therefore,
this patch also removes all inline's from these files.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace all uses of kmem_cache_t with struct kmem_cache.
The patch was generated using the following script:
#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#
set -e
for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done
The script was run like this
sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Commit 6264d69d7d modified the nfsd_create()
error handling in such a way that nfsd_create will usually return
nfserr_perm even when succesful, if the export has the async export option.
This introduced a regression that could cause mkdir() to always return a
permissions error, even though the directory in question was actually
succesfully created.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the case where an open creates the file, we shouldn't be rechecking
permissions to open the file; the open succeeds regardless of what the new
file's mode bits say.
This patch fixes the problem, but only by introducing yet another parameter
to nfsd_create_v3. This is ugly. This will be fixed by later patches.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Minor rearrangement, cleanup of do_open_lookup(). No change in behavior.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When I was performing some operations on NFS, I got below error on server
side.
=============================================
[ INFO: possible recursive locking detected ]
2.6.19-prep #1
---------------------------------------------
nfsd4/3525 is trying to acquire lock:
(&inode->i_mutex){--..}, at: [<c0611e5a>] mutex_lock+0x21/0x24
but task is already holding lock:
(&inode->i_mutex){--..}, at: [<c0611e5a>] mutex_lock+0x21/0x24
other info that might help us debug this:
2 locks held by nfsd4/3525:
#0: (client_mutex){--..}, at: [<c0611e5a>] mutex_lock+0x21/0x24
#1: (&inode->i_mutex){--..}, at: [<c0611e5a>] mutex_lock+0x21/0x24
stack backtrace:
[<c04051ed>] show_trace_log_lvl+0x58/0x16a
[<c04057fa>] show_trace+0xd/0x10
[<c0405913>] dump_stack+0x19/0x1b
[<c043b6f1>] __lock_acquire+0x778/0x99c
[<c043be86>] lock_acquire+0x4b/0x6d
[<c0611ceb>] __mutex_lock_slowpath+0xbc/0x20a
[<c0611e5a>] mutex_lock+0x21/0x24
[<c047fd7e>] vfs_rmdir+0x76/0xf8
[<f94b7ce9>] nfsd4_clear_clid_dir+0x2c/0x41 [nfsd]
[<f94b7de9>] nfsd4_remove_clid_dir+0xb1/0xe8 [nfsd]
[<f94b307b>] laundromat_main+0x9b/0x1c3 [nfsd]
[<c04333d6>] run_workqueue+0x7a/0xbb
[<c0433d0b>] worker_thread+0xd2/0x107
[<c0436285>] kthread+0xc3/0xf2
[<c0402005>] kernel_thread_helper+0x5/0xb
===================================================================
Cause for this problem was,2 successive mutex_lock calls on 2 diffrent inodes ,as shown below
static int
nfsd4_clear_clid_dir(struct dentry *dir, struct dentry *dentry)
{
int status;
/* For now this directory should already be empty, but we empty it of
* any regular files anyway, just in case the directory was created by
* a kernel from the future.... */
nfsd4_list_rec_dir(dentry, nfsd4_remove_clid_file);
mutex_lock(&dir->d_inode->i_mutex);
status = vfs_rmdir(dir->d_inode, dentry);
...
int vfs_rmdir(struct inode *dir, struct dentry *dentry)
{
int error = may_delete(dir, dentry, 1);
if (error)
return error;
if (!dir->i_op || !dir->i_op->rmdir)
return -EPERM;
DQUOT_INIT(dir);
mutex_lock(&dentry->d_inode->i_mutex);
...
So I have developed the patch to overcome this problem.
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We are using NFS_REPLAY_ME as a special error value that is never leaked to
clients. That works fine; the only problem is mixing host- and network-
endian values in the same objects. Network-endian equivalent would work just
as fine; switch to it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
don't use the same variable to store NFS and host error values
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
don't use the same variable to store NFS and host error values
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
svc_procfunc instances return __be32, not int
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
a) ERR_PTR(nfserr_something) is a bad idea;
IS_ERR() will be false for it.
b) mixing nfserr_.... with -EOPNOTSUPP is
even worse idea.
nfsd4_path() does both; caller expects to get NFS protocol error out it if
anything goes wrong, but if it does we either do not notice (see (a)) or get
host-endian negative (see (b)).
IOW, that's a case when we can't use ERR_PTR() to return error, even though we
return a pointer in case of success.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is possible for the ->fopen callback from lockd into nfsd to find that an
answer cannot be given straight away (an upcall is needed) and so the request
has to be 'dropped', to be retried later. That error status is not currently
propagated back.
So:
Change nlm_fopen to return nlm error codes (rather than a private
protocol) and define a new nlm_drop_reply code.
Cause nlm_drop_reply to cause the rpc request to get rpc_drop_reply
when this error comes back.
Cause svc_process to drop a request which returns a status of
rpc_drop_reply.
[akpm@osdl.org: fix warning storm]
Cc: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Coverity noticed that the error handling code in the NFSv4 callback client
sets cb->cb_client to NULL, then calls rpc_shutdown_client with the NULL
pointer.
Coverity: #cid 1397
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We weren't actually checking for SHARE_ACCESS_WRITE, with the result that the
owner could open a non-writeable file for write!
Continue to allow DENY_WRITE only with write access.
Thanks to Jim Rees for reporting the bug.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If a client creates a file using an open which sets the mode to 000, or if a
chmod changes permissions after a file is opened, then situations may arise
where an NFS client knows that some IO is permitted (because a process holds
the file open), but the NFS server does not (because it doesn't know about the
open, and only sees that the IO conflicts with the current mode of the file).
As a hack to solve this problem, NFS servers normally allow the owner to
override permissions on IO. The client can still enforce correct
permissions-checking on open by performing an explicit access check.
In NFSv4 the client can rely on the explicit on-the-wire open instead of an
access check.
Therefore we should not be allowing the owner to override permissions on an
over-the-wire open!
However, we should still allow the owner to override permissions in the case
where the client is claiming an open that it already made either before a
reboot, or while it was holding a delegation.
Thanks to Jim Rees for reporting the bug.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is some confusion about the meaning of 'bufsz' for a sunrpc server.
In some cases it is the largest message that can be sent or received. In
other cases it is the largest 'payload' that can be included in a NFS
message.
In either case, it is not possible for both the request and the reply to be
this large. One of the request or reply may only be one page long, which
fits nicely with NFS.
So we remove 'bufsz' and replace it with two numbers: 'max_payload' and
'max_mesg'. Max_payload is the size that the server requests. It is used
by the server to check the max size allowed on a particular connection:
depending on the protocol a lower limit might be used.
max_mesg is the largest single message that can be sent or received. It is
calculated as the max_payload, rounded up to a multiple of PAGE_SIZE, and
with PAGE_SIZE added to overhead. Only one of the request and reply may be
this size. The other must be at most one page.
Cc: Greg Banks <gnb@sgi.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use all the pieces set up so far to implement referral support, allowing
return of NFS4ERR_MOVED and fs_locations attribute.
Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Encode fs_locations attribute.
Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Define FS locations structures, some functions to manipulate them, and add
code to parse FS locations in downcall and add to the exports structure.
[bfields@fieldses.org: bunch of fixes and cleanups]
Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Store the export path in the svc_export structure instead of storing only the
dentry. This will prevent the need for additional d_path calls to provide
NFSv4 fs_locations support.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
totalram is measured in pages, not bytes, so PAGE_SHIFT must be used when
trying to find 1/4096 of RAM.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is legal to have zero-length NFSv4 acls; they just deny everything.
Also, nfs4_acl_nfsv4_to_posix will always return with pacl and dpacl set on
success, so the caller doesn't need to check this.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's no need to handle the case where the caller passes in null for pacl or
dpacl; no caller does that, because it would be a dumb thing to do.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We can be a little more flexible about the flags allowed for inheritance (in
particular, we can deal with either the presence or the absence of
INHERIT_ONLY), but we should probably reject other combinations that we don't
understand.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use a different nfsv4->(draft posix) acl mapping which is
1. completely backwards compatible,
2. accepts any nfsv4 acl, and
3. errs on the side of restricting permissions.
In detail:
1. completely backwards compatible: The new mapping produces the
same result on any acl produced by the existing (draft
posix)->nfsv4 mapping; the one exception is that we no longer
attempt to guess the value of the mask by assuming certain denies
represent the mask. Since the server still keeps track of the mask
locally, sequences of chmod's will still be handled fine; the only
thing this will change is sequences of chmod's with intervening
read-modify-writes of the acl. That last case just isn't worth the
trouble and the possible misrepresentations of the user's intent
(if we guess that a certain deny indicates masking is in effect
when it really isn't).
2. accepts any nfsv4 acl: That's not quite true: we still reject
acls that use combinations of inheritance flags that we don't
support. We also reject acls that attempt to explicitly deny
read_acl or read_attributes permissions, or that attempt to deny
write_acl or write_attributes permissions to the owner of the file.
3. errs on the side of restricting permissions: one exception to
this last rule: we totally ignore some bits (write_owner,
synchronize, read_named_attributes, etc.) that are completely alien
to our filesystem semantics, in some cases even if that would mean
ignoring an explicit deny that we have no intention of enforcing.
Excepting that, the posix acl produced should be the most
permissive acl that is not more permissive than the given nfsv4
acl.
And the new code's shorter, too. Neato.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The previous patch enables some minor simplification here.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We could be using more common code in exp_pseudoroot(). This will also
simplify some changes we need to make later.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The NFSACL patches introduced support for multiple RPC services listening on
the same transport. However, only the first of these services was registered
with portmapper. This was perfectly fine for nfsacl, as you traditionally do
not want these to show up in a portmapper listing.
The patch below changes the default behavior to always register all services
listening on a given transport, but retains the old behavior for nfsacl
services.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make the nfsd read-ahead params cache more SMP-friendly by changing the single
global list and lock into a fixed 16-bucket hashtable with per-bucket locks.
This reduces spinlock contention in nfsd_read() on read-heavy workloads on
multiprocessor servers.
Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients each doing 1K
streaming reads at full line rate. The server had 128 nfsd threads, which
sizes the RA cache at 256 entries, of which only a handful were used. Flat
profiling shows nfsd_read(), including the inlined nfsd_get_raparms(), taking
10.4% of each CPU. This patch drops the contribution from nfsd() to 1.71% for
each CPU.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The max possible is the maximum RPC payload. The default depends on amount of
total memory.
The value can be set within reason as long as no nfsd threads are currently
running. The value can also be ready, allowing the default to be determined
after nfsd has started.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The limit over UDP remains at 32K. Also, make some of the apparently
arbitrary sizing constants clearer.
The biggest change here involves replacing NFSSVC_MAXBLKSIZE by a function of
the rqstp. This allows it to be different for different protocols (udp/tcp)
and also allows it to depend on the servers declared sv_bufsiz.
Note that we don't actually increase sv_bufsz for nfs yet. That comes next.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>