While we may want to skip attempting to connect to a downed mirror
when we're deciding which mirror to select for a read, we do not
want to do so once we've committed to attempting the I/O in
ff_layout_read/write_pagelist(), or ff_layout_initiate_commit()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When a read to the preferred mirror returns an error, the flexfiles
driver records the error in the inode list and currently marks the
layout for return before failing over the attempted read to the next
mirror.
What we actually want to do is fire off a LAYOUTERROR to notify the
MDS that there is an issue with the preferred mirror, then we fail
over. Only once we've failed to read from all mirrors should we
return the layout.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If a layout segment gets invalidated while a pNFS I/O operation
is queued for transmission, then we ideally want to abort
immediately. This is particularly the case when there is a large
number of I/O related RPCs queued in the RPC layer, and the layout
segment gets invalidated due to an ENOSPC error, or an EACCES (because
the client was fenced). We may end up forced to spam the MDS with a
lot of otherwise unnecessary LAYOUTERRORs after that I/O fails.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the attempt to instantiate the mirror's layout DS pointer failed,
then that pointer may hold a value of type ERR_PTR(), so we need
to check that before we dereference it.
Fixes: 65990d1afb ("pNFS/flexfiles: Fix a deadlock on LAYOUTGET")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Fix up some compiler warnings about function parameters, etc not being
correctly described or formatted.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Having access to the controlling rpc_rqst means a trace point in the
XDR code can report:
- the XID
- the task ID and client ID
- the p_name of RPC being processed
Subsequent patches will introduce such trace points.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
SUNRPC has two sorts of credentials, both of which appear as
"struct rpc_cred".
There are "generic credentials" which are supplied by clients
such as NFS and passed in 'struct rpc_message' to indicate
which user should be used to authorize the request, and there
are low-level credentials such as AUTH_NULL, AUTH_UNIX, AUTH_GSS
which describe the credential to be sent over the wires.
This patch replaces all the generic credentials by 'struct cred'
pointers - the credential structure used throughout Linux.
For machine credentials, there is a special 'struct cred *' pointer
which is statically allocated and recognized where needed as
having a special meaning. A look-up of a low-level cred will
map this to a machine credential.
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
We can use cred->groupinfo (from the 'struct cred') instead.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The SUNRPC credential framework was put together before
Linux has 'struct cred'. Now that we have it, it makes sense to
use it.
This first step just includes a suitable 'struct cred *' pointer
in every 'struct auth_cred' and almost every 'struct rpc_cred'.
The rpc_cred used for auth_null has a NULL 'struct cred *' as nothing
else really makes sense.
For rpc_cred, the pointer is reference counted.
For auth_cred it isn't. struct auth_cred are either allocated on
the stack, in which case the thread owns a reference to the auth,
or are part of 'struct generic_cred' in which case gc_base owns the
reference, and "acred" shares it.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Since commit bb21ce0ad2 we always enforce per-mirror stateid.
However, this makes sense only for v4+ servers.
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
rfc8435 says:
For tight coupling, ffds_stateid provides the stateid to be used by
the client to access the file.
However current implementation replaces per-mirror provided stateid with
by open or lock stateid.
Ensure that per-mirror stateid is used by ff_layout_write_prepare_v4 and
nfs4_ff_layout_prepare_ds.
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
for tightly coupled DSes client must ignore provided synthetic uid and
gid as stated in draft-ietf-nfsv4-flex-files-19#section-5.1.
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
For the 'files' and 'flexfiles' layout types, we do not expect the reply
to be any larger than 4k. The block and scsi layout types are a little more
greedy, so we keep allocating the maximum response size for now.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If we get an error while retrieving the layout, then we should
report it rather than falling back to I/O through the MDS.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If a layout segment is carrying layoutstats or layout error information,
then we always want to return it rather than using a forgetful model.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Hightlights include:
Bugfixes:
- Fix an rcu deadlock in nfs_delegation_find_inode()
- Fix NFSv4 deadlocks due to not freeing the session slot in layoutget
- Don't send layoutreturn if the layout is already invalid
- Prevent duplicate XID allocation
- flexfiles: Don't tie up all the rpciod threads in resends
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbK9avAAoJEA4mA3inWBJclq0P/1VCigyDlsbtdby3z2leV84k
l0asrGjOQndljJ7I21awAgEo8KvXOd66cMv6YT3+UqEW18aNblH4/ngjyId6hPVb
RZDX7tsG16ZHEqfe9f9irNZo90mdvuSC4ChJ/CbbPesaK9pblE1d76b/qUVr4FUX
Gj7JPAC5ckoiZXPFfRWfc+o7JnGvs5wkEuDTy+ig6v7BRdL64hdPG3veRNpmLIAZ
uS/NCyRpO+nFN/ukmvuoI2ZQ3qfHubHBD+rHxr1UKT/ad7dywLmL2UBaYQ0Tl3bq
/iSQHutgJYj/80VaRTqdlLt/m4ebUZg+9BEZgM5MvqBWkXcpXND51zxExVJN4cGW
BOytqjLz0gP1OGb8w+Oow58K8l4XyEgHe2CtZ6Yz8Vwof7nchkpv7RSX50hJFIcA
YlikeDyDzfOmTT6ove5kF31WQSa3Bk6OMEei0of6hWU3UVHyEdr9az73pm/CLSHE
/R7w0osU3B9tmQD4btQeJ2DxP+syQwhelOYodyVTwOlkmmGg7DSV7fehnGyH8t8f
I4Yp8f0raiYGbwonYVE2+zDO140VRETEfTE4XQZnn41fZUfB74oIqk77JtgvGMk2
/+XFNCYBGadHdSBdxyJmhSjhoAWrhgChEIz1G12SiHrNvqIRY/uHhdCX1Ut5vlPf
5aqyn/yXm6rUH7aNh/Gd
=tz0M
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Hightlights include:
- fix an rcu deadlock in nfs_delegation_find_inode()
- fix NFSv4 deadlocks due to not freeing the session slot in
layoutget
- don't send layoutreturn if the layout is already invalid
- prevent duplicate XID allocation
- flexfiles: Don't tie up all the rpciod threads in resends"
* tag 'nfs-for-4.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
pNFS/flexfiles: Process writeback resends from nfsiod context as well
pNFS/flexfiles: Don't tie up all the rpciod threads in resends
sunrpc: Prevent duplicate XID allocation
pNFS: Don't send layoutreturn if the layout is already invalid
pNFS: Always free the session slot on error in nfs4_layoutget_handle_exception
NFS: Fix an rcu deadlock in nfs_delegation_find_inode()
Although the writeback resends are more robust than the reads, since they
are not immediately rescheduled by the same thread, we are better off
processing them in the same place as the reads.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We do not want to have rpciod threads perform recursive calls into the
RPC layer since that can deadlock. In particular, having to wait for
a layoutget can be nasty... We want rather to defer scheduling those
retries until we're in the rpc_release() callback, since that is
called from the nfsiod workqueue.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
- Additional struct_size() conversions (Matthew, Kees)
- Explicitly reported overflow fixes (Silvio, Kees)
- Add missing kvcalloc() function (Kees)
- Treewide conversions of allocators to use either 2-factor argument
variant when available, or array_size() and array3_size() as needed (Kees)
-----BEGIN PGP SIGNATURE-----
Comment: Kees Cook <kees@outflux.net>
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlsgVtMWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJhsJEACLYe2EbwLFJz7emOT1KUGK5R1b
oVxJog0893WyMqgk9XBlA2lvTBRBYzR3tzsadfYo87L3VOBzazUv0YZaweJb65sF
bAvxW3nY06brhKKwTRed1PrMa1iG9R63WISnNAuZAq7+79mN6YgW4G6YSAEF9lW7
oPJoPw93YxcI8JcG+dA8BC9w7pJFKooZH4gvLUSUNl5XKr8Ru5YnWcV8F+8M4vZI
EJtXFmdlmxAledUPxTSCIojO8m/tNOjYTreBJt9K1DXKY6UcgAdhk75TRLEsp38P
fPvMigYQpBDnYz2pi9ourTgvZLkffK1OBZ46PPt8BgUZVf70D6CBg10vK47KO6N2
zreloxkMTrz5XohyjfNjYFRkyyuwV2sSVrRJqF4dpyJ4NJQRjvyywxIP4Myifwlb
ONipCM1EjvQjaEUbdcqKgvlooMdhcyxfshqJWjHzXB6BL22uPzq5jHXXugz8/ol8
tOSM2FuJ2sBLQso+szhisxtMd11PihzIZK9BfxEG3du+/hlI+2XgN7hnmlXuA2k3
BUW6BSDhab41HNd6pp50bDJnL0uKPWyFC6hqSNZw+GOIb46jfFcQqnCB3VZGCwj3
LH53Be1XlUrttc/NrtkvVhm4bdxtfsp4F7nsPFNDuHvYNkalAVoC3An0BzOibtkh
AtfvEeaPHaOyD8/h2Q==
=zUUp
-----END PGP SIGNATURE-----
Merge tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull more overflow updates from Kees Cook:
"The rest of the overflow changes for v4.18-rc1.
This includes the explicit overflow fixes from Silvio, further
struct_size() conversions from Matthew, and a bug fix from Dan.
But the bulk of it is the treewide conversions to use either the
2-factor argument allocators (e.g. kmalloc(a * b, ...) into
kmalloc_array(a, b, ...) or the array_size() macros (e.g. vmalloc(a *
b) into vmalloc(array_size(a, b)).
Coccinelle was fighting me on several fronts, so I've done a bunch of
manual whitespace updates in the patches as well.
Summary:
- Error path bug fix for overflow tests (Dan)
- Additional struct_size() conversions (Matthew, Kees)
- Explicitly reported overflow fixes (Silvio, Kees)
- Add missing kvcalloc() function (Kees)
- Treewide conversions of allocators to use either 2-factor argument
variant when available, or array_size() and array3_size() as needed
(Kees)"
* tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
treewide: Use array_size in f2fs_kvzalloc()
treewide: Use array_size() in f2fs_kzalloc()
treewide: Use array_size() in f2fs_kmalloc()
treewide: Use array_size() in sock_kmalloc()
treewide: Use array_size() in kvzalloc_node()
treewide: Use array_size() in vzalloc_node()
treewide: Use array_size() in vzalloc()
treewide: Use array_size() in vmalloc()
treewide: devm_kzalloc() -> devm_kcalloc()
treewide: devm_kmalloc() -> devm_kmalloc_array()
treewide: kvzalloc() -> kvcalloc()
treewide: kvmalloc() -> kvmalloc_array()
treewide: kzalloc_node() -> kcalloc_node()
treewide: kzalloc() -> kcalloc()
treewide: kmalloc() -> kmalloc_array()
mm: Introduce kvcalloc()
video: uvesafb: Fix integer overflow in allocation
UBIFS: Fix potential integer overflow in allocation
leds: Use struct_size() in allocation
Convert intel uncore to struct_size
...
Driver can set flag to allow LAYOUTGET to be sent with OPEN.
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Stable bugfixes:
- Revalidate "." and ".." correctly on open
- Avoid RCU usage in tracepoints
- Fix ugly referral attributes
- Fix a typo in nomigration mount option
- Revert "NFS: Move the flock open mode check into nfs_flock()"
Features:
- Implement a stronger send queue accounting system for NFS over RDMA
- Switch some atomics to the new refcount_t type
Other bugfixes and cleanups:
- Clean up access mode bits
- Remove special-case revalidations in nfs_opendir()
- Improve invalidating NFS over RDMA memory for async operations that time out
- Handle NFS over RDMA replies with a worqueue
- Handle NFS over RDMA sends with a workqueue
- Fix up replaying interrupted requests
- Remove dead NFS over RDMA definitions
- Update NFS over RDMA copyright information
- Be more consistent with bool initialization and comparisons
- Mark expected switch fall throughs
- Various sunrpc tracepoint cleanups
- Fix various OPEN races
- Fix a typo in nfs_rename()
- Use common error handling code in nfs_lock_and_join_request()
- Check that some structures are properly cleaned up during net_exit()
- Remove net pointer from dprintk()s
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAloPWGwACgkQ18tUv7Cl
QOtMVhAAufCkDxqO2lmDH+0JyYUKMcoOMYtI8s2J1HrbEzTW/dVtI28fPAKEEd4m
2JjNqnO516Jiv+g3E6eO4uunZRb4IB3AYT6YaTwmBFE+l7tpMdPb1xybOBP02Hji
Y29kzLXwxxvnoxEqFalzCzV2BeRb2kAw6mayY9FxH6AfiEEQZfmxLCYgVuYa2jTC
Z/B5E0GxAf28Aj0bIP8lLKbOkFijo851DB88UffEOZQGKUDlAd3GNUSSHb81Rj0N
4ef7bKoGylkIpZ1PdTChdG1+RKqud02zrmQfmEwXui3eUwhOWy8hrKloNykqR5sj
pgoDz79euAq4TDVyQKtutnbvVxfCcBeMYAXZhXkZLVcl+39in0kuLj4SxU5AmDhf
ErnthG4W7jsLMM96kMvSTaoh4uwioviG1KmZfvuvUoMBSwtiX18hFTWtFKRD6x9e
PNOqBdh8nkKYEFbEO4ksfYaWZJ5AuyFIQiIpj1gm+7sf039oN/zEuPV+jaEJG0oa
Ef9IqHrQbbCUFYFjpBENr3HjU3igTTaxQ5iq+VYl4zg1pw6m6JTojqZ6qtQzqOYS
O3N1ygeShsW934z8QcWjtEyeUXIB3JF9vUS3gEBgWPDyCltGXyq4Cq6Lod4s4JCb
pWGI6wJLX1Fg6nq7cj0S4Or3QBgz2q8ZyBxssamhdvON/Ef5ccI=
=2Zc1
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"Stable bugfixes:
- Revalidate "." and ".." correctly on open
- Avoid RCU usage in tracepoints
- Fix ugly referral attributes
- Fix a typo in nomigration mount option
- Revert "NFS: Move the flock open mode check into nfs_flock()"
Features:
- Implement a stronger send queue accounting system for NFS over RDMA
- Switch some atomics to the new refcount_t type
Other bugfixes and cleanups:
- Clean up access mode bits
- Remove special-case revalidations in nfs_opendir()
- Improve invalidating NFS over RDMA memory for async operations that
time out
- Handle NFS over RDMA replies with a worqueue
- Handle NFS over RDMA sends with a workqueue
- Fix up replaying interrupted requests
- Remove dead NFS over RDMA definitions
- Update NFS over RDMA copyright information
- Be more consistent with bool initialization and comparisons
- Mark expected switch fall throughs
- Various sunrpc tracepoint cleanups
- Fix various OPEN races
- Fix a typo in nfs_rename()
- Use common error handling code in nfs_lock_and_join_request()
- Check that some structures are properly cleaned up during
net_exit()
- Remove net pointer from dprintk()s"
* tag 'nfs-for-4.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (62 commits)
NFS: Revert "NFS: Move the flock open mode check into nfs_flock()"
NFS: Fix typo in nomigration mount option
nfs: Fix ugly referral attributes
NFS: super: mark expected switch fall-throughs
sunrpc: remove net pointer from messages
nfs: remove net pointer from messages
sunrpc: exit_net cleanup check added
nfs client: exit_net cleanup check added
nfs/write: Use common error handling code in nfs_lock_and_join_requests()
NFSv4: Replace closed stateids with the "invalid special stateid"
NFSv4: nfs_set_open_stateid must not trigger state recovery for closed state
NFSv4: Check the open stateid when searching for expired state
NFSv4: Clean up nfs4_delegreturn_done
NFSv4: cleanup nfs4_close_done
NFSv4: Retry NFS4ERR_OLD_STATEID errors in layoutreturn
pNFS: Retry NFS4ERR_OLD_STATEID errors in layoutreturn-on-close
NFSv4: Don't try to CLOSE if the stateid 'other' field has changed
NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID.
NFS: Fix a typo in nfs_rename()
NFSv4: Fix open create exclusive when the server reboots
...
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.
The variable nfs_client.cl_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.
The variable nfs4_ff_layout_mirror.ref is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The client was freeing the nfs4_ff_layout_ds, but not the contained
nfs4_ff_ds_version array.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If the layout has expired due to a fencing event, then we should not
attempt to commit to the DS.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Upon receiving a stateid error such as BAD_STATEID, the client
should retry the operation against the MDS before deciding to
do stateid recovery.
Previously, the code would initiate state recovery and it could
lead to a race in a state manager that could chose an incorrect
recovery method which would lead to the EIO failure for the
application.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If xdr_inline_decode() fails then we end up returning ERR_PTR(0). The
caller treats NULL returns as -ENOMEM so it doesn't really hurt runtime,
but obviously we intended to set an error code here.
Fixes: d67ae825a5 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Layoutstats is always desirable when using the flexfiles driver, so
we should enable it if that driver is being loaded. It is safe to do
so, because even when the mount specifies NFSv4.1, we will turn it
off if the server tells us it is unsupported.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If the attempt to write through pNFS fails, we need to use the same
failure semantics as for the read path: If the FF_FLAGS_NO_IO_THRU_MDS
flag is set or we have sufficient valid DSes, then we must retry through
pNFS
Fixes: d67ae825a5 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If we have a layout segment cached in pgio->pg_lseg, we should check it
for validity before reusing it in a new RPC request. Otherwise, if we
recoalesce, we can end up looping forever.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The check in nfs4_ff_layout_prepare_ds() seems to be missing.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fixes: a33e4b036d ("pNFS: return status from nfs4_pnfs_ds_connect")
Cc: Weston Andros Adamson <dros@primarydata.com>
Cc: stable@vger.kernel.org # v4.11
Flexfilelayout supports data servers which talk NFS v3 and v4.{0,1,2}.
However, this code path is disabled and v3 only servers are accepted.
This change removes this limitation.
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Stable Bugfixes:
- Fix infinite loop on BAD_STATEID error
Other Bugfixes:
- Fix old dentry rehash after move
- Fix pnfs GETDEVINFO hangs
- Fix pnfs fallback to MDS on commit errors
- Fix flexfiles kernel oops
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAljeko0ACgkQ18tUv7Cl
QOsZbw/+MXYEZCaILgaAjzoWO4qJhNuAzlblh2jSX2nitY6NAva2MORZoAnxqS/G
2qVWdYLvfQ2rKkIazInktaqBgsnl5Got9EbrD2hdV5LvM953U3KJkeXZD67ncvV7
YYmxaFipLfTfmLZDXlQ1h5wTKXXXw0VA2v2YL+sRZhAzVhcTyMyf1n89lT6H9Fqx
UPitMuBbRskCPFMOZ6xP+T6MsOpeIGVHHOvYWSSydCT6IoujnTjNRaZ9+VFSm5iU
rubL/qokg2VIJ60xbmv/toq61FkhI2xtJTtBFxHZo47En3RdwB53zOez/OmvTFLZ
lKSCh/Xkk/DDhUQWrYDaydPyGE6mWR+E/18/BZh0tx0n+yvHPg0ax56CTSJwpvg7
f6pydxQo0RAH3S/IWb0JB3jyi++EKznWK2OokllOdmT3DrYyKD3LiTY4T2MO8Wlu
+miFq6yk1iQHZR4R4JDkCWzpD7JeeR6lkg19kRZlBJm2Pv+8Rzg4qjBgqAo5OPxV
RiQ0CuyKoR2rtMz/4pepBai4fM42S/Nc59QJI/45mTUJ40whyoygJEov7s3SdTZi
H6cL7Ewe8m++MNW2aAsM2M0CVzoyv0D4mnPxgE8t8lNbkcT14bdi1WFwAHo/ICMd
KpyhsDVltakqUbelD3WWJtTXbAPCgwlOzuPdfbqtBMQJNP1aONs=
=1Xgv
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.11-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"Here are a few more bugfixes that came in over the last couple of
weeks. Most of these fix various hangs and loops that people found,
but we also had a few error handling fixes.
Stable Bugfixes:
- fix infinite loop on BAD_STATEID error
Other Bugfixes:
- fix old dentry rehash after move
- fix pnfs GETDEVINFO hangs
- fix pnfs fallback to MDS on commit errors
- fix flexfiles kernel oops"
* tag 'nfs-for-4.11-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
nfs: flexfiles: fix kernel OOPS if MDS returns unsupported DS type
NFSv4.1 fix infinite loop on IO BAD_STATEID error
PNFS fix fallback to MDS if got error on commit to DS
NFS filelayout:call GETDEVICEINFO after pnfs_layout_process completes
NFS store nfs4_deviceid in struct nfs4_filelayout_segment
NFS cleanup struct nfs4_filelayout_segment
NFS: Fix old dentry rehash after move
Stable Bugfixes:
- Fix decrementing nrequests in NFS v4.2 COPY to fix kernel warnings
- Prevent a double free in async nfs4_exchange_id()
- Squelch a kbuild sparse complaint for xprtrdma
Other Bugfixes:
- Fix a typo (NFS_ATTR_FATTR_GROUP_NAME) that causes a memory leak
- Fix a reference leak that causes kernel warnings
- Make nfs4_cb_sv_ops static to fix a sparse warning
- Respect a server's max size in CREATE_SESSION
- Handle errors from nfs4_pnfs_ds_connect
- Flexfiles layout shouldn't mark devices as unavailable
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAljMQsUACgkQ18tUv7Cl
QOt0FA//eSieOojEm9uJIxfydrJY2VkPgqg0xmIxLhcMmXi/d4kO9GpS9YeJJZi4
r5oClq1afhXVR83JmNDCvIYUNwf5/lluckuzXZEYlC3qUbXjQt/Nn/FHfrqW8qXV
HJy4PVwV+BHnfU6Y7p14zzucGPrMeWZQJO+7mRpboe1jcizHOMdcw+Aim7pr44y6
BI3QcLPtQGY4CnPOEkpDNuEWtO7iMME3bRJOJ2lOWz5smG0KAQo80OTHGXIe4OqR
d+gHhoHZ2LbZwdbs+rsjAIMFsFrgXqZmXYbQCZ9SEsr4ysj3PesHPdGFrKXCZCSM
0MjlEcznGl6ooPDD9tO5Bi047Xhq2TlUWF+FIVYOdFur+7oIcJcnJB7epoYEQ2d2
6RMvddeKmEgW5Y77myIb3G6jTnk7S8dMq5aAGSyUmKoVhybfw0PGFMbZ2gDEpaTG
HweeaPmR7J0e+MZBiShTBH2zulFcI1qG3kowu/oKccU9jGi/uA7vkXOSkaxkSzST
+vS30JwArNOj7OFqhGZbi1YzoK0ixxxXLD4DaEDKKm4mOt7g1Zmb0QgVnGSx1V6X
Or4Y4xTKn0vCt3e61O9dsBRApBCEVSBJMgYb9Z+LUSdQIKoUj+sQPMzY3iGTefcx
r7qiUddBZerQ0CZCsRxXk/otJawGCO9XFuSY4CksvlReTeyl1Tc=
=JY3W
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.11-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"We have a handful of stable fixes to fix kernel warnings and other
bugs that have been around for a while. We've also found a few other
reference counting bugs and memory leaks since the initial 4.11 pull.
Stable Bugfixes:
- Fix decrementing nrequests in NFS v4.2 COPY to fix kernel warnings
- Prevent a double free in async nfs4_exchange_id()
- Squelch a kbuild sparse complaint for xprtrdma
Other Bugfixes:
- Fix a typo (NFS_ATTR_FATTR_GROUP_NAME) that causes a memory leak
- Fix a reference leak that causes kernel warnings
- Make nfs4_cb_sv_ops static to fix a sparse warning
- Respect a server's max size in CREATE_SESSION
- Handle errors from nfs4_pnfs_ds_connect
- Flexfiles layout shouldn't mark devices as unavailable"
* tag 'nfs-for-4.11-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
pNFS/flexfiles: never nfs4_mark_deviceid_unavailable
pNFS: return status from nfs4_pnfs_ds_connect
NFSv4.1 respect server's max size in CREATE_SESSION
NFS prevent double free in async nfs4_exchange_id
nfs: make nfs4_cb_sv_ops static
xprtrdma: Squelch kbuild sparse complaint
NFS: fix the fault nrequests decreasing for nfs_inode COPY
NFSv4: fix a reference leak caused WARNING messages
nfs4: fix a typo of NFS_ATTR_FATTR_GROUP_NAME
The flexfiles layout should never mark a device unavailable.
Move nfs4_mark_deviceid_unavailable out of nfs4_pnfs_ds_connect and call
directly from files layout where it's still needed.
The flexfiles driver still handles marked devices in error paths, but will
now print a rate limited warning.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The nfs4_pnfs_ds_connect path can call rpc_create which can fail or it
can wait on another context to reach the same failure.
This checks that the rpc_create succeeded and returns the error to the
caller.
When an error is returned, both the files and flexfiles layouts will return
NULL from _prepare_ds(). The flexfiles layout will also return the layout
with the error NFS4ERR_NXIO.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Stable bugfixes:
- NFSv4: Fix memory and state leak in _nfs4_open_and_get_state
- xprtrdma: Fix Read chunk padding
- xprtrdma: Per-connection pad optimization
- xprtrdma: Disable pad optimization by default
- xprtrdma: Reduce required number of send SGEs
- nlm: Ensure callback code also checks that the files match
- pNFS/flexfiles: If the layout is invalid, it must be updated before retrying
- NFSv4: Fix reboot recovery in copy offload
- Revert "NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE"
- NFSv4: fix getacl head length estimation
- NFSv4: fix getacl ERANGE for sum ACL buffer sizes
Features:
- Add and use dprintk_cont macros
- Various cleanups to NFS v4.x to reduce code duplication and complexity
- Remove unused cr_magic related code
- Improvements to sunrpc "read from buffer" code
- Clean up sunrpc timeout code and allow changing TCP timeout parameters
- Remove duplicate mw_list management code in xprtrdma
- Add generic functions for encoding and decoding xdr streams
Bugfixes:
- Clean up nfs_show_mountd_netid
- Make layoutreturn_ops static and use NULL instead of 0 to fix sparse warnings
- Properly handle -ERESTARTSYS in nfs_rename()
- Check if register_shrinker() failed during rpcauth_init()
- Properly clean up procfs/pipefs entries
- Various NFS over RDMA related fixes
- Silence unititialized variable warning in sunrpc
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAli3F7YACgkQ18tUv7Cl
QOvzrQ//dL+nnBaqsm9bA2wwuVJSQ2R1zdkwHOCWghEWROZrQHzpi0VHu0ZKBLzr
YsYFhHvIPax9Q8USY4B/QFQ3eUuZILEVn+xDruRxZaJPnsA4Zmr16VJwGF2F68Lh
CGekA5qybqy8lAG6v96Gyjbi+JqjHNCmelYWRv7SX9IZcDjNJpsEbrSI4LkabTWh
70WtCl3LBzVMRYRxe8+f0mcx4g4XCQ8pDaQRgRnfKtNeQk/+PgWz66xSNinDakVb
A8AkaiUadPRgUTpap6HfBSicpRvtLQeLhARC0E4YE5pXp2H/kUt2MFe5szblfSCv
zf2nrPUbNEHjBypFhERzCZZk6EonY6FeOojyW0g2C+rmPdK7WLlKbwTQFxdRGvsx
78fIiPRdlDHDp9CXzD8V4xxRBJX/KkicA1Vp8CoyQtmpzpu2fjwT0kr9HeD+aEe6
293+72QUfk05re2HYWF9MCGGVVLdnLLjrKCgwwRQ0HX5WF6GNQxX/yVgBVlqFeV3
xc8m7ltKco5N9JxIqwlIpySq2e114EQOqsmHYz3gxd7ID9J1NJz+9H2z2EvgAKZ7
wIPSLoZrdBdnoXG8ZDDTAvPKeB8l6egi6wjrvGKxewVlMbjzogdARsMKWoifnCfG
HMkH+IEvLGvFc1pPeLbscJGEdVWXVn0thO+8fkS9F9sE/zMX9PA=
=01DU
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"Highlights include:
Stable bugfixes:
- NFSv4: Fix memory and state leak in _nfs4_open_and_get_state
- xprtrdma: Fix Read chunk padding
- xprtrdma: Per-connection pad optimization
- xprtrdma: Disable pad optimization by default
- xprtrdma: Reduce required number of send SGEs
- nlm: Ensure callback code also checks that the files match
- pNFS/flexfiles: If the layout is invalid, it must be updated before
retrying
- NFSv4: Fix reboot recovery in copy offload
- Revert "NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION
replies to OP_SEQUENCE"
- NFSv4: fix getacl head length estimation
- NFSv4: fix getacl ERANGE for sum ACL buffer sizes
Features:
- Add and use dprintk_cont macros
- Various cleanups to NFS v4.x to reduce code duplication and
complexity
- Remove unused cr_magic related code
- Improvements to sunrpc "read from buffer" code
- Clean up sunrpc timeout code and allow changing TCP timeout
parameters
- Remove duplicate mw_list management code in xprtrdma
- Add generic functions for encoding and decoding xdr streams
Bugfixes:
- Clean up nfs_show_mountd_netid
- Make layoutreturn_ops static and use NULL instead of 0 to fix
sparse warnings
- Properly handle -ERESTARTSYS in nfs_rename()
- Check if register_shrinker() failed during rpcauth_init()
- Properly clean up procfs/pipefs entries
- Various NFS over RDMA related fixes
- Silence unititialized variable warning in sunrpc"
* tag 'nfs-for-4.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (64 commits)
NFSv4: fix getacl ERANGE for some ACL buffer sizes
NFSv4: fix getacl head length estimation
Revert "NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE"
NFSv4: Fix reboot recovery in copy offload
pNFS/flexfiles: If the layout is invalid, it must be updated before retrying
NFSv4: Clean up owner/group attribute decode
SUNRPC: Add a helper function xdr_stream_decode_string_dup()
NFSv4: Remove bogus "struct nfs_client" argument from decode_ace()
NFSv4: Fix the underestimation of delegation XDR space reservation
NFSv4: Replace callback string decode function with a generic
NFSv4: Replace the open coded decode_opaque_inline() with the new generic
NFSv4: Replace ad-hoc xdr encode/decode helpers with xdr_stream_* generics
SUNRPC: Add generic helpers for xdr_stream encode/decode
sunrpc: silence uninitialized variable warning
nlm: Ensure callback code also checks that the files match
sunrpc: Allow xprt->ops->timer method to sleep
xprtrdma: Refactor management of mw_list field
xprtrdma: Handle stale connection rejection
xprtrdma: Properly recover FRWRs with in-flight FASTREG WRs
xprtrdma: Shrink send SGEs array
...
Now that %z is standartised in C99 there is no reason to support %Z.
Unlike %L it doesn't even make format strings smaller.
Use BUILD_BUG_ON in a couple ATM drivers.
In case anyone didn't notice lib/vsprintf.o is about half of SLUB which
is in my opinion is quite an achievement. Hopefully this patch inspires
someone else to trim vsprintf.c more.
Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we see that our pNFS READ/WRITE/COMMIT operation failed, but we
also see that our layout segment is no longer valid, then we need to
get a new layout segment before retrying.
Fixes: 90816d1dda ("NFSv4.1/flexfiles: Don't mark the entire deviceid...")
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Fixes the following sparse warning:
fs/nfs/flexfilelayout/flexfilelayout.c:2114:34: warning:
symbol 'layoutreturn_ops' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
This does the right thing depending on if we have a session, rather than
needing to handle this manually in multiple places.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
No point in going through loops and hoops instead of just comparing the
values.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.
Get rid of the union and just keep ktime_t as simple typedef of type s64.
The conversion was done with coccinelle and some manual mopping up.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Instead of marking a device inactive, remove it from the cache entirely.
Flexfiles has a way to report errors back to the server, so we don't want
to stop devices from being tried again for 120 seconds.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The flexfiles client can piggyback both layout errors and layoutstats
as part of the layoutreturn. Both these payloads can get large, with
20 layout error entries taking up about 1.2K, and 4 layoutstats entries
taking up another 1K.
This patch allows a maximum payload of 4k by allocating a full page.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
We encountered a deadlock where the SEQUENCE that accompanied the
LAYOUTGET triggered a session drain, while ff_layout_alloc_lseg
triggered a GETDEVICEINFO. The GETDEVICEINFO hung waiting for the
session drain, while the LAYOUTGET held the slot waiting for
alloc_lseg to finish.
Avoid this by moving the call to nfs4_find_get_deviceid out of
ff_layout_alloc_lseg and into nfs4_ff_layout_prepare_ds.
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
[dros@primarydata.com: pNFS/flexfiles: fix races in ff_layout_mirror_valid]
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
When we're merging an old entry into our new entry, we want to ensure that
we add the list entry in the correct place.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Add the layout error payload to the flexfiles layoutreturn private
data, and set up the encoding mechanisms. This is a refactoring in
preparation for adding the layout iostats payload.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If there have been no reads or writes to a given mirror since the last
layoutstats update, then don't resend the same data.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
This parameter hasn't been used since f8407299 (Linux 3.11-rc2), so
let's remove it from this function and callers.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Both pnfs.c and the flexfiles code have their own versions of the
range intersection testing, and the "end_offset" helper.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
We must not call nfs_pageio_init_read() on a new nfs_pageio_descriptor
while holding a reference to a layout segment, as that can deadlock
pnfs_update_layout().
Fixes: d67ae825a5 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.0+
Allow the callers of nfs_remove_bad_delegation() to specify the stateid
that needs to be marked as bad.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If the attempt to connect to a DS fails inside ff_layout_pg_init_read or
ff_layout_pg_init_write, then we currently end up clearing the layout
segment carried by the struct nfs_pageio_descriptor, causing an Oops
when we later call into ff_layout_read_pagelist/ff_layout_write_pagelist.
The fix is to ensure we return the layout and then retry.
Fixes: 446ca21953 ("pNFS/flexfiles: When initing reads or writes, we...")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Prior to this patch, the retrans value was set at 5, meaning that we
could see a maximum retransmission timeout value of more than 6 minutes.
That's a tad high for NFSv3 where the protocol does allow the server to
drop requests at any time.
Since this is a data channel, let's just set retrans to 0, and the default
timeout to 60s. The user can continue to adjust these defaults using the
dataserver_retrans and dataserver_timeo module parameters.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Putting the periodicity timer in the mirror instances is causing
non-scalable reporting behaviour and missed reporting intervals.
When you recall layouts and/or implement client side mirroring, it
leads to consecutive reports with only a few ms between RPC calls.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fixes: d0379a5d06 ("pNFS/flexfiles: Support server-supplied...")
So ensure that we mark the layout for commit once the write is done,
and then ensure that the commit to ds is finished before sending
layoutcommit.
Note that by doing this, we're able to optimise away the commit
for the case of servers that don't need layoutcommit in order to
return updated attributes.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
We should always do a layoutcommit after commit to DS, except if
the layout segment we're using has set FF_FLAGS_NO_LAYOUTCOMMIT.
Fixes: d67ae825a5 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
As flexfiles has FF_FLAGS_NO_READ_IO, there is a need to generically
support enforcing that a IOMODE_RW segment will not allow READ I/O.
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The mds can inform the client not to use the IOMODE_RW layout
segment for doing READs. I.e., it is basically a
IOMODE_WRITE layout segment.
It would do this to not interfere with the WRITEs.
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Setting just the NFS_LAYOUT_RETURN_REQUESTED flag doesn't do anything,
unless there are lsegs that are also being marked for return. At the
point where that happens this flag is also set, so these set_bit calls
don't do anything useful.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Otherwise, we'll end up returning layouts that we've just received if
the client issues a new LAYOUTGET prior to the LAYOUTRETURN.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If we are initializing reads or writes and can not connect to a DS, then
check whether or not IO is allowed through the MDS. If it is allowed,
reset to the MDS. Else, fail the layout segment and force a retry
of a new layout segment.
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Whenever we check to see if we have the needed number of DSes for the
action, we may also have to check to see whether IO is allowed to go to
the MDS or not.
[jlayton: fix merge conflict due to lack of localio patches here]
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
This patch fixes a problem whereby the pNFS client falls back to doing
reads and writes through the metadata server even when the layout flag
FF_FLAGS_NO_IO_THRU_MDS is set.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
In order to more easily distinguish what kind of stateid we are dealing
with, introduce a type that can be used to label the stateid structure.
The label will be useful both for debugging, but also when dealing with
operations like SETATTR, READ and WRITE that can take several different
types of stateid as arguments.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
A mirror can be shared between multiple layouts, even with different
iomodes. That makes stats gathering simpler, but it causes a problem
when we get different creds in READ vs. RW layouts.
The current code drops the newer credentials onto the floor when this
occurs. That's problematic when you fetch a READ layout first, and then
a RW. If the READ layout doesn't have the correct creds to do a write,
then writes will fail.
We could just overwrite the READ credentials with the RW ones, but that
would break the ability for the server to fence the layout for reads if
things go awry. We need to be able to revert to the earlier READ creds
if the RW layout is returned afterward.
The simplest fix is to just keep two sets of creds per mirror. One for
READ layouts and one for RW, and then use the appropriate set depending
on the iomode of the layout segment.
Also fix up some RCU nits that sparse found.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
We're just as likely to have allocation problems here as we would if we
delay looking up the credential like we currently do. Fix the code to
get a rpc_cred reference early, as soon as the mirror is set up.
This allows us to eliminate the mirror early if there is a problem
getting an rpc credential. This also allows us to drop the uid/gid
from the layout_mirror struct as well.
In the event that we find an existing mirror where this one would go, we
swap in the new creds unconditionally, and drop the reference to the old
one.
Note that the old ff_layout_update_mirror_cred function wouldn't set
this pointer unless the DS version was 3, but we don't know what the DS
version is at this point. I'm a little unclear on why it did that as you
still need creds to talk to v4 servers as well. I have the code set
it regardless of the DS version here.
Also note the change to using generic creds instead of calling
lookup_cred directly. With that change, we also need to populate the
group_info pointer in the acred as some functions expect that to never
be NULL. Instead of allocating one every time however, we can allocate
one when the module is loaded and share it since the group_info is
refcounted.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
In later patches, we're going to want to allow the creds to be updated
when we get a new layout with updated creds. Have this function take
a reference to the cred that is later put once the call has been
dispatched.
Also, prepare for this change by ensuring we follow RCU rules when
getting a reference to the cred as well.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
All the callers already call that function before calling into here,
so it ends up being a no-op anyway.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Commit ea2cf22 created nfs_commit_info and saved &inode->i_lock inside
this NFS specific structure. This obscures the usage of i_lock.
Instead, save struct inode * so later it's clear the spinlock taken is
i_lock.
Should be no functional change.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NFS_LAYOUT_RETURN_BEFORE_CLOSE is being used to signal that a
layoutreturn is needed, either due to a layout recall or to a
layout error. Rename it to NFS_LAYOUT_RETURN_REQUESTED in order
to clarify its purpose.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
We must not skip encoding the statistics, or the server will see an
XDR encoding error.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # 4.0+
When we hit 22 errors, we start to overflow the memory buffers allocated
to the LAYOUTRETURN errors. The issue is that currently, RPC call reply
ordering determines how successful we are in merging errors that refer
to contiguous READ or WRITE requests.
Fix is to use an insertion sort to help detect contiguity.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* bugfixes:
SUNRPC: Fixup socket wait for memory
SUNRPC: Fix a missing break in rpc_anyaddr()
pNFS/flexfiles: Fix an Oopsable typo in ff_mirror_match_fh()
NFS: Fix attribute cache revalidation
NFS: Ensure we revalidate attributes before using execute_ok()
NFS: Flush reclaim writes using FLUSH_COND_STABLE
NFS: Background flush should not be low priority
NFSv4.1/pnfs: Fixup an lo->plh_block_lgets imbalance in layoutreturn
NFSv4: Don't perform cached access checks before we've OPENed the file
NFS: Allow the combination pNFS and labeled NFS
NFS42: handle layoutstats stateid error
nfs: Fix race in __update_open_stateid()
nfs: fix missing assignment in nfs4_sequence_done tracepoint
* pnfs_generic:
NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments
NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures
NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid()
NFSv4.1/pNFS: Fix a race in initiate_file_draining()
NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout
NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode
NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids
NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn()
NFS: Relax requirements in nfs_flush_incompatible
NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid
NFS: Allow multiple commit requests in flight per file
NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs
NFSv4: List stateid information in the callback tracepoints
NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL
NFSv4.1/pNFS: Ensure we enforce RFC5661 Section 12.5.5.2.1
pNFS: If we have to delay the layout callback, mark the layout for return
NFSv4.1/pNFS: Add a helper to mark the layout as returned
pNFS: Ensure nfs4_layoutget_prepare returns the correct error
The flexfiles layout in particular, seems to want to poke around in the
O_DIRECT flags when retransmitting.
This patch sets up an interface to allow it to call back into O_DIRECT
to handle retransmission correctly. It also fixes a potential bug whereby
we could change the behaviour of O_DIRECT if an error is already pending.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff reports seeing an Oops in ff_layout_alloc_lseg. Turns out
copy+paste has played cruel tricks on a nested loop.
Reported-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Currently, we will only record the layoutstats correctly if the
RPC call successfully obtains a slot. If we exit before that
happens, then we may find ourselves starting the busy timer through
the call in ff_layout_(read|write)_prepare_layoutstats, but never stopping it.
The same thing happens if we're doing DA-DS.
The fix is to ensure that we catch these cases in the rpc_release()
callback.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
When we replay a failed read, write or commit to the dataserver, we
need to ensure that we call ff_layout_read_prepare_v3(),
ff_layout_write_prepare_v3 or ff_layout_commit_prepare_v3() so that we
reset the statistics.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>