Provide an option to allow the file or volume location server cursor to be
dumped if the rotation routine falls off the end without managing to
contact a server.
Signed-off-by: David Howells <dhowells@redhat.com>
Implement support for talking to YFS-variant fileservers in the cache
manager and the filesystem client. These implement upgraded services on
the same port as their AFS services.
YFS fileservers provide expanded capabilities over AFS.
Signed-off-by: David Howells <dhowells@redhat.com>
Expand fields in various data structures to support the expanded
information that YFS is capable of returning.
Signed-off-by: David Howells <dhowells@redhat.com>
Get the target vnode in afs_rmdir() and validate it before we attempt the
deletion, The vnode pointer will be passed through to the delivery function
in a later patch so that the delivery function can mark it deleted.
Signed-off-by: David Howells <dhowells@redhat.com>
Calculate the callback expiration time at the point of operation reply
delivery, using the reply time queried from AF_RXRPC on that call as a
base.
Signed-off-by: David Howells <dhowells@redhat.com>
The FS.FetchStatus reply delivery function was updating inode of the
directory in which a lookup had been done with the status of the looked up
file. This corrupts some of the directory state.
Fixes: 5cf9dd55a0 ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
Implement the YFS cache manager service which gives extra capabilities on
top of AFS. This is done by listening for an additional service on the
same port and indicating that anyone requesting an upgrade should be
upgraded to the YFS port.
Signed-off-by: David Howells <dhowells@redhat.com>
Remove unnecessary details of a broken callback, such as version, expiry
and type, from the afs_callback_break struct as they're not actually used
and make the list take more memory.
Signed-off-by: David Howells <dhowells@redhat.com>
Call the function to commit the status on a new file, dir or symlink so
that the access rights for the caller's key are cached for that object.
Without this, the next access to the file will cause a FetchStatus
operation to be emitted to retrieve the access rights.
Signed-off-by: David Howells <dhowells@redhat.com>
Increase the sizes of the volume ID to 64 bits and the vnode ID (inode
number equivalent) to 96 bits to allow the support of YFS.
This requires the iget comparator to check the vnode->fid rather than i_ino
and i_generation as i_ino is not sufficiently capacious. It also requires
this data to be placed into the vnode cache key for fscache.
For the moment, just discard the top 32 bits of the vnode ID when returning
it though stat.
Signed-off-by: David Howells <dhowells@redhat.com>
When writing a new page, clear space in the page rather than attempting to
load it from the server if the space is beyond the EOF.
Signed-off-by: David Howells <dhowells@redhat.com>
Fix afs_deliver_to_call() to handle -EIO being returned by the operation
delivery function, indicating that the call found itself in the wrong
state, by printing an error and aborting the call.
Currently, an assertion failure will occur. This can happen, say, if the
delivery function falls off the end without calling afs_extract_data() with
the want_more parameter set to false to collect the end of the Rx phase of
a call.
The assertion failure looks like:
AFS: Assertion failed
4 == 7 is false
0x4 == 0x7 is false
------------[ cut here ]------------
kernel BUG at fs/afs/rxrpc.c:462!
and is matched in the trace buffer by a line like:
kworker/7:3-3226 [007] ...1 85158.030203: afs_io_error: c=0003be0c r=-5 CM_REPLY
Fixes: 98bf40cd99 ("afs: Protect call->state changes against signals")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Currently the TTL on VL server and address lists isn't set in all
circumstances and may be set to poor choices in others, since the TTL is
derived from the SRV/AFSDB DNS record if and when available.
Fix the TTL by limiting the range to a minimum and maximum from the current
time. At some point these can be made into sysctl knobs. Further, use the
TTL we obtained from the upcall to set the expiry on negative results too;
in future a mechanism can be added to force reloading of such data.
Signed-off-by: David Howells <dhowells@redhat.com>
Track VL servers as independent entities rather than lumping all their
addresses together into one set and implement server-level rotation by:
(1) Add the concept of a VL server list, where each server has its own
separate address list. This code is similar to the FS server list.
(2) Use the DNS resolver to retrieve a set of servers and their associated
addresses, ports, preference and weight ratings.
(3) In the case of a legacy DNS resolver or an address list given directly
through /proc/net/afs/cells, create a list containing just a dummy
server record and attach all the addresses to that.
(4) Implement a simple rotation policy, for the moment ignoring the
priorities and weights assigned to the servers.
(5) Show the address list through /proc/net/afs/<cell>/vlservers. This
also displays the source and status of the data as indicated by the
upcall.
Signed-off-by: David Howells <dhowells@redhat.com>
Improve the error handling in FS server rotation by:
(1) Cache the latest useful error value for the fs operation as a whole in
struct afs_fs_cursor separately from the error cached in the
afs_addr_cursor struct. The one in the address cursor gets clobbered
occasionally. Copy over the error to the fs operation only when it's
something we'd be interested in passing to userspace.
(2) Make it so that EDESTADDRREQ is the default that is seen only if no
addresses are available to be accessed.
(3) When calling utility functions, such as checking a volume status or
probing a fileserver, don't let a successful result clobber the cached
error in the cursor; instead, stash the result in a temporary variable
until it has been assessed.
(4) Don't return ETIMEDOUT or ETIME if a better error, such as
ENETUNREACH, is already cached.
(5) On leaving the rotation loop, turn any remote abort code into a more
useful error than ECONNABORTED.
Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
afs_extract_data sets up a temporary iov_iter and passes it to AF_RXRPC
each time it is called to describe the remaining buffer to be filled.
Instead:
(1) Put an iterator in the afs_call struct.
(2) Set the iterator for each marshalling stage to load data into the
appropriate places. A number of convenience functions are provided to
this end (eg. afs_extract_to_buf()).
This iterator is then passed to afs_extract_data().
(3) Use the new ITER_DISCARD iterator to discard any excess data provided
by FetchData.
Signed-off-by: David Howells <dhowells@redhat.com>
Include the site of detection of AFS protocol errors in trace lines to
better be able to determine what went wrong.
Signed-off-by: David Howells <dhowells@redhat.com>
In the iov_iter struct, separate the iterator type from the iterator
direction and use accessor functions to access them in most places.
Convert a bunch of places to use switch-statements to access them rather
then chains of bitwise-AND statements. This makes it easier to add further
iterator types. Also, this can be more efficient as to implement a switch
of small contiguous integers, the compiler can use ~50% fewer compare
instructions than it has to use bitwise-and instructions.
Further, cease passing the iterator type into the iterator setup function.
The iterator function can set that itself. Only the direction is required.
Signed-off-by: David Howells <dhowells@redhat.com>
Use accessor functions to access an iterator's type and direction. This
allows for the possibility of using some other method of determining the
type of iterator than if-chains with bitwise-AND conditions.
Signed-off-by: David Howells <dhowells@redhat.com>
net/sched/cls_api.c has overlapping changes to a call to
nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL
to the 5th argument, and another (from 'net-next') added cb->extack
instead of NULL to the 6th argument.
net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to
code which moved (to mr_table_dump)) in 'net-next'. Thanks to David
Ahern for the heads up.
Signed-off-by: David S. Miller <davem@davemloft.net>
fscache_set_key() can incur an out-of-bounds read, reported by KASAN:
BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x5b3/0x680 [fscache]
Read of size 4 at addr ffff88084ff056d4 by task mount.nfs/32615
and also reported by syzbot at https://lkml.org/lkml/2018/7/8/236
BUG: KASAN: slab-out-of-bounds in fscache_set_key fs/fscache/cookie.c:120 [inline]
BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x7a9/0x880 fs/fscache/cookie.c:171
Read of size 4 at addr ffff8801d3cc8bb4 by task syz-executor907/4466
This happens for any index_key_len which is not divisible by 4 and is
larger than the size of the inline key, because the code allocates exactly
index_key_len for the key buffer, but the hashing loop is stepping through
it 4 bytes (u32) at a time in the buf[] array.
Fix this by calculating how many u32 buffers we'll need by using
DIV_ROUND_UP, and then using kcalloc() to allocate a precleared allocation
buffer to hold the index_key, then using that same count as the hashing
index limit.
Fixes: ec0328e46d ("fscache: Maintain a catalogue of allocated cookies")
Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The inline key in struct rxrpc_cookie is insufficiently initialized,
zeroing only 3 of the 4 slots, therefore an index_key_len between 13 and 15
bytes will end up hashing uninitialized memory because the memcpy only
partially fills the last buf[] element.
Fix this by clearing fscache_cookie objects on allocation rather than using
the slab constructor to initialise them. We're going to pretty much fill
in the entire struct anyway, so bringing it into our dcache writably
shouldn't incur much overhead.
This removes the need to do clearance in fscache_set_key() (where we aren't
doing it correctly anyway).
Also, we don't need to set cookie->key_len in fscache_set_key() as we
already did it in the only caller, so remove that.
Fixes: ec0328e46d ("fscache: Maintain a catalogue of allocated cookies")
Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
Reported-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
the victim might've been rmdir'ed just before the lock_rename();
unlike the normal callers, we do not look the source up after the
parents are locked - we know it beforehand and just recheck that it's
still the child of what used to be its parent. Unfortunately,
the check is too weak - we don't spot a dead directory since its
->d_parent is unchanged, dentry is positive, etc. So we sail all
the way to ->rename(), with hosting filesystems _not_ expecting
to be asked renaming an rmdir'ed subdirectory.
The fix is easy, fortunately - the lock on parent is sufficient for
making IS_DEADDIR() on child safe.
Cc: stable@vger.kernel.org
Fixes: 9ae326a690 (CacheFiles: A cache that backs onto a mounted filesystem)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The recent patch to fix the afs_server struct leak didn't actually fix the
bug, but rather fixed some of the symptoms. The problem is that an
asynchronous call that holds a resource pointed to by call->reply[0] will
find the pointer cleared in the call destructor, thereby preventing the
resource from being cleaned up.
In the case of the server record leak, the afs_fs_get_capabilities()
function in devel code sets up a call with reply[0] pointing at the server
record that should be altered when the result is obtained, but this was
being cleared before the destructor was called, so the put in the
destructor does nothing and the record is leaked.
Commit f014ffb025 removed the additional ref obtained by
afs_install_server(), but the removal of this ref is actually used by the
garbage collector to mark a server record as being defunct after the record
has expired through lack of use.
The offending clearance of call->reply[0] upon completion in
afs_process_async_call() has been there from the origin of the code, but
none of the asynchronous calls actually use that pointer currently, so it
should be safe to remove (note that synchronous calls don't involve this
function).
Fix this by the following means:
(1) Revert commit f014ffb025.
(2) Remove the clearance of reply[0] from afs_process_async_call().
Without this, afs_manage_servers() will suffer an assertion failure if it
sees a server record that didn't get used because the usage count is not 1.
Fixes: f014ffb025 ("afs: Fix afs_server struct leak")
Fixes: 08e0e7c82e ("[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC.")
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Fix a livelock in dax_layout_busy_page() present since v4.18. The
lockup triggers when truncating an actively mapped huge page out of a
mapping pinned for direct-I/O.
* Fix mprotect() clobbers of _PAGE_DEVMAP. Broken since v4.5 mprotect()
clears this flag that is needed to communicate the liveness of device
pages to the get_user_pages() path.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbwiZhAAoJEB7SkWpmfYgCYFoQAL8ED6c1bfGUPRsWSrTRChU0
ungVZ/Vf1+2ERd3ivUXPQzahNtqH5EWvEVp0aboVpyJUoVllrztInVS2hxaGJE+e
w7WnzaXh36MY0kvLpK+Ny1Cxk7qg2rXnmzOAPRVdSUoSvh0TXOn5HFX1i/OdI7WK
wgJwXraCoyKP9aTItw7oHQy9S36bi1RJVUakOAoEpEx4Vn+fwFxLNIt34G5CRJ+k
iflicM7CPngxlFzwfoiX9v3DhV7toexk1A4LAzzwypG0Aiqd5tW2FG1lwLMPncNk
8FezBm9VjkMwzv6hj7nD9UfU2lbh3GqqGDW0cPX1DPSgDxr/4pOLtKcbYWHh6yas
NtCXk37q90ey3GtD2wYBRkBNly6UWvHJ0d3srtO6ZSl1VN6JQu8rhVhQ6KnON24B
NcWlEVf2brqf0uaW4byCVbdVfIDp96/qgEvCo1pq3olXwCdDyOBJjYxaBcnu5JDV
YsItMCJ49AxS/qoCt3vam7vC5TGhfYHL5xJPaF06cdjYvgfqOIV3VQT1ujBx4cvh
MBFRBKDc6oDiJFgkrdYqHwJfn5fCQVS180Oy5S0AFGsVAzsJalKBZBLx2f2RQn8c
r+kczvvPjpczEeEqzaqsxTgjowo/75Q8PRXc2PbwQzNxfkHuKf+xxQpnUg0mN6Hf
w8zPSaCcCs2Wo21Kd/ua
=VXnU
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-fixes-4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Dan writes:
"libnvdimm/dax 4.19-rc8
* Fix a livelock in dax_layout_busy_page() present since v4.18. The
lockup triggers when truncating an actively mapped huge page out of
a mapping pinned for direct-I/O.
* Fix mprotect() clobbers of _PAGE_DEVMAP. Broken since v4.5
mprotect() clears this flag that is needed to communicate the
liveness of device pages to the get_user_pages() path."
* tag 'libnvdimm-fixes-4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
mm: Preserve _PAGE_DEVMAP across mprotect() calls
filesystem-dax: Fix dax_layout_busy_page() livelock
ubifs_assert() is not WARN_ON(), so we have to invert
the checks.
Randy faced this warning with UBIFS being a module, since
most users use UBIFS as builtin because UBIFS is the rootfs
nobody noticed so far. :-(
Including me.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: 54169ddd38 ("ubifs: Turn two ubifs_assert() into a WARN_ON()")
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On non-preempt kernels this loop can take a long time (more than 50 ticks)
processing through entries.
Link: http://lkml.kernel.org/r/20181010172623.57033-1-khazhy@google.com
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the following compile warning:
fs/ocfs2/dlmglue.c:99:30: warning: lockdep_keys defined but not used [-Wunused-variable]
static struct lock_class_key lockdep_keys[OCFS2_NUM_LOCK_TYPES];
Link: http://lkml.kernel.org/r/1536938148-32110-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Conflicts were easy to resolve using immediate context mostly,
except the cls_u32.c one where I simply too the entire HEAD
chunk.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a leak of afs_server structs. The routine that installs them in the
various lookup lists and trees gets a ref on leaving the function, whether
it added the server or a server already exists. It shouldn't increment
the refcount if it added the server.
The effect of this that "rmmod kafs" will hang waiting for the leaked
server to become unused.
Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It turns out that the fix in commit 6636c3cc56 is bad; the assertion
that the iomap code no longer creates buffer heads is incorrect for
filesystems that set the IOMAP_F_BUFFER_HEAD flag.
Instead, what's happening is that gfs2_iomap_begin_write treats all
files that have the jdata flag set as journaled files, which is
incorrect as long as those files are inline ("stuffed"). We're handling
stuffed files directly via the page cache, which is why we ended up with
pages without buffer heads in gfs2_page_add_databufs.
Fix this by handling stuffed journaled files correctly in
gfs2_iomap_begin_write.
This reverts commit 6636c3cc5690c11631e6366cf9a28fb99c8b25bb.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Access to the list of cells by /proc/net/afs/cells has a couple of
problems:
(1) It should be checking against SEQ_START_TOKEN for the keying the
header line.
(2) It's only holding the RCU read lock, so it can't just walk over the
list without following the proper RCU methods.
Fix these by using an hlist instead of an ordinary list and using the
appropriate accessor functions to follow it with RCU.
Since the code that adds a cell to the list must also necessarily change,
sort the list on insertion whilst we're at it.
Fixes: 989782dcdc ("afs: Overhaul cell database management")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Update for 4.19-rc7 to fix numerous file clone and deduplication issues.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJbvo1jAAoJEK3oKUf0dfodurwP/3OGvJo11XjE5Wuh5tbcSOUm
duJdJvN+b0gx8Lb1IL2lHaBEt/Pw3PKi+KPx6d+pd47aflCYNMiU7I1kzejxvq9I
TBVdhxffAmNBU02VU/qmhucMnKfpVr/UX19f0sHZcfXbZkpIuHrSpI25MNSYqbEs
bRoMDkgbuu377hCJu6tnBAUm38z4iZdfJaGPVmJmuVMn8JJE8KA3Une/daxg0/HY
zbCMWP3SGJwqx7SyyeurQSkRS/8GG3LgbnYOz0FlaHfBd4JVJscHOZU08nrYmMAN
9SOowvahaPkDcyO8c6gRkyE6NIbOsb79718g+HTm4eqzyChnh+jnFTzitcTMuK2c
vjblXZSDKnN/P4UaEEzIAnQ1Eew4WhLrKuBr+3fCr2bKTGfj0iiX36CjEgk1k+Df
t1DV/Hj6me6crZpWO7GQZVLnDsiJBzaOsIgpYnB3vcJyof/cgm+5bfGFedDZFdpV
XTh0oBN8B7oQztd4MvcfQOGXSktrMtq+6RomO8kv77mVtHFnjnroKHq/wIft2nyI
rs7u7DFmCLyB9Lbm8aoeMPo4+0zxTp9385HSJ+XznHcjGXxkxIIGhhdU3omKxANa
j3UQB1+VRC4lHSluUXOW7vH0j1tX3gaS3z/yJ+gfcAdizUmdp0h0rMx8c9VR8SEJ
xv/fpbrcKHyBwuNvKtBm
=yj1P
-----END PGP SIGNATURE-----
Merge tag 'xfs-fixes-for-4.19-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Dave writes:
"xfs: fixes for 4.19-rc7
Update for 4.19-rc7 to fix numerous file clone and deduplication issues."
* tag 'xfs-fixes-for-4.19-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix data corruption w/ unaligned reflink ranges
xfs: fix data corruption w/ unaligned dedupe ranges
xfs: update ctime and remove suid before cloning files
xfs: zero posteof blocks when cloning above eof
xfs: refactor clonerange preparation into a separate helper
Commit 64bc06bb32 broke buffered writes to journaled files (chattr
+j): we'll try to journal the buffer heads of the page being written to
in gfs2_iomap_journaled_page_done. However, the iomap code no longer
creates buffer heads, so we'll BUG() in gfs2_page_add_databufs. Fix
that by creating buffer heads ourself when needed.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
In the presence of multi-order entries the typical
pagevec_lookup_entries() pattern may loop forever:
while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
min(end - index, (pgoff_t)PAGEVEC_SIZE),
indices)) {
...
for (i = 0; i < pagevec_count(&pvec); i++) {
index = indices[i];
...
}
index++; /* BUG */
}
The loop updates 'index' for each index found and then increments to the
next possible page to continue the lookup. However, if the last entry in
the pagevec is multi-order then the next possible page index is more
than 1 page away. Fix this locally for the filesystem-dax case by
checking for dax-multi-order entries. Going forward new users of
multi-order entries need to be similarly careful, or we need a generic
way to report the page increment in the radix iterator.
Fixes: 5fac7408d8 ("mm, fs, dax: handle layout changes to pinned dax...")
Cc: <stable@vger.kernel.org>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When reflinking sub-file ranges, a data corruption can occur when
the source file range includes a partial EOF block. This shares the
unknown data beyond EOF into the second file at a position inside
EOF, exposing stale data in the second file.
XFS only supports whole block sharing, but we still need to
support whole file reflink correctly. Hence if the reflink
request includes the last block of the souce file, only proceed with
the reflink operation if it lands at or past the destination file's
current EOF. If it lands within the destination file EOF, reject the
entire request with -EINVAL and make the caller go the hard way.
This avoids the data corruption vector, but also avoids disruption
of returning EINVAL to userspace for the common case of whole file
cloning.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
A deduplication data corruption is Exposed by fstests generic/505 on
XFS. It is caused by extending the block match range to include the
partial EOF block, but then allowing unknown data beyond EOF to be
considered a "match" to data in the destination file because the
comparison is only made to the end of the source file. This corrupts
the destination file when the source extent is shared with it.
XFS only supports whole block dedupe, but we still need to appear to
support whole file dedupe correctly. Hence if the dedupe request
includes the last block of the souce file, don't include it in the
actual XFS dedupe operation. If the rest of the range dedupes
successfully, then report the partial last block as deduped, too, so
that userspace sees it as a successful dedupe rather than return
EINVAL because we can't dedupe unaligned blocks.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
In dlm_init_lockres() we access and modify res->tracking and
dlm->tracking_list without holding dlm->track_lock. This can cause list
corruptions and can end up in kernel panic.
Fix this by locking res->tracking and dlm->tracking_list with
dlm->track_lock instead of dlm->spinlock.
Link: http://lkml.kernel.org/r/1529951192-4686-1-git-send-email-ashish.samant@oracle.com
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Acked-by: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, you can use /proc/self/task/*/stack to cause a stack walk on
a task you control while it is running on another CPU. That means that
the stack can change under the stack walker. The stack walker does
have guards against going completely off the rails and into random
kernel memory, but it can interpret random data from your kernel stack
as instruction pointers and stack pointers. This can cause exposure of
kernel stack contents to userspace.
Restrict the ability to inspect kernel stacks of arbitrary tasks to root
in order to prevent a local attacker from exploiting racy stack unwinding
to leak kernel task stack contents. See the added comment for a longer
rationale.
There don't seem to be any users of this userspace API that can't
gracefully bail out if reading from the file fails. Therefore, I believe
that this change is unlikely to break things. In the case that this patch
does end up needing a revert, the next-best solution might be to fake a
single-entry stack based on wchan.
Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com
Fixes: 2ec220e27f ("proc: add /proc/*/stack")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ocfs2_duplicate_clusters_by_page() may crash if one of the extent's pages
is dirty. When a page has not been written back, it is still in dirty
state. If ocfs2_duplicate_clusters_by_page() is called against the dirty
page, the crash happens.
To fix this bug, we can just unlock the page and wait until the page until
its not dirty.
The following is the backtrace:
kernel BUG at /root/code/ocfs2/refcounttree.c:2961!
[exception RIP: ocfs2_duplicate_clusters_by_page+822]
__ocfs2_move_extent+0x80/0x450 [ocfs2]
? __ocfs2_claim_clusters+0x130/0x250 [ocfs2]
ocfs2_defrag_extent+0x5b8/0x5e0 [ocfs2]
__ocfs2_move_extents_range+0x2a4/0x470 [ocfs2]
ocfs2_move_extents+0x180/0x3b0 [ocfs2]
? ocfs2_wait_for_recovery+0x13/0x70 [ocfs2]
ocfs2_ioctl_move_extents+0x133/0x2d0 [ocfs2]
ocfs2_ioctl+0x253/0x640 [ocfs2]
do_vfs_ioctl+0x90/0x5f0
SyS_ioctl+0x74/0x80
do_syscall_64+0x74/0x140
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Once we find the page is dirty, we do not wait until it's clean, rather we
use write_one_page() to write it back
Link: http://lkml.kernel.org/r/20180829074740.9438-1-lchen@suse.com
[lchen@suse.com: update comments]
Link: http://lkml.kernel.org/r/20180830075041.14879-1-lchen@suse.com
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Larry Chen <lchen@suse.com>
Acked-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAlu2jLMACgkQiiy9cAdy
T1GLMQwAiUaUTtR0MN2ws+IHNnwFX5YnWFa1Cr/6Tgz9WBo5TrrvQcBGhTIUQok2
7w2+I0rXPO7P7Tq2odWn66BT1/l0o2U10aB4OHYlS7zqfCAp6GuwKGtATivRqjsg
ENn2mmBRuTmz+XtGmlVklmQpHjn4+MTR2GSLA57oo1fqX5YBeSxcq2UEciCGttRQ
nAvn923NrPSbGRvL/eDDWGXas8gMATXnrNlN8st2AsxnoFC6CrzUuRM0s/IFgp6r
Orh0jIlovlT2Q6LdND7n7xXnEVoyUTETG89IjRVwcBJ9LtXrJXpuWo7/sOr11cMy
sGeUkILNlmkIdLtIlUaZcGaclq/yfeKdMxziBT8tWP6wXIH4LrGCzvZ9jDtrzygU
EWL2nLsqMJNh0+wsYSZM/m9VFC1ReeB1AgIC2EHu/xzqWoTeL40i2OhKi/732FEX
H+FlTZUoorhFoI96AMJMrqXjXyqZUZSQ1b5iJ7/jq888Vdey+m7HiMUtxv0+5yzC
+lQksh8X
=4PHD
-----END PGP SIGNATURE-----
Merge tag '4.19-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Steve writes:
"SMB3 fixes
four small SMB3 fixes: one for stable, the others to address a more
recent regression"
* tag '4.19-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: fix lease break problem introduced by compounding
cifs: only wake the thread for the very last PDU in a compound
cifs: add a warning if we try to to dequeue a deleted mid
smb2: fix missing files in root share directory listing
Before cloning into a file, update the ctime and remove sensitive
attributes like suid, just like we'd do for a regular file write.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
When we're reflinking between two files and the destination file range
is well beyond the destination file's EOF marker, zero any posteof
speculative preallocations in the destination file so that we don't
expose stale disk contents. The previous strategy of trying to clear
the preallocations does not work if the destination file has the
PREALLOC flag set.
Uncovered by shared/010.
Reported-by: Zorro Lang <zlang@redhat.com>
Bugzilla-id: https://bugzilla.kernel.org/show_bug.cgi?id=201259
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Refactor all the reflink preparation steps into a separate helper
that we'll use to land all the upcoming fixes for insufficient input
checks.
This rework also moves the invalidation of the destination range to
the prep function so that it is done before the range is remapped.
This ensures that nobody can access the data in range being remapped
until the remap is complete.
[dgc: fix xfs_reflink_remap_prep() return value and caller check to
handle vfs_clone_file_prep_inodes() returning 0 to mean "nothing to
do". ]
[dgc: make sure length changed by vfs_clone_file_prep_inodes() gets
propagated back to XFS code that does the remapping. ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCW7YOCQAKCRDh3BK/laaZ
PAmGAQC1/3+mkfHXqyDx+zv4f7A348ZULW26EWFWwMwlEljWowD+PlRFBgUu5v5c
194mCApZU7hl6o0u07aijjz6m81e6QE=
=dM89
-----END PGP SIGNATURE-----
Merge tag 'ovl-fixes-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Miklos writes:
"overlayfs fixes for 4.19-rc7
This update fixes a couple of regressions in the stacked file update
added in this cycle, as well as some older bugs uncovered by
syzkaller.
There's also one trivial naming change that touches other parts of
the fs subsystem."
* tag 'ovl-fixes-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fix format of setxattr debug
ovl: fix access beyond unterminated strings
ovl: make symbol 'ovl_aops' static
vfs: swap names of {do,vfs}_clone_file_range()
ovl: fix freeze protection bypass in ovl_clone_file_range()
ovl: fix freeze protection bypass in ovl_write_iter()
ovl: fix memory leak on unlink of indexed file
Accumlated regression and bug fixes for 4.19-rc6, including:
o make iomap correctly mark dirty pages for sub-page block sizes
o fix regression in handling extent-to-btree format conversion errors
o fix torn log wrap detection for new logs
o various corrupt inode detection fixes
o various delalloc state fixes
o cleanup all the missed transaction cancel cases missed from changes merged
in 4.19-rc1
o fix lockdep false positive on transaction allocation
o fix locking and reference counting on buffer log items
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJbtVyHAAoJEK3oKUf0dfodc8YP/1hT+TdXZDBVPo9kXQwmrnra
8P1J8IEuGj851PwYQobUhifdDh4eQ+PpcYEIfqTSGhtjTg7cqyQOvTo4uUKHLn50
8mFXQb6JrAFYnjGjorOCde+lpoQrtj3oKASscs7RaL/W1RNffY74UGdiE0yOJnJs
9IC7TpJT4ZzAgYBgC61whfYWmszLmRuKlVZWgXrM8hkMqliHdfrXeZk7MXTeiflz
Q5+9eVnCCgTtC0TWgVAkFMvlZs7UtNMWIIn6zt+HQLB7Vms6q4ArpIT+fF45njwG
94tyTcFT0AcIJ8OIi5i85fOXcDGjTFFf554Yf80PlWGlk3SbXPHFQao/ItDA4S1Z
eCl2eurUOXlNXKuyzWoV5KjOBiiBicbOwl6eX0K996cGehhEdFOvBihvAGjJy5rm
c4plTTy6bhYKZWr3JLuaPDlNWDMr/P/aUkiq9tFXyaMmVKNeyxd6UKzIokl5uUfi
Ycik4Ik8zRgpFEUx5Clvb2W5qH9pqpR0WcFhrcj0mDnJa10TpBWesA0g2F+hM0Jc
0mSuGxfQeJk7tuVcvsBpPNzgVEPKabvnYbOlGK7HvCSOPRkg0Fhmj+iJK721ccVl
Ltiz0Y485eAGLKn9kOWKr47A5GAAFpuK29OrD2a8XizHPZ3Sr2CG5X1jI0gtMb8n
BiBAxO6dojRQATDuTI6w
=5ylA
-----END PGP SIGNATURE-----
Merge tag 'xfs-fixes-for-4.19-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Dave writes:
"XFS fixes for 4.19-rc6
Accumlated regression and bug fixes for 4.19-rc6, including:
o make iomap correctly mark dirty pages for sub-page block sizes
o fix regression in handling extent-to-btree format conversion errors
o fix torn log wrap detection for new logs
o various corrupt inode detection fixes
o various delalloc state fixes
o cleanup all the missed transaction cancel cases missed from changes merged
in 4.19-rc1
o fix lockdep false positive on transaction allocation
o fix locking and reference counting on buffer log items"
* tag 'xfs-fixes-for-4.19-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix error handling in xfs_bmap_extents_to_btree
iomap: set page dirty after partial delalloc on mkwrite
xfs: remove invalid log recovery first/last cycle check
xfs: validate inode di_forkoff
xfs: skip delalloc COW blocks in xfs_reflink_end_cow
xfs: don't treat unknown di_flags2 as corruption in scrub
xfs: remove duplicated include from alloc.c
xfs: don't bring in extents in xfs_bmap_punch_delalloc_range
xfs: fix transaction leak in xfs_reflink_allocate_cow()
xfs: avoid lockdep false positives in xfs_trans_alloc
xfs: refactor xfs_buf_log_item reference count handling
xfs: clean up xfs_trans_brelse()
xfs: don't unlock invalidated buf on aborted tx commit
xfs: remove last of unnecessary xfs_defer_cancel() callers
xfs: don't crash the vfs on a garbage inline symlink