Fix atime update logic in overlayfs.
This patch adds an i_op->update_time() handler to overlayfs inodes. This
forwards atime updates to the upper layer only. No atime updates are done
on lower layers.
Remove implicit atime updates to underlying files and directories with
O_NOATIME. Remove explicit atime update in ovl_readlink().
Clear atime related mnt flags from cloned upper mount. This means atime
updates are controlled purely by overlayfs mount options.
Reported-by: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
When creating directory in workdir, the group/sgid inheritance from the
parent dir was omitted completely. Fix this by calling inode_init_owner()
on overlay inode and using the resulting uid/gid/mode to create the file.
Unfortunately the sgid bit can be stripped off due to umask, so need to
reset the mode in this case in workdir before moving the directory in
place.
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The fact that we always do permission checking on the overlay inode and
clear MAY_WRITE for checking access to the lower inode allows cruft to be
removed from ovl_permission().
1) "default_permissions" option effectively did generic_permission() on the
overlay inode with i_mode, i_uid and i_gid updated from underlying
filesystem. This is what we do by default now. It did the update using
vfs_getattr() but that's only needed if the underlying filesystem can
change (which is not allowed). We may later introduce a "paranoia_mode"
that verifies that mode/uid/gid are not changed.
2) splitting out the IS_RDONLY() check from inode_permission() also becomes
unnecessary once we remove the MAY_WRITE from the lower inode check.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Now we have two levels of checks in ovl_permission(). overlay inode
is checked with the creds of task while underlying inode is checked
with the creds of mounter.
Looks like mounter does not have to have WRITE access to files on lower/.
So remove the MAY_WRITE from access mask for checks on underlying
lower inode.
This means task should still have the MAY_WRITE permission on lower
inode and mounter is not required to have MAY_WRITE.
It also solves the problem of read only NFS mounts being used as lower.
If __inode_permission(lower_inode, MAY_WRITE) is called on read only
NFS, it fails. By resetting MAY_WRITE, check succeeds and case of
read only NFS shold work with overlay without having to specify any
special mount options (default permission).
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Given we are now doing checks both on overlay inode as well underlying
inode, we should be able to do checks and operations on underlying file
system using mounter's context.
So modify all operations to do checks/operations on underlying dentry/inode
in the context of mounter.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Right now ovl_permission() calls __inode_permission(realinode), to do
permission checks on real inode and no checks are done on overlay inode.
Modify it to do checks both on overlay inode as well as underlying inode.
Checks on overlay inode will be done with the creds of calling task while
checks on underlying inode will be done with the creds of mounter.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Now we are planning to do DAC permission checks on overlay inode
itself. And to make it work, we will need to make sure we can get acls from
underlying inode. So define ->get_acl() for overlay inodes and this in turn
calls into underlying filesystem to get acls, if any.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
ovl_create_upper() and ovl_create_over_whiteout() seem to be sharing some
common code which can be moved into a separate function. No functionality
change.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Previously this was only done for directory inodes. Doing so for all
inodes makes for a nice cleanup in ovl_permission at zero cost.
Inodes are not shared for hard links on the overlay, so this works fine.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The hash salting changes meant that we can no longer reuse the hash in the
overlay dentry to look up the underlying dentry.
Instead of lookup_hash(), use lookup_one_len_unlocked() and swith to
mounter's creds (like we do for all other operations later in the series).
Now the lookup_hash() export introduced in 4.6 by 3c9fe8cdff ("vfs: add
lookup_hash() helper") is unused and can possibly be removed; its
usefulness negated by the hash salting and the idea that mounter's creds
should be used on operations on underlying filesystems.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 8387ff2577 ("vfs: make the string hashes salt the hash")
Pull overlayfs fixes from Miklos Szeredi:
"This contains a fix for a potential crash/corruption issue and another
where the suid/sgid bits weren't cleared on write"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: verify upper dentry in ovl_remove_and_whiteout()
ovl: Copy up underlying inode's ->i_mode to overlay inode
ovl: handle ATTR_KILL*
The upper dentry may become stale before we call ovl_lock_rename_workdir.
For example, someone could (mistakenly or maliciously) manually unlink(2)
it directly from upperdir.
To ensure it is not stale, let's lookup it after ovl_lock_rename_workdir
and and check if it matches the upper dentry.
Essentially, it is the same problem and similar solution as in
commit 11f3710417 ("ovl: verify upper dentry before unlink and rename").
Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Without this check, the following XFS_I invocations would return bad
pointers when used on non-XFS inodes (perhaps pointers into preceding
allocator chunks).
This could be used by an attacker to trick xfs_swap_extents into
performing locking operations on attacker-chosen structures in kernel
memory, potentially leading to code execution in the kernel. (I have
not investigated how likely this is to be usable for an attack in
practice.)
Signed-off-by: Jann Horn <jann@thejh.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs fixes from Al Viro.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
posix_acl: de-union a_refcount and a_rcu
nfs_atomic_open(): prevent parallel nfs_lookup() on a negative hashed
Use the right predicate in ->atomic_open() instances
- Provide a more concise fix for CVE-2016-1583
+ Additionally fixes linux-stable regressions caused by the cherry-picking of
the original fix
- Some very minor changes that have queued up
+ Fix typos in code comments
+ Remove unnecessary check for NULL before destroying kmem_cache
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCgAGBQJXf8nnAAoJENaSAD2qAscKwXgP/0awhY1z40dL/igP6fPv2ack
HbqrOjUVO2DzxinvKB3vRLNy93zwESxe8UpwPsl84IJ85zOQjkUkJ8PYk1oyBf0N
dVWqO11g6AKNZ+VQFspconvMhZATwSrsv8z3BzvwNGLsPhPuUQ+JmbBe8xMdrsZ5
qVaWswsMtMlhM3p/zFh57vWO64fT1xiabpxSkKpG2LHJN6h6QAQxkfBfa2FuXCsN
hZIw+ULcUJfdawXGq8lAfcYzbDmFpNt70fFquJgfJHrXFrOuensYfLcWUvhrSNbc
HZ6imRK9LCG4IKjJTBNmCmBR8ho71yGzdKuup81Eap+2zx2kC7twokS1d5fha8iL
Kzkx0NMDriY2N+tIfufHYk2IIenFzWG6Yuj0STswtJX4YhQGBc0H6VxcgrxE0PgW
k1iKUV7jnJGxxN+d6lmV4+fX0vKGgBMsQq1Q76CkYLN1BAvdwz6GnWSfqP8hWz3o
sNVyNtYh+/TXY8JMWKDBlps7Ib8W88qDW3K7YcAf2VPYAqIWm5Va1MR5m5s+UIeR
QiCD32X/0PfDp13QRiKAHJ6C9CInyu0r+fF/g8ZMqLuWgLxoahxpr6ML/CnHoGl5
IcDydJO3/bLBq9If8WxYsOQvVKCa4e7N7o7ZHPKd8U7O39mCGNfbQx7/FlMjtvf6
+4HAxamUC1ogpLTkpWxI
=Bt4P
-----END PGP SIGNATURE-----
Merge tag 'ecryptfs-4.7-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull eCryptfs fixes from Tyler Hicks:
"Provide a more concise fix for CVE-2016-1583:
- Additionally fixes linux-stable regressions caused by the
cherry-picking of the original fix
Some very minor changes that have queued up:
- Fix typos in code comments
- Remove unnecessary check for NULL before destroying kmem_cache"
* tag 'ecryptfs-4.7-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
ecryptfs: don't allow mmap when the lower fs doesn't support it
Revert "ecryptfs: forbid opening files without mmap handler"
ecryptfs: fix spelling mistakes
eCryptfs: fix typos in comment
ecryptfs: drop null test before destroy functions
There are legitimate reasons to disallow mmap on certain files, notably
in sysfs or procfs. We shouldn't emulate mmap support on file systems
that don't offer support natively.
CVE-2016-1583
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: stable@vger.kernel.org
[tyhicks: clean up f_op check by using ecryptfs_file_to_lower()]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
This reverts commit 2f36db7100.
It fixed a local root exploit but also introduced a dependency on
the lower file system implementing an mmap operation just to open a file,
which is a bit of a heavy hammer. The right fix is to have mmap depend
on the existence of the mmap handler instead.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Pull block IO fixes from Jens Axboe:
"Three small fixes that have been queued up and tested for this series:
- A bug fix for xen-blkfront from Bob Liu, fixing an issue with
incomplete requests during migration.
- A fix for an ancient issue in retrieving the IO priority of a
different PID than self, preventing that task from going away while
we access it. From Omar.
- A writeback fix from Tahsin, fixing a case where we'd call ihold()
with a zero ref count inode"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: fix use-after-free in sys_ioprio_get()
writeback: inode cgroup wb switch should not call ihold()
xen-blkfront: save uncompleted reqs in blkfront_resume()
- a fix from Marek for ppos handling in configfs_write_bin_file,
which was introduced in Linux 4.5, but didn't have any users
until recently.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXfjK1AAoJEA+eU2VSBFGDJ0kQAMZHONK7pHwM+IxDUeTsDTfa
FX+EplF1rLEtmUGOl01XbjgQp7acsP19YWikQfC09+ZjF6Vn1zFAFlNoU3qM+i/2
zdukzIRBaSM+4w4HFDQ548zGGc8e9mesIIUHrHt6n/nL0OLKTU0XzbmRMXmvXAUJ
u0nuB0OYlFJEVQFIlDYfG6E2rJy37FPilToonfw+AryVDenRm9iiUt0iMFoA8wPG
EpogUinelxrKZ+ysOEeibaTGxxLLbd3AbWeUQbkhmsk4FfxuV7GFSfGPbhmJ1LeU
n5X3LbK8lixG8goGdAW1NYulLnTjprZ6emUL0jwxYmI+MzOP7DUcsLqsGmdh5LEa
Uw5gzQnzNDOUFR8XFt5CgARUHRXeDyJfzKeWvKFd4JUink6wE9R0yQJ2bPBXu3pY
7t0P4qQSKWdeGjmYg54/JBamMba7BLQOLZuuiAplTTAt5Dg4tEi9Zuje2sUmBcDn
3YnG4dnGxPeP9EKCElh1WWtwiRItUKT+YtaqSinL1Rh2j1aWu9WJQ3M1C+s3hFQ3
vGR/CchllLtP4xmpY9TXEpUbBp4ZnTercWAxLRczi1/kOm3SdMIMUak26U6pTBmN
QfkEzSx+K5F1wGaoqa2j7MuTMSNsxfT1R0xA6aBid4oOyCoPyHOVJ3mqV0AziyOl
c7B92HgR/aGFMEevVA1G
=mZ2H
-----END PGP SIGNATURE-----
Merge tag 'configfs-for-4.7' of git://git.infradead.org/users/hch/configfs
Pull configfs fix from Christoph Hellwig:
"A fix from Marek for ppos handling in configfs_write_bin_file, which
was introduced in Linux 4.5, but didn't have any users until recently"
* tag 'configfs-for-4.7' of git://git.infradead.org/users/hch/configfs:
configfs: Remove ppos increment in configfs_write_bin_file
->atomic_open() can be given an in-lookup dentry *or* a negative one
found in dcache. Use d_in_lookup() to tell one from another, rather
than d_unhashed().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Right now when a new overlay inode is created, we initialize overlay
inode's ->i_mode from underlying inode ->i_mode but we retain only
file type bits (S_IFMT) and discard permission bits.
This patch changes it and retains permission bits too. This should allow
overlay to do permission checks on overlay inode itself in task context.
[SzM] It also fixes clearing suid/sgid bits on write.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 4bacc9c923 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Cc: <stable@vger.kernel.org>
Before 4bacc9c923 ("overlayfs: Make f_path...") file->f_path pointed to
the underlying file, hence suid/sgid removal on write worked fine.
After that patch file->f_path pointed to the overlay file, and the file
mode bits weren't copied to overlay_inode->i_mode. So the suid/sgid
removal simply stopped working.
The fix is to copy the mode bits, but then ovl_setattr() needs to clear
ATTR_MODE to avoid the BUG() in notify_change(). So do this first, then in
the next patch copy the mode.
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 4bacc9c923 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Cc: <stable@vger.kernel.org>
Pull fuse fix from Miklos Szeredi:
"This makes sure userspace filesystems are not broken by the parallel
lookups and readdir feature"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: serialize dirops by default
Pull overlayfs fixes from Miklos Szeredi:
"This contains fixes for a dentry leak, a regression in 4.6 noticed by
Docker users and missing write access checking in truncate"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: warn instead of error if d_type is not supported
ovl: get_write_access() in truncate
ovl: fix dentry leak for default_permissions
overlay needs underlying fs to support d_type. Recently I put in a
patch in to detect this condition and started failing mount if
underlying fs did not support d_type.
But this breaks existing configurations over kernel upgrade. Those who
are running docker (partially broken configuration) with xfs not
supporting d_type, are surprised that after kernel upgrade docker does
not run anymore.
https://github.com/docker/docker/issues/22937#issuecomment-229881315
So instead of erroring out, detect broken configuration and warn
about it. This should allow existing docker setups to continue
working after kernel upgrade.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 45aebeaf4f ("ovl: Ensure upper filesystem supports d_type")
Cc: <stable@vger.kernel.org> 4.6
Pull vfs fixes from Al Viro:
"Tmpfs readdir throughput regression fix (this cycle) + some -stable
fodder all over the place.
One missing bit is Miklos' tonight locks.c fix - NFS folks had already
grabbed that one by the time I woke up ;-)"
[ The locks.c fix came through the nfsd tree just moments ago ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
namespace: update event counter when umounting a deleted dentry
9p: use file_dentry()
ceph: fix d_obtain_alias() misuses
lockless next_positive()
libfs.c: new helper - next_positive()
dcache_{readdir,dir_lseek}(): don't bother with nested ->d_lock
leases on overlayfs.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXdr1UAAoJECebzXlCjuG+QlsQAJiZTmio6k9tupN5+iKsZNL3
m919ooj8GYsXxlC0OTFfi09dUi1yeF8MEE3j9egk/+qxEtsdmKdEOIy0RVdcLSfd
HeGjXgLh79hVcGxgyBP+pdax2XhZ3RVisg8F5gTw2GPo+FPFZfrEuO5h7ctn+t45
MCQ+4yqYqzEhYnoPyo5XKh5Aj6wBWiaTzg3/jSe6uSuSfuBfyaMaBPq7l7ayGra/
5El+tu61o/SrJ41N2EayWSj/bOFJE92LIuGOh8NdfANuuP70JhxlwgVSldah3CCQ
6PymXAVcjw0+gJ00mokzKfTJW5FPfasxckMHaOvcFsSONy4rlmrwwqUr9C2AFzTE
wQGIzibCDYOSI8uF+//Oe+dh8JWp2TF8rfJcmKLyJMIcCq/Xl6cNx1qrq/oSWvuk
WKOv1otJrPeT31h/s5iLjr/E/Po1eX+d2ySJdvUHYu/5aZwFgWPnVXwJk0s9bLow
auZU85tPnuz+tbS2pWEK+el7LMgDBdzraVRogMdH1c+m3+G9pr53EzmpYovkZ2X8
duVJ2Leslyya347TnJAgEY47Fbeu26JaoeIChGVhKcEyCENlcqAJWaGVECrxvs3y
p/Y2MYMkO8YrCz5wQXPiLFiG4rAc+jSIn1Q+vRGl2Pkel0y7AgJNNMFtANjMCSIO
pg6BqUjOyKt8cXVy4UW7
=K0mr
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.7-3' of git://linux-nfs.org/~bfields/linux
Pull lockd/locks fixes from Bruce Fields:
"One fix for lockd soft lookups in an error path, and one fix for file
leases on overlayfs"
* tag 'nfsd-4.7-3' of git://linux-nfs.org/~bfields/linux:
locks: use file_inode()
lockd: unregister notifier blocks if the service fails to come up completely
Pull libnvdimm fixes from Dan Williams:
"1/ Two regression fixes since v4.6: one for the byte order of a sysfs
attribute (bz121161) and another for QEMU 2.6's NVDIMM _DSM (ACPI
Device Specific Method) implementation that gets tripped up by new
auto-probing behavior in the NFIT driver.
2/ A fix tagged for -stable that stops the kernel from
clobbering/ignoring changes to the configuration of a 'pfn'
instance ("struct page" driver). For example changing the
alignment from 2M to 1G may silently revert to 2M if that value is
currently stored on media.
3/ A fix from Eric for an xfstests failure in dax. It is not
currently tagged for -stable since it requires an 8-exabyte file
system to trigger, and there appear to be no user visible side
effects"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nfit: fix format interface code byte order
dax: fix offset overflow in dax_io
acpi, nfit: fix acpi_check_dsm() vs zero functions implemented
libnvdimm, pfn, dax: fix initialization vs autodetect for mode + alignment
(Another one for the f_path debacle.)
ltp fcntl33 testcase caused an Oops in selinux_file_send_sigiotask.
The reason is that generic_add_lease() used filp->f_path.dentry->inode
while all the others use file_inode(). This makes a difference for files
opened on overlayfs since the former will point to the overlay inode the
latter to the underlying inode.
So generic_add_lease() added the lease to the overlay inode and
generic_delete_lease() removed it from the underlying inode. When the file
was released the lease remained on the overlay inode's lock list, resulting
in use after free.
Reported-by: Eryu Guan <eguan@redhat.com>
Fixes: 4bacc9c923 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
- m_start() in fs/namespace.c expects that ns->event is incremented each
time a mount added or removed from ns->list.
- umount_tree() removes items from the list but does not increment event
counter, expecting that it's done before the function is called.
- There are some codepaths that call umount_tree() without updating
"event" counter. e.g. from __detach_mounts().
- When this happens m_start may reuse a cached mount structure that no
longer belongs to ns->list (i.e. use after free which usually leads
to infinite loop).
This change fixes the above problem by incrementing global event counter
before invoking umount_tree().
Change-Id: I622c8e84dcb9fb63542372c5dbf0178ee86bb589
Cc: stable@vger.kernel.org
Signed-off-by: Andrey Ulanov <andreyu@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
v9fs may be used as lower layer of overlayfs and accessing f_path.dentry
can lead to a crash. In this case it's a NULL pointer dereference in
p9_fid_create().
Fix by replacing direct access of file->f_path.dentry with the
file_dentry() accessor, which will always return a native object.
Reported-by: Alessio Igor Bogani <alessioigorbogani@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Alessio Igor Bogani <alessioigorbogani@gmail.com>
Fixes: 4bacc9c923 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Cc: <stable@vger.kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If the lockd service fails to start up then we need to be sure that the
notifier blocks are not registered, otherwise a subsequent start of the
service could cause the same notifier to be registered twice, leading to
soft lockups.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 0751ddf77b "lockd: Register callbacks on the inetaddr_chain..."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Negotiate with userspace filesystems whether they support parallel readdir
and lookup. Disable parallelism by default for fear of breaking fuse
filesystems.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 9902af79c0 ("parallel lookups: actual switch to rwsem")
Fixes: d9b3dbdcfd ("fuse: switch to ->iterate_shared()")
The simple_write_to_buffer() already increments the @ppos on success,
see fs/libfs.c simple_write_to_buffer() comment:
"
On success, the number of bytes written is returned and the offset @ppos
advanced by this number, or negative value is returned on error.
"
If the configfs_write_bin_file() is invoked with @count smaller than the
total length of the written binary file, it will be invoked multiple times.
Since configfs_write_bin_file() increments @ppos on success, after calling
simple_write_to_buffer(), the @ppos is incremented twice.
Subsequent invocation of configfs_write_bin_file() will result in the next
piece of data being written to the offset twice as long as the length of
the previous write, thus creating buffer with "holes" in it.
The simple testcase using DTO follows:
$ mkdir /sys/kernel/config/device-tree/overlays/1
$ dd bs=1 if=foo.dtbo of=/sys/kernel/config/device-tree/overlays/1/dtbo
Without this patch, the testcase will result in twice as big buffer in the
kernel, which is then passed to the cfs_overlay_item_dtbo_write() .
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
The two methods essentially do the same: find the real dentry/inode
belonging to an overlay dentry. The difference is in the usage:
vfs_open() uses ->d_select_inode() and expects the function to perform
copy-up if necessary based on the open flags argument.
file_dentry() uses ->d_real() passing in the overlay dentry as well as the
underlying inode.
vfs_rename() uses ->d_select_inode() but passes zero flags. ->d_real()
with a zero inode would have worked just as well here.
This patch merges the functionality of ->d_select_inode() into ->d_real()
by adding an 'open_flags' argument to the latter.
[Al Viro] Make the signature of d_real() match that of ->d_real() again.
And constify the inode argument, while we are at it.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Stable bugfixes:
- Fix _cancel_empty_pagelist
- Fix a double page unlock
- Make nfs_atomic_open() call d_drop() on all ->open_context() errors.
- Fix another OPEN_DOWNGRADE bug
Other bugfixes:
- Ensure we handle delegation errors in nfs4_proc_layoutget()
- Layout stateids start out as being invalid
- Add sparse lock annotations for pnfs_find_alloc_layout
- Handle bad delegation stateids in nfs4_layoutget_handle_exception
- Fix up O_DIRECT results
- Fix potential use after free of state in nfs4_do_reclaim.
- Mark the layout stateid invalid when all segments are removed
- Don't let readdirplus revalidate an inode that was marked as stale
- Fix potential race in nfs_fhget()
- Fix an unused variable warning
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJXdDWkAAoJENfLVL+wpUDrpsUP/1F2zu12r/Zkv3ZEFhShcpQc
2N1TRD9X7Lruod2pUD95qqjjdw+/vu3LjcyJljrasRaJENijvZ2GQhKkB7xPODlu
qxZcmnQsH+WmpmJKcqAByAW1czcNGMoMHnt4tV0gG21NH+XUb92fgn+aGeIJDVrK
Hcd9d8TfnFWO70ZgTUXW/hv0CXwu4MEJhN2JfF4lolbxUkmjLHHLoxSDDm0AdXGC
EE8f0V9/7xurvOeLe5bQOQXfZPedBydsLNXa1ZacMGKgmBUoRNxJ5yCpPUtcTVBx
HwbiY+WDFQ7MdKTzUQqqbnrIqKw8Hu4SugIV/vHRqR+Lhc6u29YGOqdU4d2G8IKW
Nh8MBqS+dDefCkL3TJoE7MpjhP3EOO6HXnv5FjMZLOuu2X2o+Sz3+DkhYCq6pj/g
fFh480vZfZYaTsfDf1ttvVN8kIvQ+1Uk3LK6aC2EVwgPrv+0OIRu36F0JQQimxOp
EbYDlhVk7mzH/ZQ31GmPPbSIk+3sm2V58lqXnUMovoqPFPiN3xZDuBcnlyFrrzaI
NjOvsdVxkOdHWbYZyBQzj16Vo651EYbAAUEwsud70N8C3aCgkTxCZ30Q0v+KqqxU
pP5kz3zYUdkXQeHxE6T0iXG9fGcv/nGS21hTfT01YJCK7v67K8TYRNMrOEVURVgk
LSD/CZJXJHVJn1Vr4F7o
=IGNO
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.7-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client bugfixes from Anna Schumaker:
"Stable bugfixes:
- Fix _cancel_empty_pagelist
- Fix a double page unlock
- Make nfs_atomic_open() call d_drop() on all ->open_context() errors.
- Fix another OPEN_DOWNGRADE bug
Other bugfixes:
- Ensure we handle delegation errors in nfs4_proc_layoutget()
- Layout stateids start out as being invalid
- Add sparse lock annotations for pnfs_find_alloc_layout
- Handle bad delegation stateids in nfs4_layoutget_handle_exception
- Fix up O_DIRECT results
- Fix potential use after free of state in nfs4_do_reclaim.
- Mark the layout stateid invalid when all segments are removed
- Don't let readdirplus revalidate an inode that was marked as stale
- Fix potential race in nfs_fhget()
- Fix an unused variable warning"
* tag 'nfs-for-4.7-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Fix another OPEN_DOWNGRADE bug
make nfs_atomic_open() call d_drop() on all ->open_context() errors.
NFS: Fix an unused variable warning
NFS: Fix potential race in nfs_fhget()
NFS: Don't let readdirplus revalidate an inode that was marked as stale
NFSv4.1/pnfs: Mark the layout stateid invalid when all segments are removed
NFS: Fix a double page unlock
pnfs_nfs: fix _cancel_empty_pagelist
nfs4: Fix potential use after free of state in nfs4_do_reclaim.
NFS: Fix up O_DIRECT results
NFS/pnfs: handle bad delegation stateids in nfs4_layoutget_handle_exception
NFSv4.1/pnfs: Add sparse lock annotations for pnfs_find_alloc_layout
NFSv4.1/pnfs: Layout stateids start out as being invalid
NFSv4.1/pnfs: Ensure we handle delegation errors in nfs4_proc_layoutget()
When truncating a file we should check write access on the underlying
inode. And we should do so on the lower file as well (before copy-up) for
consistency.
Original patch and test case by Aihua Zhang.
- - >o >o - - test.c - - >o >o - -
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
int ret;
ret = truncate(argv[0], 4096);
if (ret != -1) {
fprintf(stderr, "truncate(argv[0]) should have failed\n");
return 1;
}
if (errno != ETXTBSY) {
perror("truncate(argv[0])");
return 1;
}
return 0;
}
- - >o >o - - >o >o - - >o >o - -
Reported-by: Aihua Zhang <zhangaihua1@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
When using the 'default_permissions' mount option, ovl_permission() on
non-directories was missing a dput(alias), resulting in "BUG Dentry still
in use".
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 8d3095f4ad ("ovl: default permissions")
Cc: <stable@vger.kernel.org> # v4.5+
Olga Kornievskaia reports that the following test fails to trigger
an OPEN_DOWNGRADE on the wire, and only triggers the final CLOSE.
fd0 = open(foo, RDRW) -- should be open on the wire for "both"
fd1 = open(foo, RDONLY) -- should be open on the wire for "read"
close(fd0) -- should trigger an open_downgrade
read(fd1)
close(fd1)
The issue is that we're missing a check for whether or not the current
state transitioned from an O_RDWR state as opposed to having transitioned
from a combination of O_RDONLY and O_WRONLY.
Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: cd9288ffae ("NFSv4: Fix another bug in the close/open_downgrade code")
Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
This isn't functionally apparent for some reason, but
when we test io at extreme offsets at the end of the loff_t
rang, such as in fstests xfs/071, the calculation of
"max" in dax_io() can be wrong due to pos + size overflowing.
For example,
# xfs_io -c "pwrite 9223372036854771712 512" /mnt/test/file
enters dax_io with:
start 0x7ffffffffffff000
end 0x7ffffffffffff200
and the rounded up "size" variable is 0x1000. This yields:
pos + size 0x8000000000000000 (overflows loff_t)
end 0x7ffffffffffff200
Due to the overflow, the min() function picks the wrong
value for the "max" variable, and when we send (max - pos)
into i.e. copy_from_iter_pmem() it is also the wrong value.
This somehow(tm) gets magically absorbed without incident,
probably because iter->count is correct. But it seems best
to fix it up properly by comparing the two values as
unsigned.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull cifs fixes from Steve French:
"Various small cifs/smb3 fixes, include some for stable, and some from
the recent SMB3 test event"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
File names with trailing period or space need special case conversion
Fix reconnect to not defer smb3 session reconnect long after socket reconnect
cifs: check hash calculating succeeded
cifs: dynamic allocation of ntlmssp blob
cifs: use CIFS_MAX_DOMAINNAME_LEN when converting the domain name
cifs: stuff the fl_owner into "pid" field in the lock request
In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code"
unconditional d_drop() after the ->open_context() had been removed. It had
been correct for success cases (there ->open_context() itself had been doing
dcache manipulations), but not for error ones. Only one of those (ENOENT)
got a compensatory d_drop() added in that commit, but in fact it should've
been done for all errors. As it is, the case of O_CREAT non-exclusive open
on a hashed negative dentry racing with e.g. symlink creation from another
client ended up with ->open_context() getting an error and proceeding to
call nfs_lookup(). On a hashed dentry, which would've instantly triggered
BUG_ON() in d_materialise_unique() (or, these days, its equivalent in
d_splice_alias()).
Cc: stable@vger.kernel.org # v3.10+
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Pull btrfs fixes part 2 from Chris Mason:
"This has one patch from Omar to bring iterate_shared back to btrfs.
We have a tree of work we queue up for directory items and it doesn't
lend itself well to shared access. While we're cleaning it up, Omar
has changed things to use an exclusive lock when there are delayed
items"
* 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes
Pull btrfs fixes from Chris Mason:
"I have a two part pull this time because one of the patches Dave
Sterba collected needed to be against v4.7-rc2 or higher (we used
rc4). I try to make my for-linus-xx branch testable on top of the
last major so we can hand fixes to people on the list more easily, so
I've split this pull in two.
This first part has some fixes and two performance improvements that
we've been testing for some time.
Josef's two performance fixes are most notable. The transid tracking
patch makes a big improvement on pretty much every workload"
* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: Force stripesize to the value of sectorsize
btrfs: fix disk_i_size update bug when fallocate() fails
Btrfs: fix error handling in map_private_extent_buffer
Btrfs: fix error return code in btrfs_init_test_fs()
Btrfs: don't do nocow check unless we have to
btrfs: fix deadlock in delayed_ref_async_start
Btrfs: track transid for delayed ref flushing
Commit fe742fd4f9 ("Revert "btrfs: switch to ->iterate_shared()"")
backed out the conversion to ->iterate_shared() for Btrfs because the
delayed inode handling in btrfs_real_readdir() is racy. However, we can
still do readdir in parallel if there are no delayed nodes.
This is a temporary fix which upgrades the shared inode lock to an
exclusive lock only when we have delayed items until we come up with a
more complete solution. While we're here, rename the
btrfs_{get,put}_delayed_items functions to make it very clear that
they're just for readdir.
Tested with xfstests and by doing a parallel kernel build:
while make tinyconfig && make -j4 && git clean dqfx; do
:
done
along with a bunch of parallel finds in another shell:
while true; do
for ((i=0; i<4; i++)); do
find . >/dev/null &
done
wait
done
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>