linux_old1

Commit Graph

Author	SHA1	Message	Date
Ian Abbott	6407f44aaf	fuse: Add ioctl flag for x32 compat ioctl Currently, a CUSE server running on a 64-bit kernel can tell when an ioctl request comes from a process running a 32-bit ABI, but cannot tell whether the requesting process is using legacy IA32 emulation or x32 ABI. In particular, the server does not know the size of the client process's `time_t` type. For 64-bit kernels, the `FUSE_IOCTL_COMPAT` and `FUSE_IOCTL_32BIT` flags are currently set in the ioctl input request (`struct fuse_ioctl_in` member `flags`) for a 32-bit requesting process. This patch defines a new flag `FUSE_IOCTL_COMPAT_X32` and sets it if the 32-bit requesting process is using the x32 ABI. This allows the server process to distinguish between requests coming from client processes using IA32 emulation or the x32 ABI and so infer the size of the client process's `time_t` type and any other IA32/x32 differences. Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-04-24 17:05:07 +02:00
Alan Somers	154603fe3e	fuse: document fuse_fsync_in.fsync_flags The FUSE_FSYNC_DATASYNC flag was introduced by commit `b6aeadeda2` ("[PATCH] FUSE - file operations") as a magic number. No new values have been added to fsync_flags since. Signed-off-by: Alan Somers <asomers@FreeBSD.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-04-24 17:05:07 +02:00
Kirill Smelkov	bbd84f3365	fuse: Add FOPEN_STREAM to use stream_open() Starting from commit `9c225f2655` ("vfs: atomic f_pos accesses as per POSIX") files opened even via nonseekable_open gate read and write via lock and do not allow them to be run simultaneously. This can create read vs write deadlock if a filesystem is trying to implement a socket-like file which is intended to be simultaneously used for both read and write from filesystem client. See commit `10dce8af34` ("fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock") for details and e.g. commit `581d21a2d0` ("xenbus: fix deadlock on writes to /proc/xen/xenbus") for a similar deadlock example on /proc/xen/xenbus. To avoid such deadlock it was tempting to adjust fuse_finish_open to use stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE, and in particular GVFS which actually uses offset in its read and write handlers https://codesearch.debian.net/search?q=-%3Enonseekable+%3D https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481 so if we would do such a change it will break a real user. Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the opened handler is having stream-like semantics; does not use file position and thus the kernel is free to issue simultaneous read and write request on opened file handle. This patch together with stream_open() should be added to stable kernels starting from v3.14+. This will allow to patch OSSPD and other FUSE filesystems that provide stream-like files to return FOPEN_STREAM \| FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all kernel versions. This should work because fuse_finish_open ignores unknown open flags returned from a filesystem and so passing FOPEN_STREAM to a kernel that is not aware of this flag cannot hurt. In turn the kernel that is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE is sufficient to implement streams without read vs write deadlock. Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-04-24 17:05:07 +02:00
Liu Bo	0cbade024b	fuse: honor RLIMIT_FSIZE in fuse_file_fallocate fstests generic/228 reported this failure that fuse fallocate does not honor what 'ulimit -f' has set. This adds the necessary inode_newsize_ok() check. Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Fixes: `05ba1f0823` ("fuse: add FALLOCATE operation") Cc: <stable@vger.kernel.org> # v3.5 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-04-24 17:05:06 +02:00
Miklos Szeredi	9de5be06d0	fuse: fix writepages on 32bit Writepage requests were cropped to i_size & 0xffffffff, which meant that mmaped writes to any file larger than 4G might be silently discarded. Fix by storing the file size in a properly sized variable (loff_t instead of size_t). Reported-by: Antonio SJ Musumeci <trapexit@spawn.link> Fixes: `6eaf4782eb` ("fuse: writepages: crop secondary requests") Cc: <stable@vger.kernel.org> # v3.13 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-04-24 17:05:06 +02:00
Chad Austin	fabf7e0262	fuse: cache readdir calls if filesystem opts out of opendir If a filesystem returns ENOSYS from opendir and thus opts out of opendir and releasedir requests, it almost certainly would also like readdir results cached. Default open_flags to FOPEN_KEEP_CACHE and FOPEN_CACHE_DIR in that case. With this patch, I've measured recursive directory enumeration across large FUSE mounts to be faster than native mounts. Signed-off-by: Chad Austin <chadaustin@fb.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:15 +01:00
Chad Austin	d9a9ea94f7	fuse: support clients that don't implement 'opendir' Allow filesystems to return ENOSYS from opendir, preventing the kernel from sending opendir and releasedir messages in the future. This avoids userspace transitions when filesystems don't need to keep track of state per directory handle. A new capability flag, FUSE_NO_OPENDIR_SUPPORT, parallels FUSE_NO_OPEN_SUPPORT, indicating the new semantics for returning ENOSYS from opendir. Signed-off-by: Chad Austin <chadaustin@fb.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:15 +01:00
Miklos Szeredi	2f7b6f5bed	fuse: lift bad inode checks into callers Bad inode checks were done done in various places, and move them into fuse_file_{read\|write}_iter(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:15 +01:00
Miklos Szeredi	55752a3aba	fuse: multiplex cached/direct_io file operations This is cleanup, as well as allowing switching between I/O modes while the file is open in the future. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:15 +01:00
Miklos Szeredi	d4136d6075	fuse add copy_file_range to direct io fops Nothing preventing copy_file_range to work on files opened with FOPEN_DIRECT_IO. Fixes: `88bc7d5097` ("fuse: add support for copy_file_range()") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:14 +01:00
Miklos Szeredi	3c3db095b6	fuse: use iov_iter based generic splice helpers The default splice implementation is grossly inefficient and the iter based ones work just fine, so use those instead. I've measured an 8x speedup for splice write (with len = 128k). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:14 +01:00
Martin Raiber	23c94e1cdc	fuse: Switch to using async direct IO for FOPEN_DIRECT_IO Switch to using the async directo IO code path in fuse_direct_read_iter() and fuse_direct_write_iter(). This is especially important in connection with loop devices with direct IO enabled as loop assumes async direct io is actually async. Signed-off-by: Martin Raiber <martin@urbackup.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:14 +01:00
Miklos Szeredi	75126f5504	fuse: use atomic64_t for khctr ...to get rid of one more fc->lock use. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:14 +01:00
Kirill Tkhai	f15ecfef05	fuse: Introduce fi->lock to protect write related fields To minimize contention of fc->lock, this patch introduces a new spinlock for protection fuse_inode metadata: fuse_inode: writectr writepages write_files queued_writes attr_version inode: i_size i_nlink i_mtime i_ctime Also, it protects the fields changed in fuse_change_attributes_common() (too many to list). Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:14 +01:00
Kirill Tkhai	4510d86fbb	fuse: Convert fc->attr_version into atomic64_t This patch makes fc->attr_version of atomic64_t type, so fc->lock won't be needed to read or modify it anymore. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:13 +01:00
Kirill Tkhai	ebf84d0c72	fuse: Add fuse_inode argument to fuse_prepare_release() Here is preparation for next patches, which introduce new fi->lock for protection of ff->write_entry linked into fi->write_files. This patch just passes new argument to the function. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:13 +01:00
Kirill Tkhai	c5de16cca2	fuse: Replace page without copying in fuse_writepage_in_flight() It looks like we can optimize page replacement and avoid copying by simple updating the request's page. [SzM: swap with new request's tmp page to avoid use after free.] Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:12 +01:00
Miklos Szeredi	e2653bd53a	fuse: fix leaked aux requests Auxiliary requests chained on req->misc.write.next may be leaked on truncate. Free these as well if the parent request was truncated off. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:12 +01:00
Miklos Szeredi	419234d595	fuse: only reuse auxiliary request in fuse_writepage_in_flight() Don't reuse the queued request, even if it only contains a single page. This is needed because previous locking changes (spliting out fiq->waitq.lock from fc->lock) broke the assumption that request will remain in FR_PENDING at least until the new page contents are copied. This fix removes a slight optimization for a rare corner case, so we really shoudln't care. Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com> Fixes: `fd22d62ed0` ("fuse: no fc->lock for iqueue parts") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:12 +01:00
Miklos Szeredi	7f305ca192	fuse: clean up fuse_writepage_in_flight() Restructure the function to better separate the locked and the unlocked parts. Use the "old_req" local variable to mean only the queued request, and not any auxiliary requests added onto its misc.write.next list. These changes are in preparation for the following patch. Also turn BUG_ON instances into WARN_ON and add a header comment explaining what the function does. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:12 +01:00
Miklos Szeredi	2fe93bd432	fuse: extract fuse_find_writeback() helper Call this from fuse_range_is_writeback() and fuse_writepage_in_flight(). Turn a BUG_ON() into a WARN_ON() in the process. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-02-13 13:15:12 +01:00
Miklos Szeredi	a2ebba8241	fuse: decrement NR_WRITEBACK_TEMP on the right page NR_WRITEBACK_TEMP is accounted on the temporary page in the request, not the page cache page. Fixes: `8b284dc472` ("fuse: writepages: handle same page rewrites") Cc: <stable@vger.kernel.org> # v3.13 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-01-16 10:27:59 +01:00
Chad Austin	2e64ff154c	fuse: continue to send FUSE_RELEASEDIR when FUSE_OPEN returns ENOSYS When FUSE_OPEN returns ENOSYS, the no_open bit is set on the connection. Because the FUSE_RELEASE and FUSE_RELEASEDIR paths share code, this incorrectly caused the FUSE_RELEASEDIR request to be dropped and never sent to userspace. Pass an isdir bool to distinguish between FUSE_RELEASE and FUSE_RELEASEDIR inside of fuse_file_put. Fixes: `7678ac5061` ("fuse: support clients that don't implement 'open'") Cc: <stable@vger.kernel.org> # v3.14 Signed-off-by: Chad Austin <chadaustin@fb.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-12-11 21:47:28 +01:00
Miklos Szeredi	a9c2d1e82f	fuse: fix fsync on directory Commit `ab2257e994` ("fuse: reduce size of struct fuse_inode") moved parts of fields related to writeback on regular file and to directory caching into a union. However fuse_fsync_common() called from fuse_dir_fsync() touches some writeback related fields, resulting in a crash. Move writeback related parts from fuse_fsync_common() to fuse_fysnc(). Reported-by: Brett Girton <btgirton@gmail.com> Tested-by: Brett Girton <btgirton@gmail.com> Fixes: `ab2257e994` ("fuse: reduce size of struct fuse_inode") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-12-03 10:14:43 +01:00
Lukas Czerner	ebacb81273	fuse: fix use-after-free in fuse_direct_IO() In async IO blocking case the additional reference to the io is taken for it to survive fuse_aio_complete(). In non blocking case this additional reference is not needed, however we still reference io to figure out whether to wait for completion or not. This is wrong and will lead to use-after-free. Fix it by storing blocking information in separate variable. This was spotted by KASAN when running generic/208 fstest. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `744742d692` ("fuse: Add reference counting for fuse_io_priv") Cc: <stable@vger.kernel.org> # v4.6	2018-11-09 15:52:17 +01:00
Linus Torvalds	9931a07d51	Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull AFS updates from Al Viro: "AFS series, with some iov_iter bits included" * 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) missing bits of "iov_iter: Separate type from direction and use accessor functions" afs: Probe multiple fileservers simultaneously afs: Fix callback handling afs: Eliminate the address pointer from the address list cursor afs: Allow dumping of server cursor on operation failure afs: Implement YFS support in the fs client afs: Expand data structure fields to support YFS afs: Get the target vnode in afs_rmdir() and get a callback on it afs: Calc callback expiry in op reply delivery afs: Fix FS.FetchStatus delivery from updating wrong vnode afs: Implement the YFS cache manager service afs: Remove callback details from afs_callback_break struct afs: Commit the status on a new file/dir/symlink afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS afs: Don't invoke the server to read data beyond EOF afs: Add a couple of tracepoints to log I/O errors afs: Handle EIO from delivery function afs: Fix TTL on VL server and address lists afs: Implement VL server rotation afs: Improve FS server rotation error handling ...	2018-11-01 19:58:52 -07:00
David Howells	00e2370744	iov_iter: Use accessor function Use accessor functions to access an iterator's type and direction. This allows for the possibility of using some other method of determining the type of iterator than if-chains with bitwise-AND conditions. Signed-off-by: David Howells <dhowells@redhat.com>	2018-10-24 00:40:44 +01:00
Miklos Szeredi	9a2eb24d1a	fuse: only invalidate atime in direct read After sending a synchronous READ request from __fuse_direct_read() we only need to invalidate atime; none of the other attributes should be changed by a read(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-15 15:43:06 +02:00
Miklos Szeredi	e52a825048	fuse: realloc page array Writeback caching currently allocates requests with the maximum number of possible pages, while the actual number of pages per request depends on a couple of factors that cannot be determined when the request is allocated (whether page is already under writeback, whether page is contiguous with previous pages already added to a request). This patch allows such requests to start with no page allocation (all pages inline) and grow the page array on demand. If the max_pages tunable remains the default value, then this will mean just one allocation that is the same size as before. If the tunable is larger, then this adds at most 3 additional memory allocations (which is generously compensated by the improved performance from the larger request). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-01 10:07:06 +02:00
Constantine Shulyupin	5da784cce4	fuse: add max_pages to init_out Replace FUSE_MAX_PAGES_PER_REQ with the configurable parameter max_pages to improve performance. Old RFC with detailed description of the problem and many fixes by Mitsuo Hayasaka (mitsuo.hayasaka.hu@hitachi.com): - https://lkml.org/lkml/2012/7/5/136 We've encountered performance degradation and fixed it on a big and complex virtual environment. Environment to reproduce degradation and improvement: 1. Add lag to user mode FUSE Add nanosleep(&(struct timespec){ 0, 1000 }, NULL); to xmp_write_buf in passthrough_fh.c 2. patch UM fuse with configurable max_pages parameter. The patch will be provided latter. 3. run test script and perform test on tmpfs fuse_test() { cd /tmp mkdir -p fusemnt passthrough_fh -o max_pages=$1 /tmp/fusemnt grep fuse /proc/self/mounts dd conv=fdatasync oflag=dsync if=/dev/zero of=fusemnt/tmp/tmp \ count=1K bs=1M 2>&1 \| grep -v records rm fusemnt/tmp/tmp killall passthrough_fh } Test results: passthrough_fh /tmp/fusemnt fuse.passthrough_fh \ rw,nosuid,nodev,relatime,user_id=0,group_id=0 0 0 1073741824 bytes (1.1 GB) copied, 1.73867 s, 618 MB/s passthrough_fh /tmp/fusemnt fuse.passthrough_fh \ rw,nosuid,nodev,relatime,user_id=0,group_id=0,max_pages=256 0 0 1073741824 bytes (1.1 GB) copied, 1.15643 s, 928 MB/s Obviously with bigger lag the difference between 'before' and 'after' will be more significant. Mitsuo Hayasaka, in 2012 (https://lkml.org/lkml/2012/7/5/136), observed improvement from 400-550 to 520-740. Signed-off-by: Constantine Shulyupin <const@MakeLinux.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-01 10:07:06 +02:00
Miklos Szeredi	ab2257e994	fuse: reduce size of struct fuse_inode Do this by grouping fields used for cached writes and putting them into a union with fileds used for cached readdir (with obviously no overlap, since we don't have hybrid objects). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-01 10:07:05 +02:00
Miklos Szeredi	5d7bc7e868	fuse: allow using readdir cache The cache is only used if it's completed, not while it's still being filled; this constraint could be lifted later, if it turns out to be useful. Introduce state in struct fuse_file that indicates the position within the cache. After a seek, reset the position to the beginning of the cache and search the cache for the current position. If the current position is not found in the cache, then fall back to uncached readdir. It can also happen that page(s) disappear from the cache, in which case we must also fall back to uncached readdir. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-10-01 10:07:04 +02:00
Kirill Tkhai	63825b4e1d	fuse: do not take fc->lock in fuse_request_send_background() Currently, we take fc->lock there only to check for fc->connected. But this flag is changed only on connection abort, which is very rare operation. So allow checking fc->connected under just fc->bg_lock and use this lock (as well as fc->lock) when resetting fc->connected. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-09-28 16:43:23 +02:00
Kirill Tkhai	ae2dffa394	fuse: introduce fc->bg_lock To reduce contention of fc->lock, this patch introduces bg_lock for protection of fields related to background queue. These are: max_background, congestion_threshold, num_background, active_background, bg_queue and blocked. This allows next patch to make async reads not requiring fc->lock, so async reads and writes will have better performance executed in parallel. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-09-28 16:43:22 +02:00
Niels de Vos	88bc7d5097	fuse: add support for copy_file_range() There are several FUSE filesystems that can implement server-side copy or other efficient copy/duplication/clone methods. The copy_file_range() syscall is the standard interface that users have access to while not depending on external libraries that bypass FUSE. Signed-off-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-09-28 16:43:22 +02:00
Linus Torvalds	ad1d697358	fuse update for 4.19 This contains various bug fixes and cleanups. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCW3xvGwAKCRDh3BK/laaZ PKECAP9qUpdtQ5RaIL/y9OGZzJLSZbBZuK3LGNY2u2B3EfrSjgEAvhkhXyOQgvVi kgYLNszbg/C+w8U4Xc5GWB6cjNm6rwE= =GJI7 -----END PGP SIGNATURE----- Merge tag 'fuse-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse update from Miklos Szeredi: "Various bug fixes and cleanups" * tag 'fuse-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: reduce allocation size for splice_write fuse: use kvmalloc to allocate array of pipe_buffer structs. fuse: convert last timespec use to timespec64 fs: fuse: Adding new return type vm_fault_t fuse: simplify fuse_abort_conn() fuse: Add missed unlock_page() to fuse_readpages_fill() fuse: Don't access pipe->buffers without pipe_lock() fuse: fix initial parallel dirops fuse: Fix oops at process_init_reply() fuse: umount should wait for all requests fuse: fix unlocked access to processing queue fuse: fix double request_end()	2018-08-21 18:47:36 -07:00
Souptick Joarder	46fb504a71	fs: fuse: Adding new return type vm_fault_t Use new return type vm_fault_t for fault handler in struct vm_operations_struct. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. commit `1c8f422059` ("mm: change return type to vm_fault_t") Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-07-26 16:13:12 +02:00
Kirill Tkhai	109728ccc5	fuse: Add missed unlock_page() to fuse_readpages_fill() The above error path returns with page unlocked, so this place seems also to behave the same. Fixes: `f8dbdf8182` ("fuse: rework fuse_readpages()") Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2018-07-26 16:13:12 +02:00
Eric W. Biederman	7a36094d61	pids: Compute task_tgid using signal->leader_pid The cost is the the same and this removes the need to worry about complications that come from de_thread and group_leader changing. __task_pid_nr_ns has been updated to take advantage of this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-07-21 10:43:12 -05:00
Linus Torvalds	a9a08845e9	vfs: do bulk POLL* -> EPOLL* replacement This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V \| grep -v '^t' \| grep -v /um/ \| grep -v '^sa' \| grep -v '/poll.h$'\|grep -v '^D'` for f in $L; do sed -i "-es/^$[^\"]$$\<POLL$V\>$/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-02-11 14:34:03 -08:00
Al Viro	c71d227fc4	make kernel-side POLL... arch-independent mangle/demangle on the way to/from userland Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-11-29 19:00:41 -05:00
Al Viro	076ccb76e1	fs: annotate ->poll() instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-11-27 16:20:05 -05:00
Linus Torvalds	e7989f973a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: "This fixes a regression (spotted by the Sandstorm.io folks) in the pid namespace handling introduced in 4.12. There's also a fix for honoring sync/dsync flags for pwritev2()" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: getattr cleanup fuse: honor iocb sync flags on write fuse: allow server to run in different pid_ns	2017-09-13 10:10:19 -07:00
Miklos Szeredi	5b97eeacbd	fuse: getattr cleanup The refreshed argument isn't used by any caller, get rid of it. Use a helper for just updating the inode (no need to fill in a kstat). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-09-12 16:57:54 +02:00
Miklos Szeredi	e1c0eecba1	fuse: honor iocb sync flags on write If the IOCB_DSYNC flag is set a sync is not being performed by fuse_file_write_iter. Honor IOCB_DSYNC/IOCB_SYNC by setting O_DYSNC/O_SYNC respectively in the flags filed of the write request. We don't need to sync data or metadata, since fuse_perform_write() does write-through and the filesystem is responsible for updating file times. Original patch by Vitaly Zolotusky. Reported-by: Nate Clark <nate@neworld.us> Cc: Vitaly Zolotusky <vitaly@unitc.com>. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-09-12 16:57:53 +02:00
Miklos Szeredi	5d6d3a301c	fuse: allow server to run in different pid_ns Commit `0b6e9ea041` ("fuse: Add support for pid namespaces") broke Sandstorm.io development tools, which have been sending FUSE file descriptors across PID namespace boundaries since early 2014. The above patch added a check that prevented I/O on the fuse device file descriptor if the pid namespace of the reader/writer was different from the pid namespace of the mounter. With this change passing the device file descriptor to a different pid namespace simply doesn't work. The check was added because pids are transferred to/from the fuse userspace server in the namespace registered at mount time. To fix this regression, remove the checks and do the following: 1) the pid in the request header (the pid of the task that initiated the filesystem operation) is translated to the reader's pid namespace. If a mapping doesn't exist for this pid, then a zero pid is used. Note: even if a mapping would exist between the initiator task's pid namespace and the reader's pid namespace the pid will be zero if either mapping from initator's to mounter's namespace or mapping from mounter's to reader's namespace doesn't exist. 2) The lk.pid value in setlk/setlkw requests and getlk reply is left alone. Userspace should not interpret this value anyway. Also allow the setlk/setlkw operations if the pid of the task cannot be represented in the mounter's namespace (pid being zero in that case). Reported-by: Kenton Varda <kenton@sandstorm.io> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `0b6e9ea041` ("fuse: Add support for pid namespaces") Cc: <stable@vger.kernel.org> # v4.12+ Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Seth Forshee <seth.forshee@canonical.com>	2017-09-12 16:57:53 +02:00
Linus Torvalds	ec3604c7a5	Writeback error handling fixes for v4.14 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZrTy3AAoJEAAOaEEZVoIVaucP/ApBAj2S5wzvlV1u6l8E6ae7 ZeEEZfcWwzRYlKjZAkTWqj9XvGpDGO5gLq4wsZK2edFAq++/MJF8ZVtN4tdZ1kUZ DUvRodtVOrT08Kp9wZXGT7JOFrf6U/6gMcR6p0MuWnHndeKYvlpcFi9NPT4EC9/z Zm9V7gtlPdSOha7eaSjUS0+vLERkxqXLBW3Av9QUOBP/lbI3lqIroGKeHDYnVdya 2P/k5EcRRJMyJP6TqyYxmmJl+UWjJFMLvnlUDBslHnD/u3mIUhw3JLHYBjn5dZRE Xjq56IDPoXDUvzlBhtn/Uqyx+/wtwsNsylpmKv6K5G1JfdeuSsPVsCey+A1cqV64 LpE5896wf9TmnmI9LNyh6vDn925xPSGBiF45UEp5f9aO7jXeY0MaEZ8g+ENqFIDK v4gtZdS9FhYHV+/l4qEwYMKrqSbwKEs1r1FT+f4wnABby1ojfdA57ZPlp5PV2Vjp szTp88Zkb7cMvZwEnWwxWofcJNmgS7uNahvnQF3IJ4ITsioEkuyYR3K4ZQMaaaV9 wCp6G0FhXZaK3OI7o9WiDwaO2elp9Hxc8bnqKpiBbHZkY0NLh7/++5VxpeNbTHFy AGijQiiKGNNyYqNj93wq9jpVdMNjB0pXrHRxfav8v7MtQ+WfbEoAENF4T7hN7iXn UuF6eSWEC5O1UCRUk1A+ =LLY3 -----END PGP SIGNATURE----- Merge tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull writeback error handling updates from Jeff Layton: "This pile continues the work from last cycle on better tracking writeback errors. In v4.13 we added some basic errseq_t infrastructure and converted a few filesystems to use it. This set continues refining that infrastructure, adds documentation, and converts most of the other filesystems to use it. The main exception at this point is the NFS client" * tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: ecryptfs: convert to file_write_and_wait in ->fsync mm: remove optimizations based on i_size in mapping writeback waits fs: convert a pile of fsync routines to errseq_t based reporting gfs2: convert to errseq_t based writeback error reporting for fsync fs: convert sync_file_range to use errseq_t based error-tracking mm: add file_fdatawait_range and file_write_and_wait fuse: convert to errseq_t based error tracking for fsync mm: consolidate dax / non-dax checks for writeback Documentation: add some docs for errseq_t errseq: rename __errseq_set to errseq_set	2017-09-06 14:11:03 -07:00
Linus Torvalds	066dea8c30	File locking related changes for v4.14 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZrTzyAAoJEAAOaEEZVoIVj8wP/0sOxG+7vEEpe4uj2W52aq9T Y39/ZLfRTLm9SqgH61lkN+IyUsvDx+IP1ws2LBhp0IDRD9m40wdILhHZRWXJcRW2 ApEfmXF+rxnZZ6725ixX9w4Ylab2ZeGmKbzaG4wIjxfddftewZkJvFQcb1LZDfWq 1N0SF4KWoWN6t26Du5CHmYSj/Sz6YGrWGhF22u3mNfkGL+MmuKbz+kB3W+0q2NUF ZjkOIH9WcRiXgSlcHPBLre2EKHqHaNgb0s4Iofd3ZEe50v1NwY/vBMefxuwRdgKS kpLhIKIYMawrHn2rpV0jm12qdgCYj+t2kbVIUBDn3unBP2zYA0e/oo5HNIrroVlk Q6aGwmW0LN60rpd5qcRuNS1p1h2id2HpxEe98dsski6T8CVnj/nvu7EIxmWM02cf g2HeOd7bnl3+uu7SwSTkOVb6G7Kjn+Xufiz/n11mK6fl2jvOyWZZmDqhhjWAYJ8r t5mQVWJdEV12+6+A1WSv9DeS3TUgdYPCF8dzDtF+JVn3WEmxYHywH36Y3hKKz+BA gFEhnHvlyaVvpXCr8Y5BqNSfEfvZe/YUnmVReHpgBU/U4GJ17iQYk/g2vfmPLmsN IZ2OGCrDUc/LfdWc4llRyQBvlGT1KujaT0tbN7xnuWcS2qWdsfX4jDtDUH9E6pvK TB6Sw4Ike0ixamG8N8q/ =VPMU -----END PGP SIGNATURE----- Merge tag 'locks-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "This pile just has a few file locking fixes from Ben Coddington. There are a couple of cleanup patches + an attempt to bring sanity to the l_pid value that is reported back to userland on an F_GETLK request. After a few gyrations, he came up with a way for filesystems to communicate to the VFS layer code whether the pid should be translated according to the namespace or presented as-is to userland" * tag 'locks-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: locks: restore a warn for leaked locks on close fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks fs/locks: Use allocation rather than the stack in fcntl_getlk()	2017-09-06 13:43:26 -07:00
Jeff Layton	9183976ef1	fuse: set mapping error in writepage_locked when it fails This ensures that we see errors on fsync when writeback fails. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-08-11 11:38:26 +02:00
Ashish Samant	61c12b49e1	fuse: Dont call set_page_dirty_lock() for ITER_BVEC pages for async_dio Commit `8fba54aebb` ("fuse: direct-io: don't dirty ITER_BVEC pages") fixes the ITER_BVEC page deadlock for direct io in fuse by checking in fuse_direct_io(), whether the page is a bvec page or not, before locking it. However, this check is missed when the "async_dio" mount option is enabled. In this case, set_page_dirty_lock() is called from the req->end callback in request_end(), when the fuse thread is returning from userspace to respond to the read request. This will cause the same deadlock because the bvec condition is not checked in this path. Here is the stack of the deadlocked thread, while returning from userspace: [13706.656686] INFO: task glusterfs:3006 blocked for more than 120 seconds. [13706.657808] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [13706.658788] glusterfs D ffffffff816c80f0 0 3006 1 0x00000080 [13706.658797] ffff8800d6713a58 0000000000000086 ffff8800d9ad7000 ffff8800d9ad5400 [13706.658799] ffff88011ffd5cc0 ffff8800d6710008 ffff88011fd176c0 7fffffffffffffff [13706.658801] 0000000000000002 ffffffff816c80f0 ffff8800d6713a78 ffffffff816c790e [13706.658803] Call Trace: [13706.658809] [<ffffffff816c80f0>] ? bit_wait_io_timeout+0x80/0x80 [13706.658811] [<ffffffff816c790e>] schedule+0x3e/0x90 [13706.658813] [<ffffffff816ca7e5>] schedule_timeout+0x1b5/0x210 [13706.658816] [<ffffffff81073ffb>] ? gup_pud_range+0x1db/0x1f0 [13706.658817] [<ffffffff810668fe>] ? kvm_clock_read+0x1e/0x20 [13706.658819] [<ffffffff81066909>] ? kvm_clock_get_cycles+0x9/0x10 [13706.658822] [<ffffffff810f5792>] ? ktime_get+0x52/0xc0 [13706.658824] [<ffffffff816c6f04>] io_schedule_timeout+0xa4/0x110 [13706.658826] [<ffffffff816c8126>] bit_wait_io+0x36/0x50 [13706.658828] [<ffffffff816c7d06>] __wait_on_bit_lock+0x76/0xb0 [13706.658831] [<ffffffffa0545636>] ? lock_request+0x46/0x70 [fuse] [13706.658834] [<ffffffff8118800a>] __lock_page+0xaa/0xb0 [13706.658836] [<ffffffff810c8500>] ? wake_atomic_t_function+0x40/0x40 [13706.658838] [<ffffffff81194d08>] set_page_dirty_lock+0x58/0x60 [13706.658841] [<ffffffffa054d968>] fuse_release_user_pages+0x58/0x70 [fuse] [13706.658844] [<ffffffffa0551430>] ? fuse_aio_complete+0x190/0x190 [fuse] [13706.658847] [<ffffffffa0551459>] fuse_aio_complete_req+0x29/0x90 [fuse] [13706.658849] [<ffffffffa05471e9>] request_end+0xd9/0x190 [fuse] [13706.658852] [<ffffffffa0549126>] fuse_dev_do_write+0x336/0x490 [fuse] [13706.658854] [<ffffffffa054963e>] fuse_dev_write+0x6e/0xa0 [fuse] [13706.658857] [<ffffffff812a9ef3>] ? security_file_permission+0x23/0x90 [13706.658859] [<ffffffff81205300>] do_iter_readv_writev+0x60/0x90 [13706.658862] [<ffffffffa05495d0>] ? fuse_dev_splice_write+0x350/0x350 [fuse] [13706.658863] [<ffffffff812062a1>] do_readv_writev+0x171/0x1f0 [13706.658866] [<ffffffff810b3d00>] ? try_to_wake_up+0x210/0x210 [13706.658868] [<ffffffff81206361>] vfs_writev+0x41/0x50 [13706.658870] [<ffffffff81206496>] SyS_writev+0x56/0xf0 [13706.658872] [<ffffffff810257a1>] ? syscall_trace_leave+0xf1/0x160 [13706.658874] [<ffffffff816cbb2e>] system_call_fastpath+0x12/0x71 Fix this by making should_dirty a fuse_io_priv parameter that can be checked in fuse_aio_complete_req(). Reported-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-08-03 17:55:58 +02:00

1 2 3 4 5 ...

362 Commits