linux

Commit Graph

Author	SHA1	Message	Date
Mathieu Desnoyers	a961e40917	membarrier: Provide register expedited private command This introduces a "register private expedited" membarrier command which allows eventual removal of important memory barrier constraints on the scheduler fast-paths. It changes how the "private expedited" membarrier command (new to 4.14) is used from user-space. This new command allows processes to register their intent to use the private expedited command. This affects how the expedited private command introduced in 4.14-rc is meant to be used, and should be merged before 4.14 final. Processes are now required to register before using MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM. This fixes a problem that arose when designing requested extensions to sys_membarrier() to allow JITs to efficiently flush old code from instruction caches. Several potential algorithms are much less painful if the user register intent to use this functionality early on, for example, before the process spawns the second thread. Registering at this time removes the need to interrupt each and every thread in that process at the first expedited sys_membarrier() system call. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-10-19 22:13:40 -04:00
Linus Torvalds	73d3393ada	Changes since last update: - fix some more CONFIG_XFS_RT related build problems - fix data loss when writeback at eof races eofblocks gc and loses - invalidate page cache after fs finishes a dio write - remove dirty page state when invalidating pages so releasepage does the right thing when handed a dirty page -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCgAGBQJZ5jqbAAoJEPh/dxk0SrTrtfMP/jcQ6lTDcpnQ7XEP2fg2dXjx 2+z8uI7Mjr5wo2qfIWHc8nZHZ+8KRak4U28rTlrXkeVbJ79x3Z+SzeipP76dGHXB u9MD7uacTD6BDT7R8/bux7g7KrPATVJYJiT3PRHZ5ysUT6i9KnREdbaKpgOwhMcI Ivd9ROZHx62CmZhsbfLzD+Ccy9/mGBR5OmT8nQlsuD8cEcFU5u1afaJ2/YlCjNLN c16Q8dhGXed7tjduiYCzsxDiewJMzSfcGdyk6yCwXdR3zcI3RdhXUN5FRH0R9GB2 xxG1n5Q4qgtgODGgcPUl9WG8mfhVvEcuZGioxChQrxCEcaHt1Waop0fOixLy9J3Q lUn4qjA5S+VBqa6XsKCSCkiZdDtncSedvMRQYef09q8DGAouwAtN/Z3BVM24oyWU k5888Gt4EHZK6V3lz3qPMmGFxfuPL6GeyEvIYUezpVIYsmp0sLQTeNFUW+XC7fb/ tOBNom4ARHFmSb5da7uwJvesNZBVFSpFQtxkcx1OL0rhTqlKIfPP61dLznKhqUTL 2NhaFjnznYenSEK2CsP+V3CtQrCxywdqDNnOEgTgKJbWPpsYMX63z/Cmtm0A7Qdz BAbGc+OSBLqelwsWNnNzTWPHk33SKxtIxGTe8gKbKbrzbR7mxyJxHKEwpZvWIqh+ 8eTdgJb1wgJyqtBsTSHN =UY00 -----END PGP SIGNATURE----- Merge tag 'xfs-4.14-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Darrick Wong: - fix some more CONFIG_XFS_RT related build problems - fix data loss when writeback at eof races eofblocks gc and loses - invalidate page cache after fs finishes a dio write - remove dirty page state when invalidating pages so releasepage does the right thing when handed a dirty page * tag 'xfs-4.14-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: move two more RT specific functions into CONFIG_XFS_RT xfs: trim writepage mapping to within eof fs: invalidate page cache after end_io() in dio completion xfs: cancel dirty pages on invalidation	2017-10-18 14:51:50 -04:00
Linus Torvalds	020b302376	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Three small fixes: - A fix for skd, it was using kfree() to free a structure allocate with kmem_cache_alloc(). - Stable fix for nbd, fixing a regression using the normal ioctl based tools. - Fix for a previous fix in this series, that fixed up inconsistencies between buffered and direct IO" * 'for-linus' of git://git.kernel.dk/linux-block: fs: Avoid invalidation in interrupt context in dio_complete() nbd: don't set the device size until we're connected skd: Use kmem_cache_free	2017-10-18 14:43:40 -04:00
Lukas Czerner	ffe51f0142	fs: Avoid invalidation in interrupt context in dio_complete() Currently we try to defer completion of async DIO to the process context in case there are any mapped pages associated with the inode so that we can invalidate the pages when the IO completes. However the check is racy and the pages can be mapped afterwards. If this happens we might end up calling invalidate_inode_pages2_range() in dio_complete() in interrupt context which could sleep. This can be reproduced by generic/451. Fix this by passing the information whether we can or can't invalidate to the dio_complete(). Thanks Eryu Guan for reporting this and Jan Kara for suggesting a fix. Fixes: `332391a993` ("fs: Fix page cache inconsistency when mixing buffered and AIO DIO") Reported-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Tested-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-17 08:43:09 -06:00
Arnd Bergmann	785545c898	xfs: move two more RT specific functions into CONFIG_XFS_RT The last cleanup introduced two harmless warnings: fs/xfs/xfs_fsmap.c:480:1: warning: '__xfs_getfsmap_rtdev' defined but not used fs/xfs/xfs_fsmap.c:372:1: warning: 'xfs_getfsmap_rtdev_rtbitmap_helper' defined but not used This moves those two functions as well. Fixes: `bb9c2e5433` ("xfs: move more RT specific code under CONFIG_XFS_RT") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-10-16 12:26:50 -07:00
Brian Foster	40214d128e	xfs: trim writepage mapping to within eof The writeback rework in commit `fbcc025613` ("xfs: Introduce writeback context for writepages") introduced a subtle change in behavior with regard to the block mapping used across the ->writepages() sequence. The previous xfs_cluster_write() code would only flush pages up to EOF at the time of the writepage, thus ensuring that any pages due to file-extending writes would be handled on a separate cycle and with a new, updated block mapping. The updated code establishes a block mapping in xfs_writepage_map() that could extend beyond EOF if the file has post-eof preallocation. Because we now use the generic writeback infrastructure and pass the cached mapping to each writepage call, there is no implicit EOF limit in place. If eofblocks trimming occurs during ->writepages(), any post-eof portion of the cached mapping becomes invalid. The eofblocks code has no means to serialize against writeback because there are no pages associated with post-eof blocks. Therefore if an eofblocks trim occurs and is followed by a file-extending buffered write, not only has the mapping become invalid, but we could end up writing a page to disk based on the invalid mapping. Consider the following sequence of events: - A buffered write creates a delalloc extent and post-eof speculative preallocation. - Writeback starts and on the first writepage cycle, the delalloc extent is converted to real blocks (including the post-eof blocks) and the mapping is cached. - The file is closed and xfs_release() trims post-eof blocks. The cached writeback mapping is now invalid. - Another buffered write appends the file with a delalloc extent. - The concurrent writeback cycle picks up the just written page because the writeback range end is LLONG_MAX. xfs_writepage_map() attributes it to the (now invalid) cached mapping and writes the data to an incorrect location on disk (and where the file offset is still backed by a delalloc extent). This problem is reproduced by xfstests test generic/464, which triggers racing writes, appends, open/closes and writeback requests. To address this problem, trim the mapping used during writeback to within EOF when the mapping is validated. This ensures the mapping is revalidated for any pages encountered beyond EOF as of the time the current mapping was cached or last validated. Reported-by: Eryu Guan <eguan@redhat.com> Diagnosed-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-10-16 12:26:50 -07:00
Eryu Guan	5e25c269e1	fs: invalidate page cache after end_io() in dio completion Commit `332391a993` ("fs: Fix page cache inconsistency when mixing buffered and AIO DIO") moved page cache invalidation from iomap_dio_rw() to iomap_dio_complete() for iomap based direct write path, but before the dio->end_io() call, and it re-introdued the bug fixed by commit `c771c14baa` ("iomap: invalidate page caches should be after iomap_dio_complete() in direct write"). I found this because fstests generic/418 started failing on XFS with v4.14-rc3 kernel, which is the regression test for this specific bug. So similarly, fix it by moving dio->end_io() (which does the unwritten extent conversion) before page cache invalidation, to make sure next buffer read reads the final real allocations not unwritten extents. I also add some comments about why should end_io() go first in case we get it wrong again in the future. Note that, there's no such problem in the non-iomap based direct write path, because we didn't remove the page cache invalidation after the ->direct_IO() in generic_file_direct_write() call, but I decided to fix dio_complete() too so we don't leave a landmine there, also be consistent with iomap_dio_complete(). Fixes: `332391a993` ("fs: Fix page cache inconsistency when mixing buffered and AIO DIO") Signed-off-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Lukas Czerner <lczerner@redhat.com>	2017-10-16 12:11:56 -07:00
Dave Chinner	793d7dbe6d	xfs: cancel dirty pages on invalidation Recently we've had warnings arise from the vm handing us pages without bufferheads attached to them. This should not ever occur in XFS, but we don't defend against it properly if it does. The only place where we remove bufferheads from a page is in xfs_vm_releasepage(), but we can't tell the difference here between "page is dirty so don't release" and "page is dirty but is being invalidated so release it". In some places that are invalidating pages ask for pages to be released and follow up afterward calling ->releasepage by checking whether the page was dirty and then aborting the invalidation. This is a possible vector for releasing buffers from a page but then leaving it in the mapping, so we really do need to avoid dirty pages in xfs_vm_releasepage(). To differentiate between invalidated pages and normal pages, we need to clear the page dirty flag when invalidating the pages. This can be done through xfs_vm_invalidatepage(), and will result xfs_vm_releasepage() seeing the page as clean which matches the bufferhead state on the page after calling block_invalidatepage(). Hence we can re-add the page dirty check in xfs_vm_releasepage to catch the case where we might be releasing a page that is actually dirty and so should not have the bufferheads on it removed. This will remove one possible vector of "dirty page with no bufferheads" and so help narrow down the search for the root cause of that problem. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-10-16 12:11:56 -07:00
Eryu Guan	7e86600606	fs/binfmt_misc.c: node could be NULL when evicting inode inode->i_private is assigned by a Node pointer only after registering a new binary format, so it could be NULL if inode was created by bm_fill_super() (or iput() was called by the error path in bm_register_write()), and this could result in NULL pointer dereference when evicting such an inode. e.g. mount binfmt_misc filesystem then umount it immediately: mount -t binfmt_misc binfmt_misc /proc/sys/fs/binfmt_misc umount /proc/sys/fs/binfmt_misc will result in BUG: unable to handle kernel NULL pointer dereference at 0000000000000013 IP: bm_evict_inode+0x16/0x40 [binfmt_misc] ... Call Trace: evict+0xd3/0x1a0 iput+0x17d/0x1d0 dentry_unlink_inode+0xb9/0xf0 __dentry_kill+0xc7/0x170 shrink_dentry_list+0x122/0x280 shrink_dcache_parent+0x39/0x90 do_one_tree+0x12/0x40 shrink_dcache_for_umount+0x2d/0x90 generic_shutdown_super+0x1f/0x120 kill_litter_super+0x29/0x40 deactivate_locked_super+0x43/0x70 deactivate_super+0x45/0x60 cleanup_mnt+0x3f/0x70 __cleanup_mnt+0x12/0x20 task_work_run+0x86/0xa0 exit_to_usermode_loop+0x6d/0x99 syscall_return_slowpath+0xba/0xf0 entry_SYSCALL_64_fastpath+0xa3/0xa Fix it by making sure Node (e) is not NULL. Link: http://lkml.kernel.org/r/20171010100642.31786-1-eguan@redhat.com Fixes: `83f918274e` ("exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode()") Signed-off-by: Eryu Guan <eguan@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-10-13 16:18:33 -07:00
Matthew Wilcox	f892760aa6	fs/mpage.c: fix mpage_writepage() for pages with buffers When using FAT on a block device which supports rw_page, we can hit BUG_ON(!PageLocked(page)) in try_to_free_buffers(). This is because we call clean_buffers() after unlocking the page we've written. Introduce a new clean_page_buffers() which cleans all buffers associated with a page and call it from within bdev_write_page(). [akpm@linux-foundation.org: s/PAGE_SIZE/~0U/ per Linus and Matthew] Link: http://lkml.kernel.org/r/20171006211541.GA7409@bombadil.infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reported-by: Toshi Kani <toshi.kani@hpe.com> Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Tested-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-10-13 16:18:33 -07:00
Linus Torvalds	8ff0b97cf2	Changes since last update: - Fix a stale kernel memory exposure when logging inodes. - Fix some build problems with CONFIG_XFS_RT=n - Don't change inode mode if the acl write fails, leaving the file totally inaccessible. - Fix a dangling pointer problem when removing an attr fork under memory pressure. - Don't crash while trying to invalidate a null buffer associated with a corrupt metadata pointer. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCgAGBQJZ3lPiAAoJEPh/dxk0SrTrfuMP/Axy7VSX71tE/eXPOmzxCVZD w4/usqO+OsQj+q8o+rwwuX9hz0VGF8kWZJOdgGdXpYT7pWqPmcf88wbThheTetLF fjevusqva0Ds+U4AE7DCNWSKQQRhu2jDgnhQXTv1hdYhWIF59qGwioIijbEvb72I 0QW+/uV9yXmODjWL6KfRh9zRT9N4npMtszukScONwJr9t0/5ub8H03H/ktv8T9oi C3ljEWwyMk5lEYH8p6tpta8EbY0mrIZgo+kj33PU5s9rHvcrTGtyPNqidREUm1fL X3+STMytcDQFAcZdBBXHN0nFMwa8ADTrVvKmEgaR8OsXmOmrlcPn7HfVVlWrY31w X3awJ0b0+IXUrsbbQOPeqgTo5hIkMDkMOga5AP/rqpx1yCCOrlMHaRPXB2NxNcVw dyTj6IpKybhsQ4GkcqmFcgnxPPaogNpYlp6SXV5Dm+8zEJdIQNUuci/EGsNz7UcV msxNlJJkxczXOew6JzCyw45wTnJCxduX7Y1xrOTLaDfa9pkWO2zQBXukCJNIqVIq 35Q4P4JVYtmwQr8XkkX9tiqU0gBWTCTG9KjmTCMm5MYkutEYM0uTNR5Jvyiobl7L Nn+RydssVw7ssnNfgsLhzQHPElUivRdYoYFSBa2DQp6ViILrefqQegd5INAjK63W 7vnHVZyJMHPM0YFoiX8w =6Yvh -----END PGP SIGNATURE----- Merge tag 'xfs-4.14-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Darrick Wong: - Fix a stale kernel memory exposure when logging inodes. - Fix some build problems with CONFIG_XFS_RT=n - Don't change inode mode if the acl write fails, leaving the file totally inaccessible. - Fix a dangling pointer problem when removing an attr fork under memory pressure. - Don't crash while trying to invalidate a null buffer associated with a corrupt metadata pointer. * tag 'xfs-4.14-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: handle error if xfs_btree_get_bufs fails xfs: reinit btree pointer on attr tree inactivation walk xfs: Fix bool initialization/comparison xfs: don't change inode mode if ACL update fails xfs: move more RT specific code under CONFIG_XFS_RT xfs: Don't log uninitialised fields in inode structures	2017-10-12 14:51:13 -07:00
Linus Torvalds	3206e7d5e2	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota fix from Jan Kara: "A fix for a regression in handling of quota grace times and warnings" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Generate warnings for DQUOT_SPACE_NOFAIL allocations	2017-10-12 10:56:06 -07:00
Eric Sandeen	93e8befc17	xfs: handle error if xfs_btree_get_bufs fails Jason reported that a corrupted filesystem failed to replay the log with a metadata block out of bounds warning: XFS (dm-2): _xfs_buf_find: Block out of range: block 0x80270fff8, EOFS 0x9c40000 _xfs_buf_find() and xfs_btree_get_bufs() return NULL if that happens, and then when xfs_alloc_fix_freelist() calls xfs_trans_binval() on that NULL bp, we oops with: BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8 We don't handle _xfs_buf_find errors very well, every caller higher up the stack gets to guess at why it failed. But we should at least handle it somehow, so return EFSCORRUPTED here. Reported-by: Jason L Tibbitts III <tibbs@math.uh.edu> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-10-11 10:21:07 -07:00
Brian Foster	f35c5e10c6	xfs: reinit btree pointer on attr tree inactivation walk xfs_attr3_root_inactive() walks the attr fork tree to invalidate the associated blocks. xfs_attr3_node_inactive() recursively descends from internal blocks to leaf blocks, caching block address values along the way to revisit parent blocks, locate the next entry and descend down that branch of the tree. The code that attempts to reread the parent block is unsafe because it assumes that the local xfs_da_node_entry pointer remains valid after an xfs_trans_brelse() and re-read of the parent buffer. Under heavy memory pressure, it is possible that the buffer has been reclaimed and reallocated by the time the parent block is reread. This means that 'btree' can point to an invalid memory address, lead to a random/garbage value for child_fsb and cause the subsequent read of the attr fork to go off the rails and return a NULL buffer for an attr fork offset that is most likely not allocated. Note that this problem can be manufactured by setting XFS_ATTR_BTREE_REF to 0 to prevent LRU caching of attr buffers, creating a file with a multi-level attr fork and removing it to trigger inactivation. To address this problem, reinit the node/btree pointers to the parent buffer after it has been re-read. This ensures btree points to a valid record and allows the walk to proceed. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-10-11 10:21:07 -07:00
Thomas Meyer	749f24f33e	xfs: Fix bool initialization/comparison Bool initializations should use true and false. Bool tests don't need comparisons. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-10-11 10:21:06 -07:00
Dave Chinner	67f2ffe31d	xfs: don't change inode mode if ACL update fails If we get ENOSPC half way through setting the ACL, the inode mode can still be changed even though the ACL does not exist. Reorder the operation to only change the mode of the inode if the ACL is set correctly. Whilst this does not fix the problem with crash consistency (that requires attribute addition to be a deferred op) it does prevent ENOSPC and other non-fatal errors setting an xattr to be handled sanely. This fixes xfstests generic/449. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-10-11 10:21:06 -07:00
Dave Chinner	bb9c2e5433	xfs: move more RT specific code under CONFIG_XFS_RT Various utility functions and interfaces that iterate internal devices try to reference the realtime device even when RT support is not compiled into the kernel. Make sure this code is excluded from the CONFIG_XFS_RT=n build, and where appropriate stub functions to return fatal errors if they ever get called when RT support is not present. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-10-11 10:21:06 -07:00
Dave Chinner	20413e37d7	xfs: Don't log uninitialised fields in inode structures Prevent kmemcheck from throwing warnings about reading uninitialised memory when formatting inodes into the incore log buffer. There are several issues here - we don't always log all the fields in the inode log format item, and we never log the inode the di_next_unlinked field. In the case of the inode log format item, this is exacerbated by the old xfs_inode_log_format structure padding issue. Hence make the padded, 64 bit aligned version of the structure the one we always use for formatting the log and get rid of the 64 bit variant. This means we'll always log the 64-bit version and so recovery only needs to convert from the unpadded 32 bit version from older 32 bit kernels. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-10-11 10:21:06 -07:00
Alexander Levin	56ae414e9d	9p: set page uptodate when required in write_end() Commit `77469c3f57` prevented setting the page as uptodate when we wrote the right amount of data, fix that. Fixes: `77469c3f57` ("9p: saner ->write_end() on failing copy into non-uptodate page") Reviewed-by: Jan Kara <jack@suse.com> Signed-off-by: Alexander Levin <alexander.levin@verizon.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-10-11 09:30:08 -07:00
Linus Torvalds	ce3861819a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Fairly old DIO bug caught by Andreas (3.10+) and several slightly younger blk_rq_map_user_iov() bugs, both on map and copy codepaths (Vitaly and me)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: bio_copy_user_iov(): don't ignore ->iov_offset more bio_map_user_iov() leak fixes fix unbalanced page refcounting in bio_map_user_iov direct-io: Prevent NULL pointer access in submit_page_section	2017-10-11 09:00:22 -07:00
Andreas Gruenbacher	899f0429c7	direct-io: Prevent NULL pointer access in submit_page_section In the code added to function submit_page_section by commit `b1058b981`, sdio->bio can currently be NULL when calling dio_bio_submit. This then leads to a NULL pointer access in dio_bio_submit, so check for a NULL bio in submit_page_section before trying to submit it instead. Fixes xfstest generic/250 on gfs2. Cc: stable@vger.kernel.org # v3.10+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-10-10 23:10:02 -04:00
Linus Torvalds	f953d2481e	One fix for a 4.14 regression, and one minor fix to the MAINTAINERs file. (I was weirdly flattered by the idea that lots of random people suddenly seemed to think Jeff and I were VFS experts. Turns out it was just a typo.) -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZ3RyyAAoJECebzXlCjuG+JvQP/RkwFqMZJHDjhSDhj/cr/t2o ciK5Xche1A4E5vaaPVV17w6OwIYTNhnQawwBtNw88GaqDUEELVyFZFzNtRm44Bv1 27RLOahPTT6bmHl/cd+uNpgpXs9svuNF6x4C5SUmKTm4kFdLBP7khjdcnFhwFi2y OerDFj4XmPsUDqW8dv7a7XktRf1klMvhbRh80r9TR5JW+h4IYQIYNevue9CABpUm 4vvv4kAyxo8oodslCMQ5OyWpG4NDDsFADtlLn++9tzUl7y5j6TQyIYfeYDH3XOru 5Ara5pkuxloS1Fu4EtEInF3iLAjMZkJD+QgHFhf2/mLMzQhZZzpbnFYPhrgyQONv wR3u7DaH2t/JbYtlSnKQpLEG0hv2hSBQ33G4ysKUHXrhnF5DC9N59epcA2X34++B DSwyc2wgxNfr8OGPyaNNw/kcBJyahNvsxlpTxZfTnvc0p4M1dzr1mxl/zsGC2b3v Ei1Y+u5JU2d/jmzeTOLCGtc59UyAoswdVzNa8SNYad1Tu5eAr81uooCPUvj77lTj GWQa9wYSOxt+Ld295dtzagqx+hQFdVKa+QTzfaZuPHeuUWmhQLGgalWXCxlVKtuF SGfAfutikQ4zbfAEz9PuNoThywfppiWbE74pfHRDkteL5+o2JQBLOSo6V6Ow0xV6 O4cOvwV5X/RExbOoZlx1 =yj7E -----END PGP SIGNATURE----- Merge tag 'nfsd-4.14-1' of git://linux-nfs.org/~bfields/linux Pull nfsd fix from Bruce Fields: "One fix for a 4.14 regression, and one minor fix to the MAINTAINERs file. (I was weirdly flattered by the idea that lots of random people suddenly seemed to think Jeff and I were VFS experts. Turns out it was just a typo)" * tag 'nfsd-4.14-1' of git://linux-nfs.org/~bfields/linux: nfsd4: define nfsd4_secinfo_no_name_release() MAINTAINERS: associate linux/fs.h with VFS instead of file locking	2017-10-10 13:01:51 -07:00
Linus Torvalds	7056964a85	f2fs-for-4.14-rc5 This contains one bug fix which causes a kernel panic during fstrim introduced in 4.14-rc1. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAlnc+awACgkQQBSofoJI UNIQLg/9HB/NikmBxVtkDtwrTKpVEPK5AYRHOvoa9k6twGkU6pB8FE0cd2PstwlZ tAwRstyt8W9nGzF5BPY+WAyVs9ybc26wIqNo13cnzwXbc0/cc4pTy8lzeiFQdQrK JIzz2lHNt0b5euCsEEAsnwK+rTb5DPUMKm8JkBUQ8f94oxIHLWvg7Um9FBppTw7s JNOJ8/ymzQVNlWu7VxFaVwfUPbEhK7gtpSWjO65fiprQ0JjwXLEr65356XU2XW8x lhQkByPMfMv1ZyGSNr3m4Hih0M6250slNHzwrZDxTdH7NDJmy1DfcPiM+epMWZMa 4uT+2hsxhTCqDQbIEvP9jv+KVHV7AG9ldCD04a0RD+XoNKDVLKlzSMFWVcWE/d0H jSaDrMZj+taseF72x/efP8P/RrTbzqYsqBoAkoByibOXvBf7U8vsLK4NuG7agoL4 EUXDMuVJDB5d8LJRSYt0lPI5R+lhRVlVuint7a9T09yiLyCeR0wGf+eoH9C9Y4V8 t/mEM9azBi9l7T0yraVfqnh+SPzwwlxYOLQeZTi0bf3uqmBOeKb0OvfOiwboOnaZ 5Rl6jYD/hgZAowXpbohRjqPJhMoLMabsTJ4kHj6uJcQDhvTqDpamm9g9Afsiyr6z xPYo09iHHlWA/iSiV7VSnbZu8hr59bchVt86r77fy/4YH3DXOcM= =fAsG -----END PGP SIGNATURE----- Merge tag 'f2fs-for-4.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fix from Jaegeuk Kim: "This contains one bug fix which causes a kernel panic during fstrim introduced in 4.14-rc1" * tag 'f2fs-for-4.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: fix potential panic during fstrim	2017-10-10 11:04:00 -07:00
Jan Kara	ac3d79392f	quota: Generate warnings for DQUOT_SPACE_NOFAIL allocations Eryu has reported that since commit `7b9ca4c61b` "quota: Reduce contention on dq_data_lock" test generic/233 occasionally fails. This is caused by the fact that since that commit we don't generate warning and set grace time for quota allocations that have DQUOT_SPACE_NOFAIL set (these are for example some metadata allocations in ext4). We need these allocations to behave regularly wrt warning generation and grace time setting so fix the code to return to the original behavior. Reported-and-tested-by: Eryu Guan <eguan@redhat.com> CC: stable@vger.kernel.org Fixes: `7b9ca4c61b` Signed-off-by: Jan Kara <jack@suse.cz>	2017-10-10 17:24:46 +02:00
Linus Torvalds	68ebe3cbe7	NFS client bugfixes for Linux 4.14 Hightlights include: stable fixes: - nfs/filelayout: fix oops when freeing filelayout segment - NFS: Fix uninitialized rpc_wait_queue bugfixes: - NFSv4/pnfs: Fix an infinite layoutget loop - nfs: RPC_MAX_AUTH_SIZE is in bytes -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZ27KKAAoJEGcL54qWCgDybIIP/Ai9g9AQ52B7Id0VhcB40fZM Bn8I8nYbSzkOivL+w5DHW5eTg0spJ2+iEBjOucPkDWuK0hmeu7kDaIIfauaBTmcM dg2eQMVEaU8PnB0Bf9xMF1hR4Jf3laPVaW3Dnpl01+eJu0feQVf3EDJOzwDll5e6 GDt8wuKXjfXZmHEVuvMvD/YSbzlLgKIyp62VRWXWMM73VUHL9YNc0VDaX6LTHzkM fYK+jWEgoq93/xuC2cP98+PyoziL82AYl7em0mcHTeffHm6FlB2KXrQq6dsW3UqI QMHQdqn6j+CWAv/PyJP+AifT/pTlvnor9ia4TVXlleWwrMSllUDCEttWi0jaBJxv OhaQgaQQEIGb6TLo7qbmHIX/VXxC1UMfjkx1Eqr4vu/Ps8y9t1Wy6V+pd86+QbzG qo/+jtFVHTMWIU9JBlowKoAJkeyeMfhL4cfSqcgdsSj9JJ2O/F/a/BFNh3bgui69 TeSFLMoS0FCw9T2h2QeMCSwXvETmFDZR2pUXdsoULxYH0jZ4oPr7Fr9GflsSITwA oCITgkpt1oOoB5V/PrLPWfjq0JzcA69VAgmD1WJn5eNz1AvQErYYNU+VDf51T4rm zEAxk26WB7+KBBYMEyRCBeatnAAx0a28MFyYI7ittwovOkXIXOv/dw2bFZbSNyoc vpe4ZMGP442znvyy5Myh =QOH4 -----END PGP SIGNATURE----- Merge tag 'nfs-for-4.14-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: "Hightlights include: stable fixes: - nfs/filelayout: fix oops when freeing filelayout segment - NFS: Fix uninitialized rpc_wait_queue bugfixes: - NFSv4/pnfs: Fix an infinite layoutget loop - nfs: RPC_MAX_AUTH_SIZE is in bytes" * tag 'nfs-for-4.14-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4/pnfs: Fix an infinite layoutget loop nfs/filelayout: fix oops when freeing filelayout segment sunrpc: remove redundant initialization of sock NFS: Fix uninitialized rpc_wait_queue NFS: Cleanup error handling in nfs_idmap_request_key() nfs: RPC_MAX_AUTH_SIZE is in bytes	2017-10-09 10:55:37 -07:00
Linus Torvalds	eab26ad197	Changes since last update: - fix a race between overlapping copy on write aio - fix cow fork swapping when we defragment reflinked files -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCgAGBQJZ1/WPAAoJEPh/dxk0SrTrk8AP/0rV3Cb6tknRTwNPHWC2KG+v UPP2KmN9tGPrqbrDTzMYdQC4/UNE4Je7+hMevF+A61Q7rug/4xofGP3Bl+vxWV22 Y2lDA2jGHDnA20tvHvNUNJ+8aWbiHXXkzYCbohrlTHteDMaB+diHLp7jtePPrgzu ++qBM2X2noXhC3B6MB/GzEDUyTwHgEySsfx2IJDHs7LkQR5qV9UF8f1SSLbr9o7u N7JJ6CXUW5Dfb6Sxk8WJGEBHxTzf14vdPeTOmnsx1OwW9FFidVtcr8/YdY6Cv1F+ LjpDuR/pWwJM0Ig1BB03jIcKNoG6Q6V1AJjNdZkq0hoEYc4Z8mNdyHPPSyvgMqqS 733eMJI7q1Cu546XBP2NKmzUBJr4wVNPxTVbxZnbqrL1ybODTzKuDRkgpkoE8Hrg gSKXi4gnXJkR4/N5DPN+dP3cLMRl81QJ6widiZdpvxWzJGaOM1Ynu/o9mmo0yj7K rlHQ6tgex2TyuTys+jyPgRb489rf6eKnNTxu2I4F4nNbHsNOiNa8eVUc7FLP1SxL SfL2PUmUgcI1FcLl3yMZ2wZ3zP+PMV005aZB2q9KW08COF/ASXOX87efsQ91WaUy rEzOZoBxZvfc0DA0G5Tmlb+MbGtlfdjDfidPygmDeBrSRPJpUyxTh7xoRk/an8wL B4QtpX77Pj/qQNbuThkv =oDmt -----END PGP SIGNATURE----- Merge tag 'xfs-4.14-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Darrick Wong: - fix a race between overlapping copy on write aio - fix cow fork swapping when we defragment reflinked files * tag 'xfs-4.14-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: handle racy AIO in xfs_reflink_end_cow xfs: always swap the cow forks when swapping extents	2017-10-06 15:53:36 -07:00
Linus Torvalds	bf2db0b9f5	Merge branch 'for-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Two more fixes for bugs introduced in 4.13. The sector_t problem with 32bit architecture and !LBDAF config seems serious but the number of affected deployments is hopefully low. The clashing status bits could lead to a confusing in-memory state of the whole-filesystem operations if used with the quota override sysfs knob" * 'for-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix overlap of fs_info::flags values btrfs: avoid overflow when sector_t is 32 bit	2017-10-06 09:03:08 -07:00
Linus Torvalds	b77779b93d	Two fixups for CephFS snapshot-handling patches in -rc1. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJZ142+AAoJEEp/3jgCEfOLPdQH/0wFtTLG7sKhEBVndsDUG8u0 RUtLBE4dXFJU7IlLQOuAkD4GvC4XqttLIJs7bkUwSUu7Vk3+2OKk0JvUq2qKFl03 tM5sWIqX5FkL9nenivV28YI6rOPHyghVXttVw/4xy5QYLJ1G3OoJpGPJOE44v5v9 w96guw+EEaPWyn8+/SBhEkfpVAR2fRXe4UDKiLzGYLqYNYiGSSd90j/7F8I4uaNG hpQ6aJVJOzNoTQtfmsGyZ0DHuBD8/CSQOIumXdICegDk7stEVGaxSlkBX2ZwwR2q jwxIRj6ItM+jDORSgaVAhQ6NJktCxs+scfNFgu8MlQ+RaTOSnEkcvigA7DIVMrw= =h2CQ -----END PGP SIGNATURE----- Merge tag 'ceph-for-4.14-rc4' of git://github.com/ceph/ceph-client Pull ceph fixes from Ilya Dryomov: "Two fixups for CephFS snapshot-handling patches in -rc1" * tag 'ceph-for-4.14-rc4' of git://github.com/ceph/ceph-client: ceph: fix __choose_mds() for LSSNAP request ceph: properly queue cap snap for newly created snap realm	2017-10-06 09:01:45 -07:00
Linus Torvalds	8d4ef4e15e	Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "Fix a regression in 4.14 and one in 4.13. The latter is a case when Docker is doing something it really shouldn't and gets away with it. We now print a warning instead of erroring out. There are also fixes to several error paths" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix regression caused by exclusive upper/work dir protection ovl: fix missing unlock_rename() in ovl_do_copy_up() ovl: fix dentry leak in ovl_indexdir_cleanup() ovl: fix dput() of ERR_PTR in ovl_cleanup_index() ovl: fix error value printed in ovl_lookup_index() ovl: fix may_write_real() for overlayfs directories	2017-10-06 08:52:53 -07:00
Eryu Guan	ec572b9e81	nfsd4: define nfsd4_secinfo_no_name_release() Commit `34b1744c91` ("nfsd4: define ->op_release for compound ops") defined a couple ->op_release functions and run them if necessary. But there's a problem with that is that it reused nfsd4_secinfo_release() as the op_release of OP_SECINFO_NO_NAME, and caused a leak on struct nfsd4_secinfo_no_name in nfsd4_encode_secinfo_no_name(), because there's no .si_exp field in struct nfsd4_secinfo_no_name. I found this because I was unable to umount an ext4 partition after exporting it via NFS & run fsstress on the nfs mount. A simplified reproducer would be: # mount a local-fs device at /mnt/test, and export it via NFS with # fsid=0 export option (this is required) mount /dev/sda5 /mnt/test echo "/mnt/test *(rw,no_root_squash,fsid=0)" >> /etc/exports service nfs restart # locally mount the nfs export with all default, note that I have # nfsv4.1 configured as the default nfs version, because of the # fsid export option, v4 mount would fail and fall back to v3 mount localhost:/mnt/test /mnt/nfs # try to umount the underlying device, but got EBUSY umount /mnt/nfs service nfs stop umount /mnt/test <=== EBUSY here Fixed it by defining a separate nfsd4_secinfo_no_name_release() function as the op_release method of OP_SECINFO_NO_NAME that releases the correct nfsd4_secinfo_no_name structure. Fixes: `34b1744c91` ("nfsd4: define ->op_release for compound ops") Signed-off-by: Eryu Guan <eguan@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2017-10-05 14:45:25 -04:00
Amir Goldstein	85fdee1eef	ovl: fix regression caused by exclusive upper/work dir protection Enforcing exclusive ownership on upper/work dirs caused a docker regression: https://github.com/moby/moby/issues/34672. Euan spotted the regression and pointed to the offending commit. Vivek has brought the regression to my attention and provided this reproducer: Terminal 1: mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/ Terminal 2: unshare -m Terminal 1: umount merged mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/ mount: /root/overlay-testing/merged: none already mounted or mount point busy To fix the regression, I replaced the error with an alarming warning. With index feature enabled, mount does fail, but logs a suggestion to override exclusive dir protection by disabling index. Note that index=off mount does take the inuse locks, so a concurrent index=off will issue the warning and a concurrent index=on mount will fail. Documentation was updated to reflect this change. Fixes: `2cac0c00a6` ("ovl: get exclusive ownership on upper/work dirs") Cc: <stable@vger.kernel.org> # v4.13 Reported-by: Euan Kemp <euank@euank.com> Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-10-05 15:53:18 +02:00
Amir Goldstein	5820dc0888	ovl: fix missing unlock_rename() in ovl_do_copy_up() Use the ovl_lock_rename_workdir() helper which requires unlock_rename() only on lock success. Fixes: ("fd210b7d67ee ovl: move copy up lock out") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-10-05 15:53:18 +02:00
Amir Goldstein	dc7ab6773e	ovl: fix dentry leak in ovl_indexdir_cleanup() index dentry was not released when breaking out of the loop due to index verification error. Fixes: `415543d5c6` ("ovl: cleanup bad and stale index entries on mount") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-10-05 15:53:18 +02:00
Amir Goldstein	9f4ec904db	ovl: fix dput() of ERR_PTR in ovl_cleanup_index() Fixes: `caf70cb2ba` ("ovl: cleanup orphan index entries") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-10-05 15:53:18 +02:00
Amir Goldstein	e0082a0f04	ovl: fix error value printed in ovl_lookup_index() Fixes: `359f392ca5` ("ovl: lookup index entry for copy up origin") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-10-05 15:53:18 +02:00
Amir Goldstein	954c736f86	ovl: fix may_write_real() for overlayfs directories Overlayfs directory file_inode() is the overlay inode whether the real inode is upper or lower. This fixes a regression in xfstest generic/158. Fixes: `7c6893e3c9` ("ovl: don't allow writing ioctl on lower layer") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2017-10-05 15:53:18 +02:00
Trond Myklebust	e8fa33a6f6	NFSv4/pnfs: Fix an infinite layoutget loop Since we can now use a lock stateid or a delegation stateid, that differs from the context stateid, we need to change the test in nfs4_layoutget_handle_exception() to take this into account. This fixes an infinite layoutget loop in the NFS client whereby it keeps retrying the initial layoutget using the same broken stateid. Fixes: `70d2f7b1ea` ("pNFS: Use the standard I/O stateid when...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2017-10-04 14:06:54 -04:00
Linus Torvalds	b7e1416441	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "A lot of stuff, sorry about that. A week on a beach, then a bunch of time catching up then more time letting it bake in -next. Shan't do that again!" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (51 commits) include/linux/fs.h: fix comment about struct address_space checkpatch: fix ignoring cover-letter logic m32r: fix build failure lib/ratelimit.c: use deferred printk() version kernel/params.c: improve STANDARD_PARAM_DEF readability kernel/params.c: fix an overflow in param_attr_show kernel/params.c: fix the maximum length in param_get_string mm/memory_hotplug: define find_{smallest\|biggest}_section_pfn as unsigned long mm/memory_hotplug: change pfn_to_section_nr/section_nr_to_pfn macro to inline function kernel/kcmp.c: drop branch leftover typo memremap: add scheduling point to devm_memremap_pages mm, page_alloc: add scheduling point to memmap_init_zone mm, memory_hotplug: add scheduling point to __add_pages lib/idr.c: fix comment for idr_replace() mm: memcontrol: use vmalloc fallback for large kmem memcg arrays kernel/sysctl.c: remove duplicate UINT_MAX check on do_proc_douintvec_conv() include/linux/bitfield.h: remove 32bit from FIELD_GET comment block lib/lz4: make arrays static const, reduces object code size exec: binfmt_misc: kill the onstack iname[BINPRM_BUF_SIZE] array exec: binfmt_misc: fix race between load_misc_binary() and kill_node() ...	2017-10-04 09:30:50 -07:00
Tsutomu Itoh	69ad59767d	Btrfs: fix overlap of fs_info::flags values Because the values of BTRFS_FS_EXCL_OP and BTRFS_FS_QUOTA_OVERRIDE overlap, we should change the value. First, BTRFS_FS_EXCL_OP was set to 14. commit `171938e528` ("btrfs: track exclusive filesystem operation in flags") Next, the value of BTRFS_FS_QUOTA_OVERRIDE was set to 14. commit `f29efe2921` ("btrfs: add quota override flag to enable quota override for CAP_SYS_RESOURCE") As a result, the value 14 overlapped, by accident. This problem is solved by defining the value of BTRFS_FS_EXCL_OP as 16, the flags are internal. Fixes: `f29efe2921` ("btrfs: add quota override flag to enable quota override for CAP_SYS_RESOURCE") CC: stable@vger.kernel.org # 4.13+ Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> [ minimize the change, update only BTRFS_FS_EXCL_OP ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-10-04 16:44:18 +02:00
Goffredo Baroncelli	2d8ce70a08	btrfs: avoid overflow when sector_t is 32 bit Jean-Denis Girard noticed commit `c821e7f3` "pass bytes to btrfs_bio_alloc" (https://patchwork.kernel.org/patch/9763081/) introduces a regression on 32 bit machines. When CONFIG_LBDAF is _not_ defined (CONFIG_LBDAF == Support for large (2TB+) block devices and files) sector_t is 32 bit on 32bit machines. In the function submit_extent_page, 'sector' (which is sector_t type) is multiplied by 512 to convert it from sectors to bytes, leading to an overflow when the disk is bigger than 4GB (!). I added a cast to u64 to avoid overflow. Fixes: `c821e7f3` ("btrfs: pass bytes to btrfs_bio_alloc") CC: stable@vger.kernel.org # 4.13+ Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it> Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-10-04 16:22:56 +02:00
Casey Schaufler	57e7ba04d4	lsm: fix smack_inode_removexattr and xattr_getsecurity memleak security_inode_getsecurity() provides the text string value of a security attribute. It does not provide a "secctx". The code in xattr_getsecurity() that calls security_inode_getsecurity() and then calls security_release_secctx() happened to work because SElinux and Smack treat the attribute and the secctx the same way. It fails for cap_inode_getsecurity(), because that module has no secctx that ever needs releasing. It turns out that Smack is the one that's doing things wrong by not allocating memory when instructed to do so by the "alloc" parameter. The fix is simple enough. Change the security_release_secctx() to kfree() because it isn't a secctx being returned by security_inode_getsecurity(). Change Smack to allocate the string when told to do so. Note: this also fixes memory leaks for LSMs which implement inode_getsecurity but not release_secctx, such as capabilities. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: stable@vger.kernel.org Signed-off-by: James Morris <james.l.morris@oracle.com>	2017-10-04 18:03:15 +11:00
Christoph Hellwig	e12199f85d	xfs: handle racy AIO in xfs_reflink_end_cow If we got two AIO writes into a COW area the second one might not have any COW extents left to convert. Handle that case gracefully instead of triggering an assert or accessing beyond the bounds of the extent list. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-10-03 21:27:55 -07:00
Darrick J. Wong	52bfcdd7ad	xfs: always swap the cow forks when swapping extents Since the CoW fork exists as a secondary data structure to the data fork, we must always swap cow forks during swapext. We also need to swap the extent counts and reset the cowblocks tags. Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-10-03 21:27:55 -07:00
Oleg Nesterov	50097f7493	exec: binfmt_misc: kill the onstack iname[BINPRM_BUF_SIZE] array After the previous change "fmt" can't go away, we can kill iname/iname_addr and use fmt->interpreter. Link: http://lkml.kernel.org/r/20170922143653.GA17232@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ben Woodard <woodard@redhat.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jim Foraker <foraker1@llnl.gov> Cc: <tdhooge@llnl.gov> Cc: Travis Gummels <tgummels@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-10-03 17:54:25 -07:00
Oleg Nesterov	43a4f26190	exec: binfmt_misc: fix race between load_misc_binary() and kill_node() load_misc_binary() makes a local copy of fmt->interpreter under entries_lock to avoid the race with kill_node() but this is not enough; the whole Node can be freed after we drop entries_lock, not only the ->interpreter string. Add dget/dput(fmt->dentry) to ensure bm_evict_inode() can't destroy/free this Node. Link: http://lkml.kernel.org/r/20170922143650.GA17227@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ben Woodard <woodard@redhat.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jim Foraker <foraker1@llnl.gov> Cc: Travis Gummels <tgummels@redhat.com> Cc: <tdhooge@llnl.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-10-03 17:54:25 -07:00
Oleg Nesterov	eb23aa0317	exec: binfmt_misc: remove the confusing e->interp_file != NULL checks If MISC_FMT_OPEN_FILE flag is set e->interp_file must be valid or we have a bug which should not be silently ignored. Link: http://lkml.kernel.org/r/20170922143647.GA17222@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ben Woodard <woodard@redhat.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jim Foraker <foraker1@llnl.gov> Cc: <tdhooge@llnl.gov> Cc: Travis Gummels <tgummels@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-10-03 17:54:25 -07:00
Oleg Nesterov	83f918274e	exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode() To ensure that load_misc_binary() can't use the partially destroyed Node, see also the next patch. The current logic looks wrong in any case, once we close interp_file it doesn't make any sense to delay kfree(inode->i_private), this Node is no longer valid. Even if the MISC_FMT_OPEN_FILE/interp_file checks were not racy (they are), load_misc_binary() should not try to reopen ->interpreter if MISC_FMT_OPEN_FILE is set but ->interp_file is NULL. And I can't understand why do we use filp_close(), not fput(). Link: http://lkml.kernel.org/r/20170922143644.GA17216@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ben Woodard <woodard@redhat.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jim Foraker <foraker1@llnl.gov> Cc: <tdhooge@llnl.gov> Cc: Travis Gummels <tgummels@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-10-03 17:54:25 -07:00
Oleg Nesterov	baba1b2973	exec: binfmt_misc: don't nullify Node->dentry in kill_node() kill_node() nullifies/checks Node->dentry to avoid double free. This complicates the next changes and this is very confusing: - we do not need to check dentry != NULL under entries_lock, kill_node() is always called under inode_lock(d_inode(root)) and we rely on this inode_lock() anyway, without this lock the MISC_FMT_OPEN_FILE cleanup could race with itself. - if kill_inode() was already called and ->dentry == NULL we should not even try to close e->interp_file. We can change bm_entry_write() to simply check !list_empty(list) before kill_node. Again, we rely on inode_lock(), in particular it saves us from the race with bm_status_write(), another caller of kill_node(). Link: http://lkml.kernel.org/r/20170922143641.GA17210@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ben Woodard <woodard@redhat.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jim Foraker <foraker1@llnl.gov> Cc: <tdhooge@llnl.gov> Cc: Travis Gummels <tgummels@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-10-03 17:54:25 -07:00
Oleg Nesterov	c2315c187f	exec: load_script: kill the onstack interp[BINPRM_BUF_SIZE] array Patch series "exec: binfmt_misc: fix use-after-free, kill iname[BINPRM_BUF_SIZE]". It looks like this code was always wrong, then commit `948b701a60` ("binfmt_misc: add persistent opened binary handler for containers") added more problems. This patch (of 6): load_script() can simply use i_name instead, it points into bprm->buf[] and nobody can change this memory until we call prepare_binprm(). The only complication is that we need to also change the signature of bprm_change_interp() but this change looks good too. While at it, do whitespace/style cleanups. NOTE: the real motivation for this change is that people want to increase BINPRM_BUF_SIZE, we need to change load_misc_binary() too but this looks more complicated because afaics it is very buggy. Link: http://lkml.kernel.org/r/20170918163446.GA26793@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Travis Gummels <tgummels@redhat.com> Cc: Ben Woodard <woodard@redhat.com> Cc: Jim Foraker <foraker1@llnl.gov> Cc: <tdhooge@llnl.gov> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-10-03 17:54:25 -07:00
Andrea Arcangeli	384632e67e	userfaultfd: non-cooperative: fix fork use after free When reading the event from the uffd, we put it on a temporary fork_event list to detect if we can still access it after releasing and retaking the event_wqh.lock. If fork aborts and removes the event from the fork_event all is fine as long as we're still in the userfault read context and fork_event head is still alive. We've to put the event allocated in the fork kernel stack, back from fork_event list-head to the event_wqh head, before returning from userfaultfd_ctx_read, because the fork_event head lifetime is limited to the userfaultfd_ctx_read stack lifetime. Forgetting to move the event back to its event_wqh place then results in __remove_wait_queue(&ctx->event_wqh, &ewq->wq); in userfaultfd_event_wait_completion to remove it from a head that has been already freed from the reader stack. This could only happen if resolve_userfault_fork failed (for example if there are no file descriptors available to allocate the fork uffd). If it succeeded it was put back correctly. Furthermore, after find_userfault_evt receives a fork event, the forked userfault context in fork_nctx and uwq->msg.arg.reserved.reserved1 can be released by the fork thread as soon as the event_wqh.lock is released. Taking a reference on the fork_nctx before dropping the lock prevents an use after free in resolve_userfault_fork(). If the fork side aborted and it already released everything, we still try to succeed resolve_userfault_fork(), if possible. Fixes: `893e26e61d` ("userfaultfd: non-cooperative: Add fork() event") Link: http://lkml.kernel.org/r/20170920180413.26713-1-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-10-03 17:54:25 -07:00

1 2 3 4 5 ...

50833 Commits