linux_old1

Commit Graph

Author	SHA1	Message	Date
Qu Wenruo	fce466eab7	btrfs: tree-checker: Verify block_group_item A crafted image with invalid block group items could make free space cache code to cause panic. We could detect such invalid block group item by checking: 1) Item size Known fixed value. 2) Block group size (key.offset) We have an upper limit on block group item (10G) 3) Chunk objectid Known fixed value. 4) Type Only 4 valid type values, DATA, METADATA, SYSTEM and DATA\|METADATA. No more than 1 bit set for profile type. 5) Used space No more than the block group size. This should allow btrfs to detect and refuse to mount the crafted image. Link: https://bugzilla.kernel.org/show_bug.cgi?id=199849 Reported-by: Xu Wen <wen.xu@gatech.edu> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:41 +02:00
David Sterba	6d8ff4e458	btrfs: annotate unlikely branches after V0 extent type removal The v0 extent type checks are the right case for the unlikely annotations as we don't expect to ever see them, so let's give the compiler some hint. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:41 +02:00
Nikolay Borisov	ba3c2b196b	btrfs: Add graceful handling of V0 extents Following the removal of the v0 handling code let's be courteous and print an error message when such extents are handled. In the cases where we have a transaction just abort it, otherwise just call btrfs_handle_fs_error. Both cases result in the FS being re-mounted RO. In case the error handling would be too intrusive, leave the BUG_ON in place, like extent_data_ref_count, other proper handling would catch that earlier. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:41 +02:00
Nikolay Borisov	a79865c680	btrfs: Remove V0 extent support The v0 compat code was introduced in commit `5d4f98a28c` ("Btrfs: Mixed back reference (FORWARD ROLLING FORMAT CHANGE)") 9 years ago, which was merged in 2.6.31. This means that the code is there to support filesystems which are _VERY_ old and if you are using btrfs on such an old kernel, you have much bigger problems. This coupled with the fact that no one is likely testing/maintining this code likely means it has bugs lurking. All things considered I think 43 kernel releases later it's high time this remnant of the past got removed. This patch removes all code wrapped in #ifdefs but leaves the BUG_ONs in case we have a v0 with no support intact as a sort of safety-net. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:41 +02:00
Chengguang Xu	4de426cd39	btrfs: remove unnecessary curly braces in btrfs_get_acl It's only coding style fix not functinal change. When if/else has only one statement then the braces are not needed. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:41 +02:00
Chengguang Xu	dc7789ef87	btrfs: avoid error code override in btrfs_get_acl It's not good to override the error code when failing from btrfs_getxattr() in btrfs_get_acl() because it hides the real reason of the failure. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:40 +02:00
Chengguang Xu	5ee552da50	btrfs: remove unnecessary -ERANGE check in btrfs_get_acl There is no chance to get into -ERANGE error condition because we first call btrfs_getxattr to get the length of the attribute, then we do a subsequent call with the size from the first call. Between the 2 calls the size shouldn't change. So remove the unnecessary -ERANGE error check. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:40 +02:00
Chengguang Xu	7e35eab958	btrfs: replace empty string with NULL when getting attribute length in btrfs_get_acl In btrfs_get_acl() the first call of btr_getxattr() is for getting the length of attribute, the value buffer is never used in this case. So it's better to replace empty string with NULL. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:40 +02:00
Chengguang Xu	ab3629ed86	btrfs: return error instead of crash when detecting unexpected type in btrfs_get_acl The caller of btrfs_get_acl() checks error condition so there is no impact from this change. In practice there is no chance to get into default case of switch statement because VFS has already checked the type. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:40 +02:00
Su Yue	af431dcb24	btrfs: return EUCLEAN if extent_inline_ref type is invalid If type of extent_inline_ref found is not expected, filesystem may have been corrupted, should return EUCLEAN instead of EINVAL. Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:40 +02:00
Goldwyn Rodrigues	e4af400a9c	btrfs: Use iocb to derive pos instead of passing a separate parameter struct kiocb carries the ki_pos, so there is no need to pass it as a separate function parameter. generic_file_direct_write() increments ki_pos, so we now assign pos after the function. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> [ rename to btrfs_buffered_write ] Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:40 +02:00
Su Yue	893bf4b115	btrfs: print more details when checking tree block finds a problem For easier debugging, print eb->start if level is invalid. Also make clear if bytenr found is not expected. Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:39 +02:00
Nikolay Borisov	7b4284de93	btrfs: Streamline memory allocation failure handling in btrfs_add_delayed_tree_ref Currently the function uses 2 goto labels to properly handle allocation failures. This could be simplified by simply re-arranging the code so that allocations are the in the beginning of the function. This allows to use simple return statements. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:39 +02:00
Qu Wenruo	4379444654	btrfs: Don't remove block group that still has pinned down bytes [BUG] Under certain KVM load and LTP tests, it is possible to hit the following calltrace if quota is enabled: BTRFS critical (device vda2): unable to find logical 8820195328 length 4096 BTRFS critical (device vda2): unable to find logical 8820195328 length 4096 WARNING: CPU: 0 PID: 49 at ../block/blk-core.c:172 blk_status_to_errno+0x1a/0x30 CPU: 0 PID: 49 Comm: kworker/u2:1 Not tainted 4.12.14-15-default #1 SLE15 (unreleased) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] task: ffff9f827b340bc0 task.stack: ffffb4f8c0304000 RIP: 0010:blk_status_to_errno+0x1a/0x30 Call Trace: submit_extent_page+0x191/0x270 [btrfs] ? btrfs_create_repair_bio+0x130/0x130 [btrfs] __do_readpage+0x2d2/0x810 [btrfs] ? btrfs_create_repair_bio+0x130/0x130 [btrfs] ? run_one_async_done+0xc0/0xc0 [btrfs] __extent_read_full_page+0xe7/0x100 [btrfs] ? run_one_async_done+0xc0/0xc0 [btrfs] read_extent_buffer_pages+0x1ab/0x2d0 [btrfs] ? run_one_async_done+0xc0/0xc0 [btrfs] btree_read_extent_buffer_pages+0x94/0xf0 [btrfs] read_tree_block+0x31/0x60 [btrfs] read_block_for_search.isra.35+0xf0/0x2e0 [btrfs] btrfs_search_slot+0x46b/0xa00 [btrfs] ? kmem_cache_alloc+0x1a8/0x510 ? btrfs_get_token_32+0x5b/0x120 [btrfs] find_parent_nodes+0x11d/0xeb0 [btrfs] ? leaf_space_used+0xb8/0xd0 [btrfs] ? btrfs_leaf_free_space+0x49/0x90 [btrfs] ? btrfs_find_all_roots_safe+0x93/0x100 [btrfs] btrfs_find_all_roots_safe+0x93/0x100 [btrfs] btrfs_find_all_roots+0x45/0x60 [btrfs] btrfs_qgroup_trace_extent_post+0x20/0x40 [btrfs] btrfs_add_delayed_data_ref+0x1a3/0x1d0 [btrfs] btrfs_alloc_reserved_file_extent+0x38/0x40 [btrfs] insert_reserved_file_extent.constprop.71+0x289/0x2e0 [btrfs] btrfs_finish_ordered_io+0x2f4/0x7f0 [btrfs] ? pick_next_task_fair+0x2cd/0x530 ? __switch_to+0x92/0x4b0 btrfs_worker_helper+0x81/0x300 [btrfs] process_one_work+0x1da/0x3f0 worker_thread+0x2b/0x3f0 ? process_one_work+0x3f0/0x3f0 kthread+0x11a/0x130 ? kthread_create_on_node+0x40/0x40 ret_from_fork+0x35/0x40 BTRFS critical (device vda2): unable to find logical 8820195328 length 16384 BTRFS: error (device vda2) in btrfs_finish_ordered_io:3023: errno=-5 IO failure BTRFS info (device vda2): forced readonly BTRFS error (device vda2): pending csums is 2887680 [CAUSE] It's caused by race with block group auto removal: - There is a meta block group X, which has only one tree block The tree block belongs to fs tree 257. - In current transaction, some operation modified fs tree 257 The tree block gets COWed, so the block group X is empty, and marked as unused, queued to be deleted. - Some workload (like fsync) wakes up cleaner_kthread() Which will call btrfs_delete_unused_bgs() to remove unused block groups. So block group X along its chunk map get removed. - Some delalloc work finished for fs tree 257 Quota needs to get the original reference of the extent, which will read tree blocks of commit root of 257. Then since the chunk map gets removed, the above warning gets triggered. [FIX] Just let btrfs_delete_unused_bgs() skip block group which still has pinned bytes. However there is a minor side effect: currently we only queue empty blocks at update_block_group(), and such empty block group with pinned bytes won't go through update_block_group() again, such block group won't be removed, until it gets new extent allocated and removed. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:39 +02:00
Geert Uytterhoeven	bc931c0ef8	btrfs: Refactor count handling in btrfs_unpin_free_ino With gcc 4.1.2: fs/btrfs/inode-map.c: In function ‘btrfs_unpin_free_ino’: fs/btrfs/inode-map.c:241: warning: ‘count’ may be used uninitialized in this function While this warning is a false-positive, it can easily be killed by refactoring the code. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:39 +02:00
Arnd Bergmann	d3c6be6fda	btrfs: use timespec64 for i_otime While the regular inode timestamps all use timespec64 now, the i_otime field is btrfs specific and still needs to be converted to correctly represent times beyond 2038. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:39 +02:00
Arnd Bergmann	afd48513f0	btrfs: use monotonic time for transaction handling The transaction times were changed to ktime_get_real_seconds to avoid the y2038 overflow, but they still have a minor problem when they go backwards or jump due to settimeofday() or leap seconds. This changes the transaction handling to instead use ktime_get_seconds(), which returns a CLOCK_MONOTONIC timestamp that has neither of those problems. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:38 +02:00
Qu Wenruo	e41ca58974	btrfs: Get rid of the confusing btrfs_file_extent_inline_len We used to call btrfs_file_extent_inline_len() to get the uncompressed data size of an inlined extent. However this function is hiding evil, for compressed extent, it has no choice but to directly read out ram_bytes from btrfs_file_extent_item. While for uncompressed extent, it uses item size to calculate the real data size, and ignoring ram_bytes completely. In fact, for corrupted ram_bytes, due to above behavior kernel btrfs_print_leaf() can't even print correct ram_bytes to expose the bug. Since we have the tree-checker to verify all EXTENT_DATA, such mismatch can be detected pretty easily, thus we can trust ram_bytes without the evil btrfs_file_extent_inline_len(). Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:38 +02:00
Nikolay Borisov	bc877d285c	btrfs: Deduplicate extent_buffer init code When a new extent buffer is allocated there are a few mandatory fields which need to be set in order for the buffer to be sane: level, generation, bytenr, backref_rev, owner and FSID/UUID. Currently this is open coded in the callers of btrfs_alloc_tree_block, meaning it's fairly high in the abstraction hierarchy of operations. This patch solves this by simply moving this init code in btrfs_init_new_buffer, since this is the function which initializes a newly allocated extent buffer. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:38 +02:00
Qu Wenruo	9912bbf644	btrfs: check-integrity: Fix NULL pointer dereference for degraded mount Commit `f8f84b2dfd` ("btrfs: index check-integrity state hash by a dev_t") changed how btrfsic indexes device state. Now we need to access device->bdev->bd_dev, while for degraded mount it's completely possible to have device->bdev as NULL, thus it will trigger a NULL pointer dereference at mount time. Fix it by checking if the device is degraded before accessing device->bdev->bd_dev. There are a lot of other places accessing device->bdev->bd_dev, however the other call sites have either checked device->bdev, or the device->bdev is passed from btrfsic_map_block(), so it won't cause harm. Fixes: `f8f84b2dfd` ("btrfs: index check-integrity state hash by a dev_t") Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:38 +02:00
Nikolay Borisov	43a7e99db6	btrfs: Remove fs_info from btrfs_force_chunk_alloc It can be referenced from the passed transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:38 +02:00
Nikolay Borisov	c83488afc5	btrfs: Remove fs_info from btrfs_inc_block_group_ro It can be referenced from the passed bg cache. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:37 +02:00
Nikolay Borisov	61da2abfca	btrfs: Remove fs_info from btrfs_alloc_logged_file_extent It can be referenced from trans since the function is always called within a valid transaction. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:37 +02:00
Nikolay Borisov	87cc7a8a2a	btrfs: Remove fs_info from remove_extent_backref It can be referenced directly from the transaction handle since it's always valid. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:37 +02:00
Nikolay Borisov	5fac7f9ee1	btrfs: Remove fs_info from run_one_delayed_ref It can be referenced from the passed transaction handle, since it's always valid. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:37 +02:00
Nikolay Borisov	a639cdeba3	btrfs: Remove fs_info from insert_inline_extent_backref It can be referenced from the passed transaction handle, since it's always valid. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:37 +02:00
Nikolay Borisov	3c4da6574e	btrfs: Remove fs_info from exclude_super_stripes It can be referenced from the passed block group. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:36 +02:00
Nikolay Borisov	9e715da860	btrfs: Remove fs_info from free_excluded_extents It can be referenced from the passed block group. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:36 +02:00
Nikolay Borisov	451a2c1303	btrfs: Remove fs_info from check_system_chunk It can be referenced from trans since the function is always called within a transaction. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:36 +02:00
Nikolay Borisov	c216b2039a	btrfs: Remove fs_info from btrfs_alloc_chunk It can be referenced from trans since the function is always called within a transaction. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:36 +02:00
Nikolay Borisov	01458828bb	btrfs: Remove fs_info from do_chunk_alloc This function is always called with a valid transaction handle from where fs_info can be referenced. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:36 +02:00
Nikolay Borisov	f97806f2ee	btrfs: Remove fs_info from run_delayed_tree_ref It can always be referneced from the passed transaction handle since it's always valid. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:35 +02:00
Nikolay Borisov	f9871eddd9	btrfs: Remove fs_info from cleanup_ref_head fs_info can be refenreced from the transaction handle, since it's always valid. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:35 +02:00
Nikolay Borisov	c4d56d4a16	btrfs: Remove unused fs_info from cleanup_extent_op The argument is no longer used so remove it. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:35 +02:00
Nikolay Borisov	20b9a2d670	btrfs: Remove fs_info from run_delayed_extent_op This function is always called with a valid transaction handle so fs_info can be referenced from there. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:35 +02:00
Nikolay Borisov	2bf98ef35f	btrfs: Remove fs_info from run_delayed_data_ref This function is always called with a valid transaction from where fs_info can be referenced. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:35 +02:00
Nikolay Borisov	2590d0f155	btrfs: Remove fs_info argument from __btrfs_inc_extent_ref This function already takes a transaction which holds a reference to the fs_info struct. Use that reference and remove the extra arg. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:35 +02:00
Nikolay Borisov	ef89b8245b	btrfs: Remove fs_info from alloc_reserved_file_extent fs_info can be referenced from the transaction handle, which is always valid. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:34 +02:00
Nikolay Borisov	e72cb9235d	btrfs: Remove fs_info from __btrfs_free_extent This function is always called with a valid transaction handle so we can reference the fs_info from there. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:34 +02:00
Nikolay Borisov	5a98ec0141	btrfs: Remove fs_info from btrfs_remove_block_group This function is always called with a valid transaction handle from where we can reference fs_info. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:34 +02:00
Nikolay Borisov	e7e02096d9	btrfs: Remove fs_info from btrfs_make_block_group This function is always called with a valid transaction handle from where we can reference the fs_info. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:34 +02:00
Nikolay Borisov	88a979c615	btrfs: Remove fs_info from btrfs_add_delayed_data_ref This function is always called with a valid transaction handle from where fs_info can be referenced. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:34 +02:00
Nikolay Borisov	44e1c47d5c	btrfs: Remove fs_info from btrfs_add_delayed_tree_ref This function is always called with a valid transaction handle from where fs_info can be referenced. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:33 +02:00
Nikolay Borisov	fbe4801b26	btrfs: Remove fs_info from lookup_extent_backref This argument is unused. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:33 +02:00
Nikolay Borisov	bd1d53ef35	btrfs: Remove fs_info argument from lookup_extent_data_ref This function is always called with a valid transaction handle from where fs_info can be referenced. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:33 +02:00
Nikolay Borisov	b8582eeabb	btrfs: Remove fs_info argument from lookup_tree_block_ref This function is always called with a valid transaction handle from where the fs_info can be referenced. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:33 +02:00
Nikolay Borisov	61a18f1c66	btrfs: Remove fs_info argument from update_inline_extent_backref This function always uses the leaf's extent_buffer which already contains a reference to the fs_info. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:33 +02:00
Nikolay Borisov	867cc1fbeb	btrfs: Remove fs_info from lookup_inline_extent_backref This function is always called with a valid transaction handle from where the fs_info can be referenced. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:32 +02:00
Nikolay Borisov	b167fa9152	btrfs: Remove fs_info from fixup_low_keys This argument is unused. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:32 +02:00
Nikolay Borisov	e9f6290d59	btrfs: Remove fs_info from remove_extent_data_ref This function is always called with a valid transaction from where the fs_info can be referenced. No functional change. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:32 +02:00
Nikolay Borisov	375934105c	btrfs: Remove fs_info argument from insert_extent_backref This function is always called with a valid transaction handle from where fs_info can be referenced. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:32 +02:00
Nikolay Borisov	62b895af40	btrfs: Remove fs_info from insert_extent_data_ref This function is always called with a valid transaction handle from where fs_info can be referenced. So remove the redundant argument. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:32 +02:00
Nikolay Borisov	10728404c6	btrfs: Remove fs_info from insert_tree_block_ref This function is always called with a valid transaction so there is no need to duplicate the fs_info, we can reference it directly from the trans handle. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:31 +02:00
Bart Van Assche	edf57cbf2b	btrfs: Fix a C compliance issue The C programming language does not allow to use preprocessor statements inside macro arguments (pr_info() is defined as a macro). Hence rework the pr_info() statement in btrfs_print_mod_info() such that it becomes compliant. This patch allows tools like sparse to analyze the BTRFS source code. Fixes: `62e855771d` ("btrfs: convert printk(KERN_* to use pr_* calls") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:31 +02:00
Bart Van Assche	acd43e3cdf	btrfs: Annotate fall-through when parsing mount option This patch avoids that the compiler complains that a fall-through annotation is missing when building with W=1. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:31 +02:00
Bart Van Assche	bece2e8239	btrfs: Fix misleading indentation reported by smatch This patch avoids that building the BTRFS source code with smatch triggers complaints about inconsistent indenting. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:31 +02:00
Nikolay Borisov	a9ecb653b0	btrfs: Streamline log_extent_csums a bit Currently this function takes the root as an argument only to get the log_root from it. Simplify this by directly passing the log root from the caller. Also eliminate the fs_info local variable, since it's used only once, so directly reference it from the transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:31 +02:00
David Sterba	ca5788aba3	btrfs: remove remaing full_sync logic from btrfs_sync_file The logic to check if the inode is already in the log can now be simplified since we always wait for the ordered extents to complete before deciding whether the inode needs to be logged. The big comment about it can go away too. CC: Filipe Manana <fdmanana@suse.com> Suggested-by: Filipe Manana <fdmanana@suse.com> [ code and changelog copied from mail discussion ] Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:31 +02:00
Josef Bacik	5636cf7d6d	btrfs: remove the logged extents infrastructure This is no longer used anywhere, remove all of it. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:30 +02:00
Josef Bacik	a2120a473a	btrfs: clean up the left over logged_list usage We no longer use this list we've passed around so remove it everywhere. Also remove the extra checks for ordered/filemap errors as this is handled higher up now that we're waiting on ordered_extents before getting to the tree log code. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:30 +02:00
Josef Bacik	e7175a6927	btrfs: remove the wait ordered logic in the log_one_extent path Since we are waiting on all ordered extents at the start of the fsync() path we don't need to wait on any logged ordered extents, and we don't need to look up the checksums on the ordered extents as they will already be on disk prior to getting here. Rework this so we're only looking up and copying the on-disk checksums for the extent range we care about. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:30 +02:00
Josef Bacik	b5e6c3e170	btrfs: always wait on ordered extents at fsync time There's a priority inversion that exists currently with btrfs fsync. In some cases we will collect outstanding ordered extents onto a list and only wait on them at the very last second. However this "very last second" falls inside of a transaction handle, so if we are in a lower priority cgroup we can end up holding the transaction open for longer than needed, so if a high priority cgroup is also trying to fsync() it'll see latency. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:30 +02:00
Nikolay Borisov	16d1c062c7	btrfs: Fix comment in lookup_inline_extent_backref The comment wrongfully states that the owner parameter is the level of the parent block. In fact owner is the level of the current block and by adding 1 to it we can eventually get to the parent/root. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:30 +02:00
Nikolay Borisov	bd3c685ed9	btrfs: Document __btrfs_inc_extent_ref Here is a doc-only patch which tires to deobfuscate the terra-incognita that arguments for delayed refs are. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:29 +02:00
Qu Wenruo	9bebe665c3	btrfs: scrub: Remove unused copy_nocow_pages and its callchain Since commit `ac0b4145d6` ("btrfs: scrub: Don't use inode pages for device replace") the function is not used and we can remove all functions down the call chain. There was an optimization that reused inode pages to speed up device replace, but broke when there was nodatasum and compressed page. The potential performance gain is small so we don't loose much by removing it and using scrub_pages same as the other pages. Signed-off-by: Qu Wenruo <wqu@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:29 +02:00
Allen Pais	a944442c2b	btrfs: replace get_seconds with new 64bit time API The get_seconds() function is deprecated as it truncates the timestamp to 32 bits. Change it to or ktime_get_real_seconds(). Signed-off-by: Allen Pais <allen.lkml@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:29 +02:00
Christoph Hellwig	e8693bcfa0	aio: allow direct aio poll comletions for keyed wakeups If we get a keyed wakeup for a aio poll waitqueue and wake can acquire the ctx_lock without spinning we can just complete the iocb straight from the wakeup callback to avoid a context switch. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Avi Kivity <avi@scylladb.com>	2018-08-06 10:24:39 +02:00
Christoph Hellwig	bfe4037e72	aio: implement IOCB_CMD_POLL Simple one-shot poll through the io_submit() interface. To poll for a file descriptor the application should submit an iocb of type IOCB_CMD_POLL. It will poll the fd for the events specified in the the first 32 bits of the aio_buf field of the iocb. Unlike poll or epoll without EPOLLONESHOT this interface always works in one shot mode, that is once the iocb is completed, it will have to be resubmitted. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Avi Kivity <avi@scylladb.com>	2018-08-06 10:24:33 +02:00
Christoph Hellwig	9018ccc453	aio: add a iocb refcount This is needed to prevent races caused by the way the ->poll API works. To avoid introducing overhead for other users of the iocbs we initialize it to zero and only do refcount operations if it is non-zero in the completion path. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Avi Kivity <avi@scylladb.com>	2018-08-06 10:24:28 +02:00
Christoph Hellwig	7dda712818	timerfd: add support for keyed wakeups This prepares timerfd for use with aio poll. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Avi Kivity <avi@scylladb.com>	2018-08-06 10:24:08 +02:00
Jens Axboe	05b9ba4b55	Linux 4.18-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltU8z0eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG5X8H/2fJr7m3k242+t76 sitwvx1eoPqTgryW59dRKm9IuXAGA+AjauvHzaz1QxomeQa50JghGWefD0eiJfkA 1AphQ/24EOiAbbVk084dAI/C2p122dE4D5Fy7CrfLnuouyrbFaZI5STbnrRct7sR 9deeYW0GDHO1Uenp4WDCj0baaqJqaevZ+7GG09DnWpya2nQtSkGBjqn6GpYmrfOU mqFuxAX8mEOW6cwK16y/vYtnVjuuMAiZ63/OJ8AQ6d6ArGLwAsdn7f8Fn4I4tEr2 L0d3CRLUyegms4++Dmlu05k64buQu46WlPhjCZc5/Ts4kjrNxBuHejj2/jeSnUSt vJJlibI= =42a5 -----END PGP SIGNATURE----- Merge tag 'v4.18-rc6' into for-4.19/block2 Pull in 4.18-rc6 to get the NVMe core AEN change to avoid a merge conflict down the line. Signed-of-by: Jens Axboe <axboe@kernel.dk>	2018-08-05 19:32:09 -06:00
David S. Miller	c1c8626fce	Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net Lots of overlapping changes, mostly trivial in nature. The mlxsw conflict was resolving using the example resolution at: https://github.com/jpirko/linux_mlxsw/blob/combined_queue/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c Signed-off-by: David S. Miller <davem@davemloft.net>	2018-08-05 13:04:31 -07:00
Gustavo A. R. Silva	7964410fcf	fs: dcache: Use true and false for boolean values Return statements in functions returning bool should use true or false instead of an integer value. This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-08-05 15:52:44 -04:00
Al Viro	808aa6c5e3	Merge branch 'work.hpfs' into work.lookup	2018-08-05 15:51:10 -04:00
Al Viro	1401a0fc2d	afs_try_auto_mntpt(): return NULL instead of ERR_PTR(-ENOENT) simpler logics in callers that way Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-08-05 15:50:59 -04:00
Al Viro	34b2a88fb4	afs_lookup(): switch to d_splice_alias() ->lookup() methods can (and should) use d_splice_alias() instead of d_add(). Even if they are not going to be hit by open_by_handle(), code does get copied around; besides, d_splice_alias() has better calling conventions for use in ->lookup(), so the code gets simpler. Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-08-05 15:44:14 -04:00
Al Viro	855371bd01	afs: switch dynroot lookups to d_splice_alias() ->lookup() methods can (and should) use d_splice_alias() instead of d_add(). Even if they are not going to be hit by open_by_handle(), code does get copied around... Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-08-05 15:41:16 -04:00
Linus Torvalds	60f5a21736	- Fix JFS usercopy whitelist (it needed to cover neighboring field too) for "overflow" inline inode data. -----BEGIN PGP SIGNATURE----- Comment: Kees Cook <kees@outflux.net> iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAltlv70WHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJj3eD/9+goWbs9U62tlO2cIqT5lgTFX9 3vvVpqNJ3atw/fU6SZoa0Z2nRa9TIpZEfEljgARGhCyd2p2MplLJalWo6bq/gUUA 5aWQbVxhyVXUp8kh8m3OnZsAZz658Y5geLMk8vakXdyQ//PF43wO0cZyOFYdG3ec sYuWA318UKaIsxqB9tT/K8YBRjjBgJ8wjgtpoSAr+FUhgg9Qqp3NL4fjr6SOEQrv XWXLVLLhyUo8kQJ29E/VyzIfLysgiA67O4ClW+DEyD6rr/ZK9XOeG6yv3vwLGSHI 06/4BXMJce23iGIYf57Jz7b5tfAaZntdzNqUiTW6up0TuvG09auwjv/hKkwfjQv2 fNf0TVnYOCN4ZWbCm4FTEU5q31u+pZDHiRxOTJG9EbuBVEsFiV5X9B9VTLoA+dWK SqWmE2X9YdONt9q6K8TGpxxrdQnQXGKOOtEY23KoF04XGYQtutIfj7cdFCDDX/eC AjcOIfV3u8dHWjc/wKIYS5XRUzbkdpEeNOaiCQ3RN79JN2UrA0/w7lAxZTICmYCs HtMtaFTER5weKQGzTzg0SP3M95qKPdWhlIq5lspdhGedonZAKBNfGahMltTH2UoY vIb1qGT8uz8tQSzGsu0uvrR92ZIJZ0qUVjB/nhE9HxTz26xqtDDlLXpdIXTGrnU4 hM+Omud//MNgNbsmrg== =pwq8 -----END PGP SIGNATURE----- Merge tag 'usercopy-fix-v4.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull usercopy whitelisting fix from Kees Cook: "Bart Massey discovered that the usercopy whitelist for JFS was incomplete: the inline inode data may intentionally "overflow" into the neighboring "extended area", so the size of the whitelist needed to be raised to include the neighboring field" * tag 'usercopy-fix-v4.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: jfs: Fix usercopy whitelist for inline inode data	2018-08-04 18:34:55 -07:00
Linus Torvalds	f639bef55d	Changes since last update: - Fix incorrect shifting in the iomap bmap functions. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAltj7lcACgkQ+H93GTRK tOutzw//U9yOrUhuOTFrkmA2x+CB1ArncFZysUTXDFUsLRrMp0dwNaRV0WafyypO tCr4nvqgwg2VTFf21zWblOSAajstgJzH+x3FdxTsU5Gktd/BH9gCHSDV0lORnFhp jbqNOsBRPJjNIBzaIIAnyDGMuOF/MeWpGsbGF4eDcsgj58lr098rir7grEVgCkKM +eCMtqJwUXUZ2bo/grBWvZXnLms+8LMPYeRQMJSnw59DVunwxRdSbWumJwkPt9DD mz3Upa/qFJCZw2n9kluU1b/tnrpxkNWYuSjzp9iu0cMdo52HF+yDNHriUPCHa4PB 3KfyGrhOvg+iC2pUAJ5mI3Dpv+NNAk7j/+4mOVtlUYIf0mi5gszIjNHbLiONKVH5 5x8G59wM/ae6a1WcHe7tJRma7r984G+JLTIxAQnWaWhFq5AsYMOplk4oq6X7Lmir wAR7laDN0Xgl3WCFj6SXaKcnuzZFsE4A3SnZILtlgO/WxAujUyFEnICx1cO4Q9OY 64txWii6ora9kdBtalolxjfGHwyScbhx6FiBQdKIznxGgBQR89X8hzxghdp7KTIx kxkC1hAM3KXXtokArjad2hgd8QG23jUshyBwKpfHnEwL75GiZQ3qtPdDm/oJlVWV ItnRSt+tGIT/fsT5szNPmLKgtIW4kQHflVuih0KR4IsGPMmZ8dU= =3omy -----END PGP SIGNATURE----- Merge tag 'xfs-4.18-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs bugfix from Darrick Wong: "One more patch for 4.18 to fix a coding error in the iomap_bmap() function introduced in -rc1: fix incorrect shifting" * tag 'xfs-4.18-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: fs: fix iomap_bmap position calculation	2018-08-04 18:30:58 -07:00
zhong jiang	863c37fcb1	ext4: remove unneeded variable "err" in ext4_mb_release_inode_pa() The err is not used after initalization. So just remove the variable. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-08-04 17:34:07 -04:00
Kees Cook	961b33c244	jfs: Fix usercopy whitelist for inline inode data Bart Massey reported what turned out to be a usercopy whitelist false positive in JFS when symlink contents exceeded 128 bytes. The inline inode data (i_inline) is actually designed to overflow into the "extended area" following it (i_inline_ea) when needed. So the whitelist needed to be expanded to include both i_inline and i_inline_ea (the whole size of which is calculated internally using IDATASIZE, 256, instead of sizeof(i_inline), 128). $ cd /mnt/jfs $ touch $(perl -e 'print "B" x 250') $ ln -s B* b $ ls -l >/dev/null [ 249.436410] Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLUB object 'jfs_ip' (offset 616, size 250)! Reported-by: Bart Massey <bart.massey@gmail.com> Fixes: `8d2704d382` ("jfs: Define usercopy region in jfs_ip slab cache") Cc: Dave Kleikamp <shaggy@kernel.org> Cc: jfs-discussion@lists.sourceforge.net Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>	2018-08-04 07:53:46 -07:00
Geliang Tang	1021bcf44d	pstore: add zstd compression support This patch added the 6th compression algorithm support for pstore: zstd. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org>	2018-08-03 18:12:18 -07:00
Al Viro	c7b15a8657	jfs: don't bother with make_bad_inode() in ialloc() We hit that when inumber allocation has failed. In that case the in-core inode is not hashed and since its ->i_nlink is 1 the only place where jfs checks is_bad_inode() won't be reached. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-08-03 16:03:33 -04:00
Al Viro	d8e78da868	adfs: don't put inodes into icache We never look them up in there; inode_fake_hash() will make them appear hashed for mark_inode_dirty() purposes. And don't leave them around until memory pressure kicks them out - we never look them up again. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-08-03 16:03:33 -04:00
Al Viro	5bef915104	new helper: inode_fake_hash() open-coded in a quite a few places... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-08-03 16:03:32 -04:00
Miklos Szeredi	e950564b97	vfs: don't evict uninitialized inode iput() ends up calling ->evict() on new inode, which is not yet initialized by owning fs. So use destroy_inode() instead. Add to sb->s_inodes list only if inode is not in I_CREATING state (meaning that it wasn't allocated with new_inode(), which already does the insertion). Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `80ea09a002` ("vfs: factor out inode_insert5()") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-08-03 16:03:32 -04:00
Al Viro	a6cbedfa87	jfs: switch to discard_new_inode() we don't want open-by-handle to pick an in-core inode that has failed setup halfway through. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-08-03 16:03:31 -04:00
Al Viro	2e5afe54e0	ext2: make sure that partially set up inodes won't be returned by ext2_iget() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-08-03 16:03:31 -04:00
Al Viro	5c1a68a358	udf: switch to discard_new_inode() we don't want open-by-handle to pick an in-core inode that has failed setup halfway through. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-08-03 16:03:30 -04:00
Al Viro	dd54992776	ufs: switch to discard_new_inode() we don't want open-by-handle to pick an in-core inode that has failed setup halfway through. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-08-03 16:03:30 -04:00
Al Viro	32955c5422	btrfs: switch to discard_new_inode() Make sure that no partially set up inodes can be returned by open-by-handle. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-08-03 16:03:29 -04:00
Al Viro	c2b6d621c4	new primitive: discard_new_inode() We don't want open-by-handle picking half-set-up in-core struct inode from e.g. mkdir() having failed halfway through. In other words, we don't want such inodes returned by iget_locked() on their way to extinction. However, we can't just have them unhashed - otherwise open-by-handle immediately after that would've ended up creating a new in-core inode over the on-disk one that is in process of being freed right under us. Solution: new flag (I_CREATING) set by insert_inode_locked() and removed by unlock_new_inode() and a new primitive (discard_new_inode()) to be used by such halfway-through-setup failure exits instead of unlock_new_inode() / iput() combinations. That primitive unlocks new inode, but leaves I_CREATING in place. iget_locked() treats finding an I_CREATING inode as failure (-ESTALE, once we sort out the error propagation). insert_inode_locked() treats the same as instant -EBUSY. ilookup() treats those as icache miss. [Fix by Dan Carpenter <dan.carpenter@oracle.com> folded in] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-08-03 15:55:30 -04:00
David Howells	eb9950eb31	rxrpc: Push iov_iter up from rxrpc_kernel_recv_data() to caller Push iov_iter up from rxrpc_kernel_recv_data() to its caller to allow non-contiguous iovs to be passed down, thereby permitting file reading to be simplified in the AFS filesystem in a future patch. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-08-03 12:46:20 -07:00
Linus Torvalds	310810ae19	NFS client bugfixes for Linux 4.18 Highlights include: Bugfixes: - Fix a NFSv4 file locking regression -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJbZFG/AAoJEA4mA3inWBJcyo8P/0MN6eRpDU2VCZheM7DgsN2L a7W/Phc1PjHoaqksQB4Zb3uMYQQCWAyvE/VB5nRJSF1ZQbKvv7kgkyqX+PxEG8LI Pp3igaVXzqkRj3eA33G1HXZCrymjf7sTb4CEdMgSUBe3wLXjrFPP4Og0RJ8YGhCG QBG0ZENwcseUk5bmbSpp9Ac60URy1si1pD1nB3z+zSQT2ViA8QHgzg3Hpwgm+0R7 WG/QzIFoGoJ8J9sDX/tXdkwUT2Z3yusxXcwK7Be5dtUcu1codf+EaPxRNG55myRW J2fY+KIocgQK8lo3w9ok1sGTyN+YkS8eIQqTeZzg0Gty/LX7bwH/3ScCeQtbu9RH nAR2OJQkc/wJ8sJojmUmDnBgskvgWzdfxfxRGQwlnRMD0W3t0LUDCeIUZ/1OL69l 4pgvFLaR5MRD/DS4sSftKcOpgH5KDTlfuUXA+PamELLAk93FWJEZTVI4hmUR02+h /0QoRE6FAraQ7IY9TuLd/Jj3wWmqvataL6JGuWSdmhd35PbRxxBun+5zCyj62BAM /h0SjrCMD+dhotcdiekHINNbNYRG6ukbswgP6zCtuq4icTCW8SMVNyI3mXUVQwF3 hAc3FylKpdGkgSrK3unLnBSeBgGwnCy1PYtusx0MgJf/qhdPYsl0bwgZhcR1U01y WfyGrwoNhLEmxL6+zECQ =gMVX -----END PGP SIGNATURE----- Merge tag 'nfs-for-4.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfix from Trond Myklebust: "Fix a NFSv4 file locking regression" * tag 'nfs-for-4.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Fix _nfs4_do_setlk()	2018-08-03 10:42:01 -07:00
Huang Chong	a0e336ba3e	xfs: fix a comment in xfs_log_reserve Fix the comment in xfs_log_reserve to avoid confusing. Signed-of-by: Huang Chong <huang.chong@zte.com.cn> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-03 08:17:54 -07:00
Darrick J. Wong	1f31c98d65	xfs: only validate summary counts on primary superblock Skip the summary counter checks for secondary superblocks and inprogress primary superblocks because mkfs has always written those out with zeroed summary counters. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>	2018-08-03 08:17:35 -07:00
Andreas Gruenbacher	21e2156f3c	gfs2: Get rid of gfs2_ea_strlen Function gfs2_ea_strlen is only called from ea_list_i, so inline it there. Remove the duplicate switch statement and the creative use of memcpy to set a null byte. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com> Reviewed-by: Bob Peterson <rpeterso@redhat.com>	2018-08-03 13:20:02 +01:00
Thomas Bianchi	c2b6e1591b	xfs: substitute spaces with tabs Inside xfs_attr_shortform_list removes spaces at the beginnig of the line and replaces with tabs. Issue found by checkpatch. ERROR: code indent should use tabs where possible Signed-off-by: Thomas Bianchi <thomas.bianchi8@gmail.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-02 23:05:14 -07:00
Brian Foster	9d9e623385	xfs: fold dfops into the transaction struct xfs_defer_ops has now been reduced to a single list_head. The external dfops mechanism is unused and thus everywhere a (permanent) transaction is accessible the associated dfops structure is as well. Remove the xfs_defer_ops structure and fold the list_head into the transaction. Also remove the last remnant of external dfops in xfs_trans_dup(). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-02 23:05:14 -07:00
Brian Foster	c03edc9e49	xfs: always defer agfl block frees The AGFL fixup code conditionally defers block frees from the free list based on whether the current transaction has an associated xfs_defer_ops structure. Now that dfops is embedded in the transaction and the internal dfops is used unconditionally, this invariant is always true. Remove the now dead logic to check for ->t_dfops in xfs_alloc_fix_freelist() and unconditionally defer AGFL block frees. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-02 23:05:14 -07:00
Brian Foster	0f37d1780c	xfs: pass transaction to xfs_defer_add() The majority of remaining references to struct xfs_defer_ops in XFS are associated with xfs_defer_add(). At this point, there are no more external xfs_defer_ops users left. All instances of xfs_defer_ops are embedded in the transaction, which means we can safely pass the transaction down to the dfops add interface. Update xfs_defer_add() to receive the transaction as a parameter. Various subsystems implement wrappers to allocate and construct the context specific data structures for the associated deferred operation type. Update these to also carry the transaction down as needed and clean up unused dfops parameters along the way. This removes most of the remaining references to struct xfs_defer_ops throughout the code and facilitates removal of the structure. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [darrick: fix unused variable warnings with ftrace disabled] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-02 23:05:14 -07:00
Brian Foster	1ae093cbea	xfs: replace xfs_defer_ops ->dop_pending with on-stack list The xfs_defer_ops ->dop_pending list is used to track active deferred operations once intents are logged. These items must be aborted in the event of an error. The list is populated as intents are logged and items are removed as they complete (or are aborted). Now that xfs_defer_finish() cancels on error, there is no need to ever access ->dop_pending outside of xfs_defer_finish(). The list is only ever populated after xfs_defer_finish() begins and is either completed or cancelled before it returns. Remove ->dop_pending from xfs_defer_ops and replace it with a local list in the xfs_defer_finish() path. Pass the local list to the various helpers now that it is not accessible via dfops. Note that we have to check for NULL in the abort case as the final tx roll occurs outside of the scope of the new local list (once the dfops has completed and thus drained the list). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-02 23:05:14 -07:00
Brian Foster	9b1f4e9831	xfs: cancel dfops on xfs_defer_finish() error The current semantics of xfs_defer_finish() require the caller to call xfs_defer_cancel() on error. This is slightly inconsistent with transaction commit error handling where a failed commit cleans up the transaction before returning. More significantly, the only requirement for exposure of ->dop_pending outside of xfs_defer_finish() is so that xfs_defer_cancel() can drain it on error. Since the only recourse of xfs_defer_finish() errors is cancellation, mirror the transaction logic and cancel remaining dfops before returning from xfs_defer_finish() with an error. Beside simplifying xfs_defer_finish() semantics, this ensures that xfs_defer_finish() always returns with an empty ->dop_pending and thus facilitates removal of the list from xfs_defer_ops. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-02 23:05:14 -07:00
Brian Foster	60f31a609e	xfs: clean out superfluous dfops dop params/vars The dfops code still passes around the xfs_defer_ops pointer superfluously in a few places. Clean this up wherever the transaction will suffice. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-02 23:05:14 -07:00
Brian Foster	7dbddbaccd	xfs: drop dop param from xfs_defer_op_type ->finish_item() callback The dfops infrastructure ->finish_item() callback passes the transaction and dfops as separate parameters. Since dfops is always part of a transaction, the latter parameter is no longer necessary. Remove it from the various callbacks. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-02 23:05:14 -07:00
Brian Foster	a8198666fb	xfs: automatic dfops inode relogging Inodes that are held across deferred operations are explicitly joined to the dfops structure to ensure appropriate relogging. While inodes are currently joined explicitly, we can detect the conditions that require relogging at dfops finish time by inspecting the transaction item list for inodes with ili_lock_flags == 0. Replace the xfs_defer_ijoin() infrastructure with such detection and automatic relogging of held inodes. This eliminates the need for the per-dfops inode list, replaced by an on-stack variant in xfs_defer_trans_roll(). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-02 23:05:14 -07:00
Brian Foster	82ff27bc52	xfs: automatic dfops buffer relogging Buffers that are held across deferred operations are explicitly joined to the dfops structure to ensure appropriate relogging. While buffers are currently joined explicitly, we can detect the conditions that require relogging at dfops finish time by inspecting the transaction item list for held buffers. Replace the xfs_defer_bjoin() infrastructure with such detection and automatic relogging of held buffers. This eliminates the need for the per-dfops buffer list, replaced by an on-stack variant in xfs_defer_trans_roll(). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-02 23:05:13 -07:00
Brian Foster	488c919a5b	xfs: add missing defer ijoins for held inodes Log items that require relogging during deferred operations processing are explicitly joined to the associated dfops via the xfs_defer_*join() helpers. These calls imply that the associated object is "held" by the transaction such that when rolled, the item can be immediately joined to a follow up transaction. For buffers, this means the buffer remains locked and held after each roll. For inodes, this means that the inode remains locked. Failure to join a held item to the dfops structure means the associated object pins the tail of the log while dfops processing completes, because the item never relogs and is not unlocked or released until deferred processing completes. Currently, all buffers that are held in transactions (XFS_BLI_HOLD) with deferred operations are explicitly joined to the dfops. This is not the case for inodes, however, as various contexts defer operations to transactions with held inodes without explicit joins to the associated dfops (and thus not relogging). While this is not a catastrophic problem, it is not ideal. Given that we want to eventually relog such items automatically during dfops processing, start by explicitly adding these missing xfs_defer_ijoin() calls. A call is added everywhere an inode is joined to a transaction without transferring lock ownership and said transaction runs deferred operations. All xfs_defer_ijoin() calls will eventually be replaced by automatic dfops inode relogging. This patch essentially implements the behavior change that would otherwise occur due to automatic inode dfops relogging. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-02 23:05:13 -07:00
Brian Foster	1214f1cf66	xfs: replace dop_low with transaction flag The dop_low field enables the low free space allocation mode when a previous allocation has detected difficulty allocating blocks. It has historically been part of the xfs_defer_ops structure, which means if enabled, it remains enabled across a set of transactions until the deferred operations have completed and the dfops is reset. Now that the dfops is embedded in the transaction, we can save a bit more space by using a transaction flag rather than a standalone boolean. Drop the ->dop_low field and replace it with a transaction flag that is set at the same points, carried across rolling transactions and cleared on completion of deferred operations. This essentially emulates the behavior of ->dop_low and so should not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-02 23:05:13 -07:00
Brian Foster	ce356d6477	xfs: pass transaction to dfops reset/move helpers All callers pass ->t_dfops of the associated transactions. Refactor the helpers to receive the transactions and facilitate further cleanups between xfs_defer_ops and xfs_trans. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-02 23:05:13 -07:00
Brian Foster	7279aa13b8	xfs: remove unused __xfs_defer_cancel() internal helper With no more external dfops users, there is no need for an xfs_defer_ops cancel wrapper. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-02 23:05:13 -07:00
Brian Foster	fbfa977d25	xfs: use transaction for intent recovery instead of raw dfops Log intent recovery is the last user of an external (on-stack) dfops. The pattern exists because the dfops is used to collect additional deferred operations queued during the whole recovery sequence. The dfops is finished with a new transaction after intent recovery completes. We already have a mechanism to create an empty, container-like transaction to support the scrub infrastructure. We can reuse that mechanism here to drop the final user of external dfops. This facilitates folding dfops state (i.e., dop_low) into the transaction, the elimination of now unused external dfops support and also eliminates the only caller of __xfs_defer_cancel(). Replace the on-stack dfops with an empty transaction and pass it around to the various helpers that queue and finish deferred operations during intent recovery. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-02 23:05:13 -07:00
Brian Foster	98719051e7	xfs: refactor internal dfops initialization The current transaction allocation code conditionally initializes the ->t_dfops indirection pointer. Transaction commit/cancel check the validity of the pointer to determine whether to finish/cancel the internal dfops. This disallows the ability to use the internal dfops list as a temporary container (via xfs_trans_alloc_empty()). Refactor transaction allocation to always initialize ->t_dfops and check permanent reservation state on transaction commit/cancel. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-02 23:05:13 -07:00
Mike Rapoport	31e810aa10	userfaultfd: remove uffd flags from vma->vm_flags if UFFD_EVENT_FORK fails The fix in commit `0cbb4b4f4c` ("userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails") cleared the vma->vm_userfaultfd_ctx but kept userfaultfd flags in vma->vm_flags that were copied from the parent process VMA. As the result, there is an inconsistency between the values of vma->vm_userfaultfd_ctx.ctx and vma->vm_flags which triggers BUG_ON in userfaultfd_release(). Clearing the uffd flags from vma->vm_flags in case of UFFD_EVENT_FORK failure resolves the issue. Link: http://lkml.kernel.org/r/1532931975-25473-1-git-send-email-rppt@linux.vnet.ibm.com Fixes: `0cbb4b4f4c` ("userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails") Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Reported-by: syzbot+121be635a7a35ddb7dcb@syzkaller.appspotmail.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-02 16:03:40 -07:00
Eric Sandeen	79b3dbe4ad	fs: fix iomap_bmap position calculation The position calculation in iomap_bmap() shifts bno the wrong way, so we don't progress properly and end up re-mapping block zero over and over, yielding an unchanging physical block range as the logical block advances: # filefrag -Be file ext: logical_offset: physical_offset: length: expected: flags: 0: 0.. 0: 21.. 21: 1: merged 1: 1.. 1: 21.. 21: 1: 22: merged Discontinuity: Block 1 is at 21 (was 22) 2: 2.. 2: 21.. 21: 1: 22: merged Discontinuity: Block 2 is at 21 (was 22) 3: 3.. 3: 21.. 21: 1: 22: merged This breaks the FIBMAP interface for anyone using it (XFS), which in turn breaks LILO, zipl, etc. Bug-actually-spotted-by: Darrick J. Wong <darrick.wong@oracle.com> Fixes: `89eb1906a9` ("iomap: add an iomap-based bmap implementation") Cc: stable@vger.kernel.org Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-08-02 13:09:27 -07:00
Bart Van Assche	2afc9166f7	scsi: sysfs: Introduce sysfs_{un,}break_active_protection() Introduce these two functions and export them such that the next patch can add calls to these functions from the SCSI core. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-08-02 15:53:15 -04:00
Chengguang Xu	8687a3e2c7	ceph: add additional offset check in ceph_write_iter() If the offset is larger or equal to both real file size and max file size, then return -EFBIG. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-08-02 21:33:28 +02:00
Chengguang Xu	0671e9968d	ceph: add additional range check in ceph_fallocate() If the range is larger than both real file size and limit of max file size, then return -EFBIG. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-08-02 21:33:28 +02:00
Chengguang Xu	719784ba70	ceph: add new field max_file_size in ceph_fs_client In order to not bother to VFS and other specific filesystems, we decided to do offset validation inside ceph kernel client, so just simply set sb->s_maxbytes to MAX_LFS_FILESIZE so that it can successfully pass VFS check. We add new field max_file_size in ceph_fs_client to store real file size limit and doing proper check based on it. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-08-02 21:33:27 +02:00
Ilya Dryomov	6daca13d2e	libceph: add authorizer challenge When a client authenticates with a service, an authorizer is sent with a nonce to the service (ceph_x_authorize_[ab]) and the service responds with a mutation of that nonce (ceph_x_authorize_reply). This lets the client verify the service is who it says it is but it doesn't protect against a replay: someone can trivially capture the exchange and reuse the same authorizer to authenticate themselves. Allow the service to reject an initial authorizer with a random challenge (ceph_x_authorize_challenge). The client then has to respond with an updated authorizer proving they are able to decrypt the service's challenge and that the new authorizer was produced for this specific connection instance. The accepting side requires this challenge and response unconditionally if the client side advertises they have CEPHX_V2 feature bit. This addresses CVE-2018-1128. Link: http://tracker.ceph.com/issues/24836 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>	2018-08-02 21:33:24 +02:00
Souptick Joarder	24499847e4	ceph: adding new return type vm_fault_t Use new return type vm_fault_t for page_mkwrite and fault handler. Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-08-02 21:33:20 +02:00
Arnd Bergmann	0ed1e90a09	ceph: use timespec64 for r_stamp The ceph_mds_request stamp still uses the deprecated timespec structure, this converts it over as well. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-08-02 21:33:19 +02:00
Arnd Bergmann	fac02ddf91	libceph: use timespec64 for r_mtime The request mtime field is used all over ceph, and is currently represented as a 'timespec' structure in Linux. This changes it to timespec64 to allow times beyond 2038, modifying all users at the same time. [ Remove now redundant ts variable in writepage_nounlock(). ] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-08-02 21:33:14 +02:00
Arnd Bergmann	9bbeab41ce	ceph: use timespec64 for inode timestamp Since the vfs structures are all using timespec64, we can now change the internal representation, using ceph_encode_timespec64 and ceph_decode_timespec64. In case of ceph_aux_inode however, we need to avoid doing a memcmp() on uninitialized padding data, so the members of the i_mtime field get copied individually into 64-bit integers. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-08-02 21:26:12 +02:00
Arnd Bergmann	63ecae7e43	ceph: stop using current_kernel_time() ceph_mdsc_create_request() is one of the last callers of the deprecated current_kernel_time() as well as timespec_trunc(). This changes it to use the timespec64 based interfaces instead, though we still need to convert the result until we are ready to change over req->r_stamp. The output of the two functions, ktime_get_coarse_real_ts64() and current_kernel_time() is the same coarse-granular timestamp, the only difference here is that ktime_get_coarse_real_ts64() doesn't overflow in 2038. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-08-02 21:26:12 +02:00
Chengguang Xu	67fcd15140	ceph: add d_drop for some error cases in ceph_symlink() When file num exceeds quota limit, should call d_drop to drop dentry from cache as well. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-08-02 21:26:12 +02:00
Chengguang Xu	0459871c49	ceph: add d_drop for some error cases in ceph_mknod() When file num exceeds quota limit or fails from ceph_per_init_acls() should call d_drop to drop dentry from cache as well. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-08-02 21:26:12 +02:00
Chengguang Xu	61ad36d47d	ceph: return errors from posix_acl_equiv_mode() correctly In order to return correct error code should replace variable ret using err in error case. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-08-02 21:26:12 +02:00
Yan, Zheng	dfeb84d4ad	ceph: fix incorrect use of strncpy GCC8 prints following warning: fs/ceph/mds_client.c:3683:2: warning: ‘strncpy’ output may be truncated copying 64 bytes from a string of length 64 [-Wstringop-truncation] [ Change to strscpy() while at it. ] Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-08-02 21:26:11 +02:00
Ilya Dryomov	2f56b6bae7	libceph: amend "bad option arg" error message Don't mention "mount" -- in the rbd case it is "mapping". Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-08-02 21:26:11 +02:00
Chengguang Xu	93d35c754d	ceph: restore ctime as well in the case of restoring old mode It's better to restore ctime as well in the case of restoring old mode in ceph_set_acl(). Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-08-02 21:26:11 +02:00
Chengguang Xu	f017754d69	ceph: add retry logic for error -ERANGE in ceph_get_acl() When the size of acl extended attribution is larger than pre-allocated value buffer size, we will hit error '-ERANGE' and it's probabaly caused by concurrent get/set acl from different clients. In this case, current logic just sets acl to NULL so that we cannot get proper information but the operation looks successful. This patch adds retry logic for error -ERANGE and return -EIO if fail from the retry. Additionally, print real errno when failing from __ceph_getxattr(). Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2018-08-02 21:26:11 +02:00
David S. Miller	89b1698c93	Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net The BTF conflicts were simple overlapping changes. The virtio_net conflict was an overlap of a fix of statistics counter, happening alongisde a move over to a bonafide statistics structure rather than counting value on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-08-02 10:55:32 -07:00
Phillip Lougher	a3f94cb99a	Squashfs: Compute expected length from inode size rather than block length Previously in squashfs_readpage() when copying data into the page cache, it used the length of the datablock read from the filesystem (after decompression). However, if the filesystem has been corrupted this data block may be short, which will leave pages unfilled. The fix for this is to compute the expected number of bytes to copy from the inode size, and use this to detect if the block is short. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Tested-by: Willy Tarreau <w@1wt.eu> Cc: Анатолий Тросиненко <anatoly.trosinenko@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-02 09:34:02 -07:00
Linus Torvalds	71755ee535	squashfs: more metadata hardening The squashfs fragment reading code doesn't actually verify that the fragment is inside the fragment table. The end result _is_ verified to be inside the image when actually reading the fragment data, but before that is done, we may end up taking a page fault because the fragment table itself might not even exist. Another report from Anatoly and his endless squashfs image fuzzing. Reported-by: Анатолий Тросиненко <anatoly.trosinenko@gmail.com> Acked-by:: Phillip Lougher <phillip.lougher@gmail.com>, Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-02 09:32:23 -07:00
Liu Song	bc71652346	ext4: improve code readability in ext4_iget() Merge the duplicated complex conditions to improve code readability. Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>	2018-08-02 00:11:16 -04:00
Jeremy Cline	1a5d5e5d51	ext4: fix spectre gadget in ext4_mb_regular_allocator() 'ac->ac_g_ex.fe_len' is a user-controlled value which is used in the derivation of 'ac->ac_2order'. 'ac->ac_2order', in turn, is used to index arrays which makes it a potential spectre gadget. Fix this by sanitizing the value assigned to 'ac->ac2_order'. This covers the following accesses found with the help of smatch: * fs/ext4/mballoc.c:1896 ext4_mb_simple_scan_group() warn: potential spectre issue 'grp->bb_counters' [w] (local cap) * fs/ext4/mballoc.c:445 mb_find_buddy() warn: potential spectre issue 'EXT4_SB(e4b->bd_sb)->s_mb_offsets' [r] (local cap) * fs/ext4/mballoc.c:446 mb_find_buddy() warn: potential spectre issue 'EXT4_SB(e4b->bd_sb)->s_mb_maxs' [r] (local cap) Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jeremy Cline <jcline@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-08-02 00:03:40 -04:00
Al Viro	c971e6a006	kill d_instantiate_no_diralias() The only user is fuse_create_new_entry(), and there it's used to mitigate the same mkdir/open-by-handle race as in nfs_mkdir(). The same solution applies - unhash the mkdir argument, then call d_splice_alias() and if that returns a reference to preexisting alias, dput() and report success. ->mkdir() argument left unhashed negative with the preexisting alias moved in the right place is just fine from the ->mkdir() callers point of view. Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-08-01 23:18:53 -04:00
Trond Myklebust	6ea76bf513	NFSv4: Fix _nfs4_do_setlk() The patch to fix the case where a lock request was interrupted ended up changing default handling of errors such as NFS4ERR_DENIED and caused the client to immediately resend the lock request. Let's do a partial revert of that request so that the default is now to exit, but change the way we handle resends to take into account the fact that the user may have interrupted the request. Reported-by: Kenneth Johansson <ken@kenjo.org> Fixes: `a3cf9bca2a` ("NFSv4: Don't add a new lock on an interrupted wait..") Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>	2018-08-01 23:17:06 -04:00
Christoph Hellwig	006477f40d	kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt No need to have this in the top-level Kconfig. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>	2018-08-02 08:06:55 +09:00
Linus Torvalds	cdbb65c4c7	squashfs metadata 2: electric boogaloo Anatoly continues to find issues with fuzzed squashfs images. This time, corrupt, missing, or undersized data for the page filling wasn't checked for, because the squashfs_{copy,read}_cache() functions did the squashfs_copy_data() call without checking the resulting data size. Which could result in the page cache pages being incompletely filled in, and no error indication to the user space reading garbage data. So make a helper function for the "fill in pages" case, because the exact same incomplete sequence existed in two places. [ I should have made a squashfs branch for these things, but I didn't intend to start doing them in the first place. My historical connection through cramfs is why I got into looking at these issues at all, and every time I (continue to) think it's a one-off. Because _this_ time is always the last time. Right? - Linus ] Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Tested-by: Willy Tarreau <w@1wt.eu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-01 10:38:43 -07:00
Theodore Ts'o	7d95178c77	ext4: check for NUL characters in extended attribute's name Extended attribute names are defined to be NUL-terminated, so the name must not contain a NUL character. This is important because there are places when remove extended attribute, the code uses strlen to determine the length of the entry. That should probably be fixed at some point, but code is currently really messy, so the simplest fix for now is to simply validate that the extended attributes are sane. https://bugzilla.kernel.org/show_bug.cgi?id=200401 Reported-by: Wen Xu <wen.xu@gatech.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-08-01 12:36:52 -04:00
Wang Shilong	5ef2a69993	ext4: use ext4_warning() for sb_getblk failure Out of memory should not be considered as critical errors; so replace ext4_error() with ext4_warnig(). Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-08-01 12:02:31 -04:00
Darrick J. Wong	56830d6cc1	xfs: check da node magic in _node_lookup_int Before we start processing what we /think/ is a da3 node block, actually check the magic to make sure that we're looking at a node block. This way we won't blow the asserts in _node_hdr_from_disk on corrupted metadata. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>	2018-08-01 07:42:43 -07:00
Darrick J. Wong	611995db2c	xfs: use a local variable for magic number in xfs_da3_node_lookup_int Use a local variable for the block magic number checks instead of abusing blk->magic. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>	2018-08-01 07:42:18 -07:00
Darrick J. Wong	0c60d3aa0e	xfs: refactor log recovery check Add a predicate to decide if the log is actively in recovery and use that instead of open-coding a pagf_init check in the attr leaf verifier. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>	2018-08-01 07:40:48 -07:00
Darrick J. Wong	ff23f4af7e	xfs: move extent busy tree initialization to xfs_initialize_perag Move the per-AG busy extent tree initialization to the per-ag structure initialization since we don't want online repair to leak the old tree. We only deconstruct the tree at unmount time, so this should be safe. This also enables us to eliminate the commented out initialization in the xfsprogs libxfs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-07-31 13:18:09 -07:00
Christoph Hellwig	e666aa37f4	xfs: avoid COW fork extent lookups in writeback if the fork didn't change Used the per-fork sequence counter to avoid lookups in the writeback code unless the COW fork actually changed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-31 13:18:09 -07:00
Christoph Hellwig	745b3f76d1	xfs: maintain a sequence count for inode fork manipulations Add a simple 32-bit unsigned integer as the sequence count for modifications to the extent list in the inode fork. This will be used to optimize away extent list lookups in the writeback code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-31 13:18:09 -07:00
Darrick J. Wong	9e037cb797	xfs: check for unknown v5 feature bits in superblock write verifier Make sure we never try to write the superblock with unknown feature bits set. We checked those at mount time, so if they're set now then memory is corrupt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>	2018-07-31 13:18:09 -07:00
Darrick J. Wong	69775fd15d	xfs: verify icount in superblock write Add a helper predicate to check the inode count for sanity, then use it in the superblock write verifier to inspect sb_icount. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>	2018-07-31 13:18:09 -07:00
Bill O'Donnell	8756a5af18	libxfs: add more bounds checking to sb sanity checks Current sb verifier doesn't check bounds on sb_fdblocks and sb_ifree. Add sanity checks for these parameters. Signed-off-by: Bill O'Donnell <billodo@redhat.com> [darrick: port to refactored sb validation predicates] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>	2018-07-31 13:18:09 -07:00
Darrick J. Wong	eca383fcd6	xfs: refactor superblock verifiers Split the superblock verifier into the common checks, the read-time checks, and the write-time check functions. No functional changes, but we're setting up to add more write-only checks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>	2018-07-31 13:18:09 -07:00
Darrick J. Wong	86d969b425	xfs: refactor the xrep_extent_list into xfs_bitmap As mentioned previously, the xrep_extent_list basically implements a bitmap with two functions: set and disjoint union. Rename all these functions to xfs_bitmap to shorten the name and make it more obvious what we're doing. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-07-31 13:18:08 -07:00
Zubin Mithra	5248ee8560	tracefs: Annotate tracefs_ops with __ro_after_init tracefs_ops is initialized inside tracefs_create_instance_dir and not modified after. tracefs_create_instance_dir allows for initialization only once, and is called from create_trace_instances(marked __init), which is called from tracer_init_tracefs(marked __init). Also, mark tracefs_create_instance_dir as __init. Link: http://lkml.kernel.org/r/20180725171901.4468-1-zsm@chromium.org Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Zubin Mithra <zsm@chromium.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-07-31 11:32:44 -04:00
Linus Torvalds	d512584780	squashfs: more metadata hardening Anatoly reports another squashfs fuzzing issue, where the decompression parameters themselves are in a compressed block. This causes squashfs_read_data() to be called in order to read the decompression options before the decompression stream having been set up, making squashfs go sideways. Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Acked-by: Phillip Lougher <phillip.lougher@gmail.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-07-30 17:29:17 -07:00
Mauro Carvalho Chehab	d21c249b26	media: dvb/audio.h: get rid of unused APIs There are a number of other ioctls that aren't used anywhere inside the Kernel tree. Get rid of them. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>	2018-07-30 16:21:49 -04:00
Mauro Carvalho Chehab	b41e44b4cb	media: dvb/video.h: get rid of unused APIs There are a number of other ioctls that aren't used anywhere inside the Kernel tree. Get rid of them. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>	2018-07-30 15:43:47 -04:00
Christoph Hellwig	51d6269030	xfs: introduce a new xfs_inode_has_cow_data helper We have a few places that already check if an inode has actual data in the COW fork to avoid work on reflink inodes that do not actually have outstanding COW blocks. There are a few more places that can avoid working if doing the same check, so add a documented helper for this condition and use it in all places where it makes sense. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-30 07:57:48 -07:00
Christoph Hellwig	3ba738df25	xfs: remove the xfs_ifork_t typedef We only have a few more callers left, so seize the opportunity and kill it off. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-30 07:57:48 -07:00
Christoph Hellwig	1216b58b35	xfs: simplify xfs_idata_realloc Streamline the code and take advantage of the fact that kmem_realloc through krealloc will be have like a normal allocation if passing in a NULL old pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-30 07:57:48 -07:00
Christoph Hellwig	fcacbc3f51	xfs: remove if_real_bytes The field is only used for asserts, and to track if we really need to do realloc when growing the inode fork data. But the krealloc function already performs this check internally, so there is no need to keep track of the real allocation size. This will free space in the inode fork for keeping a sequence counter of changes to the extent list. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-30 07:57:48 -07:00
Greg Kroah-Hartman	d2fc88a61b	Merge 4.18-rc7 into driver-core-next We need the driver core changes in here as well for testing. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-07-30 10:08:09 +02:00
Darrick J. Wong	bc270b53e6	xfs: move the repair extent list into its own file Move the xrep_extent_list code into a separate file. Logically, this data structure is really just a clumsy bitmap, and in the next patch we'll make this more obvious. No functional changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-07-29 22:37:09 -07:00
Darrick J. Wong	ebcbef3a61	xfs: pass transaction lock while setting up agresv on cyclic metadata Pass a tranaction pointer through to all helpers that calculate the per-AG block reservation. Online repair will use this to reinitialize per-ag reservations while it still holds all the AG headers locked to the repair transaction. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-07-29 22:37:08 -07:00
Wang Shilong	9af0b3d125	ext4: fix race when setting the bitmap corrupted flag Whenever we hit block or inode bitmap corruptions we set bit and then reduce this block group free inode/clusters counter to expose right available space. However some of ext4_mark_group_bitmap_corrupted() is called inside group spinlock, some are not, this could make it happen that we double reduce one block group free counters from system. Always hold group spinlock for it could fix it, but it looks a little heavy, we could use test_and_set_bit() to fix race problems here. Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-07-29 17:27:45 -04:00
Eric Sandeen	f39b3f45db	ext4: reset error code in ext4_find_entry in fallback When ext4_find_entry() falls back to "searching the old fashioned way" due to a corrupt dx dir, it needs to reset the error code to NULL so that the nonstandard ERR_BAD_DX_DIR code isn't returned to userspace. https://bugzilla.kernel.org/show_bug.cgi?id=199947 Reported-by: Anatoly Trosinenko <anatoly.trosinenko@yandex.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-07-29 17:13:42 -04:00
Ross Zwisler	430657b6be	ext4: handle layout changes to pinned DAX mappings Follow the lead of xfs_break_dax_layouts() and add synchronization between operations in ext4 which remove blocks from an inode (hole punch, truncate down, etc.) and pages which are pinned due to DAX DMA operations. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Lukas Czerner <lczerner@redhat.com>	2018-07-29 17:00:22 -04:00
Ross Zwisler	cdbf8897cb	dax: dax_layout_busy_page() warn on !exceptional Inodes using DAX should only ever have exceptional entries in their page caches. Make this clear by warning if the iteration in dax_layout_busy_page() ever sees a non-exceptional entry, and by adding a comment for the pagevec_release() call which only deals with struct page pointers. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2018-07-29 16:59:16 -04:00
Linus Torvalds	3cfb6772d4	Some miscellaneous ext4 fixes for 4.18; one fix is for a regression introduced in 4.18-rc4. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlteF34ACgkQ8vlZVpUN gaMQugf+LjlbbncSEuPxZ+C3CnSGkEzjrg8IRylZA2uf04Z5Bax8K5gqvXLx7ZtF Qz3vzmrYpaUV8UiaMy0SGLCRWebwoxPEN7ZX3/W1PfeymP3wQ4DLw37059AzLfsq Vzh9w3N1At1plUee7iJ2MDBU830Q0a917jjnpZ+M0AtQx/BzP8QEISuzp4JWICqe NbJDVybMWoW2YOSpMPiihxSFqCDx5rMyAJ1vllboopZK+XAjpQ/visnLh3aT3o71 7cTPl9gI2rbwYbJk8kM5fmXhWqSARHARV1bpZNOUnCAUU1E2Se7aETjggQ0QzJE/ mIc7wCzFLrrY8+iakwdhb5Aw3qOPyg== =ZdXo -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Some miscellaneous ext4 fixes for 4.18; one fix is for a regression introduced in 4.18-rc4. Sorry for the late-breaking pull. I was originally going to wait for the next merge window, but Eric Whitney found a regression introduced in 4.18-rc4, so I decided to push out the regression plus the other fixes now. (The other commits have been baking in linux-next since early July)" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix check to prevent initializing reserved inodes ext4: check for allocation block validity with block group locked ext4: fix inline data updates with checksums enabled ext4: clear mmp sequence number when remounting read-only ext4: fix false negatives and false positives in ext4_check_descriptors()	2018-07-29 13:13:45 -07:00
Gustavo A. R. Silva	62bbdd9974	ext4: use swap macro in mext_page_double_lock Make use of the swap macro and remove unnecessary variable tmp. This makes the code easier to read and maintain. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-07-29 16:11:59 -04:00
Tyler Hicks	d175339027	sysfs: Fix regression when adding a file to an existing group Commit `5f81880d52` ("sysfs, kobject: allow creating kobject belonging to arbitrary users") incorrectly changed the argument passed as the parent parameter when calling sysfs_add_file_mode_ns(). This caused some sysfs attribute files to not be added correctly to certain groups. Fixes: `5f81880d52` ("sysfs, kobject: allow creating kobject belonging to arbitrary users") Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Reported-by: Heiner Kallweit <hkallweit1@gmail.com> Tested-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-29 13:11:28 -07:00
Chengguang Xu	21ac738ede	ext4: check allocation failure when duplicating "data" in ext4_remount() There is no check for allocation failure when duplicating "data" in ext4_remount(). Check for failure and return error -ENOMEM in this case. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>	2018-07-29 15:51:54 -04:00
Junichi Uekawa	7f144fd046	ext4: fix warning message in ext4_enable_quotas() Output the warning message before we clobber type and be -1 all the time. The error message would now be [ 1.519791] EXT4-fs warning (device vdb): ext4_enable_quotas:5402: Failed to enable quota tracking (type=0, err=-3). Please run e2fsck to fix. Signed-off-by: Junichi Uekawa <uekawa@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>	2018-07-29 15:51:52 -04:00
Arnd Bergmann	6a0678a79b	ext4: super: extend timestamps to 40 bits The inode timestamps use 34 bits in ext4, but the various timestamps in the superblock are limited to 32 bits. If every user accesses these as 'unsigned', then this is good until year 2106, but it seems better to extend this a bit further in the process of removing the deprecated get_seconds() function. This adds another byte for each timestamp in the superblock, making them long enough to store timestamps beyond what is in the inodes, which seems good enough here (in ocfs2, they are already 64-bit wide, which is appropriate for a new layout). I did not modify e2fsprogs, which obviously needs the same change to actually interpret future timestamps correctly. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-07-29 15:51:48 -04:00
Arnd Bergmann	b42d1d6b5b	jbd2: replace current_kernel_time64 with ktime equivalent jbd2 is one of the few callers of current_kernel_time64(), which is a wrapper around ktime_get_coarse_real_ts64(). This calls the latter directly for consistency with the rest of the kernel that is moving to the ktime_get_ family of time accessors. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-07-29 15:51:47 -04:00
Arnd Bergmann	7b62b29320	ext4: use timespec64 for all inode times This is the last missing piece for the inode times on 32-bit systems: now that VFS interfaces use timespec64, we just need to stop truncating the tv_sec values for y2038 compatibililty. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-07-29 15:51:00 -04:00
Arnd Bergmann	5ffff83432	ext4: use ktime_get_real_seconds for i_dtime We only care about the low 32-bit for i_dtime as explained in commit `b5f515735b` ("ext4: avoid Y2038 overflow in recently_deleted()"), so the use of get_seconds() is correct here, but that function is getting removed in the process of the y2038 fixes, so let's use the modern ktime_get_real_seconds() here. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-07-29 15:50:00 -04:00
Arnd Bergmann	af123b3718	ext4: use 64-bit timestamps for mmp_time The mmp_time field is 64 bits wide, which is good, but calling get_seconds() results in a 32-bit value on 32-bit architectures. Using ktime_get_real_seconds() instead returns 64 bits everywhere. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-07-29 15:49:00 -04:00
Arnd Bergmann	a4d2aadca1	ext4: sysfs: print ext4_super_block fields as little-endian While working on extended rand for last_error/first_error timestamps, I noticed that the endianess is wrong; we access the little-endian fields in struct ext4_super_block as native-endian when we print them. This adds a special case in ext4_attr_show() and ext4_attr_store() to byteswap the superblock fields if needed. In older kernels, this code was part of super.c, it got moved to sysfs.c in linux-4.4. Cc: stable@vger.kernel.org Fixes: `52c198c682` ("ext4: add sysfs entry showing whether the fs contains errors") Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-07-29 15:48:00 -04:00
Linus Torvalds	01cfb7937a	squashfs: be more careful about metadata corruption Anatoly Trosinenko reports that a corrupted squashfs image can cause a kernel oops. It turns out that squashfs can end up being confused about negative fragment lengths. The regular squashfs_read_data() does check for negative lengths, but squashfs_read_metadata() did not, and the fragment size code just blindly trusted the on-disk value. Fix both the fragment parsing and the metadata reading code. Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Phillip Lougher <phillip@squashfs.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-07-29 12:44:46 -07:00
Theodore Ts'o	5012284700	ext4: fix check to prevent initializing reserved inodes Commit 8844618d8aa7: "ext4: only look at the bg_flags field if it is valid" will complain if block group zero does not have the EXT4_BG_INODE_ZEROED flag set. Unfortunately, this is not correct, since a freshly created file system has this flag cleared. It gets almost immediately after the file system is mounted read-write --- but the following somewhat unlikely sequence will end up triggering a false positive report of a corrupted file system: mkfs.ext4 /dev/vdc mount -o ro /dev/vdc /vdc mount -o remount,rw /dev/vdc Instead, when initializing the inode table for block group zero, test to make sure that itable_unused count is not too large, since that is the case that will result in some or all of the reserved inodes getting cleared. This fixes the failures reported by Eric Whiteney when running generic/230 and generic/231 in the the nojournal test case. Fixes: `8844618d8a` ("ext4: only look at the bg_flags field if it is valid") Reported-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-07-29 15:34:00 -04:00
Linus Torvalds	eb181a814c	for-linus-20180727 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAltbc20QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpgh4D/9GYQcjk9qLVFxkv5ucAUvCuxEL6gjsMf4W M/QdxVIrwh3zpvsH++2IXXn+xH+UjujMA5NkzhsSr4+hsSO2iAGOYMJbroNfhsTD onvQQ6NTaHPu/+PZs0otVK4KMWHwZGWOV6YU00TWTfRgzRmGEsSMe91oeBIXVv9w v6d09twaLSY0lUkAAbcdu5fuFBtXu4Bxy60qyHEKkAdWWHEUYaZLrODhVjoGg2V4 KdAWS5X4A6kJMcPcoOvG6RFtpf71boaip9o/DRLUWhGdIQnI38UgSCUmz1XMYnik Sq8r74vqCm8IhIOLTlxnPrMHHbKv7JZhY3Ow9fxnS6HZRNI0aPX31Yml6NULqnWh MsQh+6gZXd3xC1O7txEQn4a15Lk0OLXa8HJcIn5ADNxqz5/r/g0mPUG9HmPSIalO ISFF/9UKQFcAd0RjHR+bEEH2VMznz59UWKfdOsmwFZtZSCmR1ucj0xAKDj+oP1JS ZsgZ09K2GezrL4GEueocISo9ACIWgDWH8T7/bTxlBok0IYbybAfmOe+MZInL1Tf4 pklmoXm3ntgV3Pq8Ptk05LYyIgAaUIltuSiR3AFaXIADX0wNtV0ZgysIWgHf3BSA 18j+I1yPG1IwBdM8xNwxi56xMQR84uY5tsIyafbfj+laRI2nH5OIYjNZnrKpm957 4xZUgIECBA== =2ogY -----END PGP SIGNATURE----- Merge tag 'for-linus-20180727' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Bigger than usual at this time, mostly due to the O_DIRECT corruption issue and the fact that I was on vacation last week. This contains: - NVMe pull request with two fixes for the FC code, and two target fixes (Christoph) - a DIF bio reset iteration fix (Greg Edwards) - two nbd reply and requeue fixes (Josef) - SCSI timeout fixup (Keith) - a small series that fixes an issue with bio_iov_iter_get_pages(), which ended up causing corruption for larger sized O_DIRECT writes that ended up racing with buffered writes (Martin Wilck)" * tag 'for-linus-20180727' of git://git.kernel.dk/linux-block: block: reset bi_iter.bi_done after splitting bio block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs blkdev: __blkdev_direct_IO_simple: fix leak in error case block: bio_iov_iter_get_pages: fix size of last iovec nvmet: only check for filebacking on -ENOTBLK nvmet: fixup crash on NULL device path scsi: set timed out out mq requests to complete blk-mq: export setting request completion state nvme: if_ready checks to fail io to deleting controller nvmet-fc: fix target sgl list on large transfers nbd: handle unexpected replies better nbd: don't requeue the same request twice.	2018-07-27 12:51:00 -07:00
Linus Torvalds	864af0d40c	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "11 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kvm, mm: account shadow page tables to kmemcg zswap: re-check zswap_is_full() after do zswap_shrink() include/linux/eventfd.h: include linux/errno.h mm: fix vma_is_anonymous() false-positives mm: use vma_init() to initialize VMAs on stack and data segments mm: introduce vma_init() mm: fix exports that inadvertently make put_page() EXPORT_SYMBOL_GPL ipc/sem.c: prevent queue.status tearing in semop mm: disallow mappings that conflict for devm_memremap_pages() kasan: only select SLUB_DEBUG with SYSFS=y delayacct: fix crash in delayacct_blkio_end() after delayacct init failure	2018-07-27 10:30:47 -07:00
Linus Torvalds	f636d300cd	Changes since last update: - Fix some uninitialized variable errors - Fix an incorrect check in metadata verifiers -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAltYpkoACgkQ+H93GTRK tOsMAA//Tyt2rjGGrvtPUiI9xhDDbYM+Eds19IWhye9LyNQCHXdmrCicsBvoEyCC 5XSAT5lofeLNIbiTS88aC0b4sr2LLban6YsTBHGTlRxUTrnCSpCCDIgXJswxLjmT jivIumvKL3sxgmXubwe6gnjoLCNGIy3JrdCu4vFf6JGWAj6U5HyZ5hjtj74nuPtg w6BMEptJIOmQwGzSjQY76dQ5ekliVuOtYISY6gRAfVPVvwURgIzZdQPi4qV5Kw/d n2nA6rvMBUcMUSVvXWS1ryOWsy4HrB9LXzbr5Kb0NgaVKnAqSCYGIGMJSEsiO/7Y P83Doo6N8fYh8QEUOLqJ76XTkkrzoo3fvo7IZXUGMERXx90UliEAI/k6hWy6awtT cCQatAcOp+8r5PvMJ9ZIivAwDId06PwpuDntOATIamGkNEo4vo0LO189fQP+i8RD LIbEcLcGOHVjjTZgGqJCfDWVPiFtG8ZdZp9bvmpW9aREzMGl/tXnvI2QsSwZu+lU 87efBqztYGm4U4D5grdV/ynbT1E4E9ggtI2pVHG2ipJnZ+UeTiOCw68lDcUDT0JA lU2fPUKzUR3v+U6s26AJFKcX2HCG4G75cJozBuH82xcPnUT0m3PMde0ZhFzVnvg4 w8T+bIS0Q/f310SSAitu1qfG5cx2f6I5j107jhldvcibRmqEZLE= =Ovtv -----END PGP SIGNATURE----- Merge tag 'xfs-4.18-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Darrick Wong: - Fix some uninitialized variable errors - Fix an incorrect check in metadata verifiers * tag 'xfs-4.18-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: properly handle free inodes in extent hint validators xfs: Initialize variables in xfs_alloc_get_rec before using them	2018-07-27 09:25:09 -07:00
Mauro Carvalho Chehab	d0dd962d8a	media: dvb: get rid of VIDEO_SET_SPU_PALETTE No upstream drivers use it. It doesn't make any sense to have a compat32 code for something that nobody uses upstream. Reported-by: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>	2018-07-27 06:41:35 -04:00
Kirill A. Shutemov	bfd40eaff5	mm: fix vma_is_anonymous() false-positives vma_is_anonymous() relies on ->vm_ops being NULL to detect anonymous VMA. This is unreliable as ->mmap may not set ->vm_ops. False-positive vma_is_anonymous() may lead to crashes: next ffff8801ce5e7040 prev ffff8801d20eca50 mm ffff88019c1e13c0 prot 27 anon_vma ffff88019680cdd8 vm_ops 0000000000000000 pgoff 0 file ffff8801b2ec2d00 private_data 0000000000000000 flags: 0xff(read\|write\|exec\|shared\|mayread\|maywrite\|mayexec\|mayshare) ------------[ cut here ]------------ kernel BUG at mm/memory.c:1422! invalid opcode: 0000 [#1] SMP KASAN CPU: 0 PID: 18486 Comm: syz-executor3 Not tainted 4.18.0-rc3+ #136 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:zap_pmd_range mm/memory.c:1421 [inline] RIP: 0010:zap_pud_range mm/memory.c:1466 [inline] RIP: 0010:zap_p4d_range mm/memory.c:1487 [inline] RIP: 0010:unmap_page_range+0x1c18/0x2220 mm/memory.c:1508 Call Trace: unmap_single_vma+0x1a0/0x310 mm/memory.c:1553 zap_page_range_single+0x3cc/0x580 mm/memory.c:1644 unmap_mapping_range_vma mm/memory.c:2792 [inline] unmap_mapping_range_tree mm/memory.c:2813 [inline] unmap_mapping_pages+0x3a7/0x5b0 mm/memory.c:2845 unmap_mapping_range+0x48/0x60 mm/memory.c:2880 truncate_pagecache+0x54/0x90 mm/truncate.c:800 truncate_setsize+0x70/0xb0 mm/truncate.c:826 simple_setattr+0xe9/0x110 fs/libfs.c:409 notify_change+0xf13/0x10f0 fs/attr.c:335 do_truncate+0x1ac/0x2b0 fs/open.c:63 do_sys_ftruncate+0x492/0x560 fs/open.c:205 __do_sys_ftruncate fs/open.c:215 [inline] __se_sys_ftruncate fs/open.c:213 [inline] __x64_sys_ftruncate+0x59/0x80 fs/open.c:213 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Reproducer: #include <stdio.h> #include <stddef.h> #include <stdint.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/stat.h> #include <sys/ioctl.h> #include <sys/mman.h> #include <unistd.h> #include <fcntl.h> #define KCOV_INIT_TRACE _IOR('c', 1, unsigned long) #define KCOV_ENABLE _IO('c', 100) #define KCOV_DISABLE _IO('c', 101) #define COVER_SIZE (1024<<10) #define KCOV_TRACE_PC 0 #define KCOV_TRACE_CMP 1 int main(int argc, char *argv) { int fd; unsigned long cover; system("mount -t debugfs none /sys/kernel/debug"); fd = open("/sys/kernel/debug/kcov", O_RDWR); ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE); cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long), PROT_READ \| PROT_WRITE, MAP_SHARED, fd, 0); munmap(cover, COVER_SIZE * sizeof(unsigned long)); cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long), PROT_READ \| PROT_WRITE, MAP_PRIVATE, fd, 0); memset(cover, 0, COVER_SIZE * sizeof(unsigned long)); ftruncate(fd, 3UL << 20); return 0; } This can be fixed by assigning anonymous VMAs own vm_ops and not relying on it being NULL. If ->mmap() failed to set ->vm_ops, mmap_region() will set it to dummy_vm_ops. This way we will have non-NULL ->vm_ops for all VMAs. Link: http://lkml.kernel.org/r/20180724121139.62570-4-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: syzbot+3f84280d52be9b7083cc@syzkaller.appspotmail.com Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-07-26 19:38:03 -07:00
Kirill A. Shutemov	2c4541e24c	mm: use vma_init() to initialize VMAs on stack and data segments Make sure to initialize all VMAs properly, not only those which come from vm_area_cachep. Link: http://lkml.kernel.org/r/20180724121139.62570-3-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-07-26 19:38:03 -07:00
Bob Peterson	3f30f929bb	gfs2: cleanup: call gfs2_rgrp_ondisk2lvb from gfs2_rgrp_out Before this patch gfs2_rgrp_ondisk2lvb was called after every call to gfs2_rgrp_out. This patch just calls it directly from within gfs2_rgrp_out, and moves the function to be before it so we don't need a function prototype. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>	2018-07-26 14:49:43 -05:00
Martin Wilck	9362dd1109	blkdev: __blkdev_direct_IO_simple: fix leak in error case Fixes: `72ecad22d9` ("block: support a full bio worth of IO for simplified bdev direct-io") Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-26 11:52:33 -06:00
Eric Sandeen	1c02d502c2	xfs: remove deprecated barrier/nobarrier mount The barrier mount options have been no-ops and deprecated since `4cf4573` xfs: deprecate barrier/nobarrier mount option i.e. kernel 4.10 / December 2016, with a stated deprecation schedule after v4.15. Should be fair game to remove them now. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-26 10:15:17 -07:00
Darrick J. Wong	44a8736bd2	xfs: clean up IRELE/iput callsites Replace the IRELE macro with a proper function so that we can do proper typechecking and so that we can stop open-coding iput in scrub, which means that we'll be able to ftrace inode lifetimes going through scrub correctly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-07-26 10:15:16 -07:00
Darrick J. Wong	89c3e8cf3c	xfs: kill IHOLD Nobody uses this macro, get rid of it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>	2018-07-26 10:15:16 -07:00
Brian Foster	b277c37f43	xfs: bypass final dfops roll in trans commit path Once xfs_defer_finish() has completed all deferred operations, it checks the dirty state of the transaction and rolls it once more to return a clean transaction for the caller. This primarily to cover the case where repeated xfs_defer_finish() calls are made in a loop and we need to make sure that the caller starts the next iteration with a clean transaction. Otherwise we risk transaction reservation overrun. This final transaction roll is not required in the transaction commit path, however, because the transaction is immediately committed and freed after dfops completion. Refactor the final roll into a separate helper such that we can avoid it in the transaction commit path. Lift the dfops reset as well so dfops remains valid until after the last call to xfs_defer_trans_roll(). The reset is also unnecessary in the transaction commit path because the transaction is about to complete. This eliminates unnecessary regrants of transactions where the associated transaction roll can be replaced by a transaction commit. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-26 10:15:16 -07:00
Brian Foster	9e28a242be	xfs: drop unnecessary xfs_defer_finish() dfops parameter Every caller of xfs_defer_finish() now passes the transaction and its associated ->t_dfops. The xfs_defer_ops parameter is therefore no longer necessary and can be removed. Since most xfs_defer_finish() callers also have to consider xfs_defer_cancel() on error, update the latter to also receive the transaction for consistency. The log recovery code contains an outlier case that cancels a dfops directly without an available transaction. Retain an internal wrapper to support this outlier case for the time being. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-26 10:15:16 -07:00
Brian Foster	d5cca7eb24	xfs: remove unnecessary dfops init calls in xattr code Each xfs_defer_init() call in the xattr code uses the internal dfops reference. In addition, a successful xfs_defer_finish() always returns with a reset xfs_defer_ops structure. Given that along with the fact that every xfs_defer_init() call in the xattr code is followed up by an xfs_defer_finish(), the former calls are no longer necessary and can be removed. Note that the xfs_defer_init() call in the remote value copy loop of xfs_attr_rmtval_set() is not followed by a finish, but the dfops is unused in this instance. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-26 10:15:15 -07:00
Brian Foster	c8eac49ef7	xfs: remove all boilerplate defer init/finish code At this point, the transaction subsystem completely manages deferred items internally such that the common and boilerplate xfs_trans_alloc() -> xfs_defer_init() -> xfs_defer_finish() -> xfs_trans_commit() sequence can be replaced with a simple transaction allocation and commit. Remove all such boilerplate deferred ops code. In doing so, we change each case over to use the dfops in the transaction and specifically eliminate: - The on-stack dfops and associated xfs_defer_init() call, as the internal dfops is initialized on transaction allocation. - xfs_bmap_finish() calls that precede a final xfs_trans_commit() of a transaction. - xfs_defer_cancel() calls in error handlers that precede a transaction cancel. The only deferred ops calls that remain are those that are non-deterministic with respect to the final commit of the associated transaction or are open-coded due to special handling. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-26 10:15:15 -07:00
Brian Foster	91ef75b657	xfs: use internal dfops during [b\|c]ui recovery bmap and refcount intent processing associates a dfops from the caller with a local transaction to collect all deferred items for post-processing. Use the internal dfops in both of these functions and move the deferred items to the parent dfops before the transaction commits. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-26 10:15:15 -07:00
Brian Foster	9c6bb0cf7b	xfs: use internal dfops in attr code Remove the unnecessary on-stack dfops structure and use the internal transaction dfops instead. The lower level xattr code already appropriately accesses ->t_dfops throughout. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-26 10:15:14 -07:00
Brian Foster	1e5ae1995a	xfs: use internal dfops in cow blocks cancel All callers either explicitly initialize a dfops or pass a transaction with an internal dfops. Drop the hacky old dfops replacement logic and use the one associated with the transaction. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-26 10:15:14 -07:00
Brian Foster	e021a2e5fc	xfs: support embedded dfops in transaction The dfops structure used by multi-transaction operations is typically stored on the stack and carried around by the associated transaction. The lifecycle of dfops does not quite match that of the transaction, but they are tightly related in that the former depends on the latter. The relationship of these objects is tight enough that we can avoid the cumbersome boilerplate code required in most cases to manage them separately by just embedding an xfs_defer_ops in the transaction itself. This means that a transaction allocation returns with an initialized dfops, a transaction commit finishes pending deferred items before the tx commit, a transaction cancel cancels the dfops before the transaction and a transaction dup operation transfers the current dfops state to the new transaction. The dup operation is slightly complicated by the fact that we can no longer just copy a dfops pointer from the old transaction to the new transaction. This is solved through a dfops move helper that transfers the pending items and other dfops state across the transactions. This also requires that transaction rolling code always refer to the transaction for the current dfops reference. Finally, to facilitate incremental conversion to the internal dfops and continue to support the current external dfops mode of operation, create the new ->t_dfops_internal field with a layer of indirection. On allocation, ->t_dfops points to the internal dfops. This state is overridden by callers who re-init a local dfops on the transaction. Once ->t_dfops is overridden, the external dfops reference is maintained as the transaction rolls. This patch adds the fundamental ability to support an internal dfops. All codepaths that perform deferred processing continue to override the internal dfops until they are converted over in subsequent patches. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-26 10:15:14 -07:00
Brian Foster	44fd294681	xfs: pack holes in xfs_defer_ops and xfs_trans Both structures have holes due to member alignment. Move dop_low to the end of xfs_defer ops to sanitize the cache line alignment and move t_flags to save 8 bytes in xfs_trans. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-26 10:15:13 -07:00
Brian Foster	509308b413	xfs: reset dfops to initial state after finish xfs_defer_init() is currently used in two particular situations. The first and most obvious case is raw initialization of an xfs_defer_ops struct. The other case is partial reinit of xfs_defer_ops on reuse due to iteration. Most instances of the first case will be replaced by a single init of a dfops embedded in the transaction. Init calls are still technically required for the second case because the dfops may have low space mode enabled or have joined items that need to be reset before the dfops should be reused. Since the current dfops usage expects either a final transaction commit after xfs_defer_finish() or xfs_defer_init() if dfops is to be reused, we can shift some of the init logic into xfs_defer_finish() such that the latter returns with a reinitialized dfops. This eliminates the second dependency noted above such that a dfops is immediately ready for reuse after an xfs_defer_finish() without the need to change any calling code. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-26 10:15:13 -07:00
Brian Foster	83200bfac6	xfs: remove unused deferred ops committed field dop_committed is set when deferred item processing rolls the transaction at least once, but is only ever accessed in tracepoints. The transaction roll/commit events are already available via independent tracepoints, so remove the otherwise unused field. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-26 10:15:13 -07:00
Brian Foster	03f4e4b26c	xfs: make deferred processing safe for embedded dfops xfs_defer_finish() has a couple quirks that are not safe with respect to the upcoming internal dfops functionality. First, xfs_defer_finish() attaches the passed in dfops structure to ->t_dfops and caches and restores the original value. Second, it continues to use the initial dfops reference before and after the transaction roll. These behaviors assume that dop is an independent memory allocation from the transaction itself, which may not always be true once transactions begin to use an embedded dfops structure. In the latter model, dfops processing creates a new xfs_defer_ops structure with each transaction and the associated state is migrated across to the new transaction. Fix up xfs_defer_finish() to handle the possibility of the current dfops changing after a transaction roll. Since ->t_dfops is used unconditionally in this path, it is no longer necessary to attach/restore ->t_dfops and pass it explicitly down to xfs_defer_trans_roll(). Update dop in the latter function and the caller to ensure that it always refers to the current dfops structure. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-26 10:15:12 -07:00
Brian Foster	dcbd44f799	xfs: fix transaction leak on remote attr set/remove failure The xattr remote value set/remove handlers both clear args.trans in the error path without having cancelled the transaction. This leaks the transaction, causes warnings around returning to userspace with locks held and leads to system lockups or other general problems. The higher level xfs_attr_[set\|remove]() functions already detect and cancel args.trans when set in the error path. Drop the NULL assignments from the rmtval handlers and allow the callers to clean up the transaction correctly. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-26 10:15:12 -07:00
Brian Foster	a61acc3c78	xfs: use ->t_dfops in log recovery intent processing xlog_finish_defer_ops() processes the deferred operations collected over the entire intent recovery sequence. We can't xfs_defer_init() here because the dfops is already populated. Attach it manually and eliminate the last caller of xfs_defer_finish() that doesn't pass ->t_dfops. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-26 10:15:12 -07:00
Brian Foster	02dff7bf81	xfs: pull up dfops from xfs_itruncate_extents() xfs_itruncate_extents[_flags]() uses a local dfops with a transaction provided by the caller. It uses hacky ->t_dfops replacement logic to avoid stomping over an already populated ->t_dfops. The latter never occurs for current callers and the logic itself is not really appropriate. Clean this up by updating all callers to initialize a dfops and to use that down in xfs_itruncate_extents(). This more closely resembles the upcoming logic where dfops will be embedded within the transaction. We can also replace the xfs_defer_init() in the xfs_itruncate_extents_flags() loop with an assert. Both dfops and firstblock should be in a valid state after xfs_defer_finish() and the inode joined to the dfops is fixed throughout the loop. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-26 10:15:12 -07:00
Andreas Gruenbacher	776125785a	gfs2: Special-case rindex for gfs2_grow To speed up the common case of appending to a file, gfs2_write_alloc_required presumes that writing beyond the end of a file will always require additional blocks to be allocated. This assumption is incorrect for preallocates files, but there are no negative consequences as long as some space is still left on the filesystem. One special file that always has some space preallocated beyond the end of the file is the rindex: when growing a filesystem, gfs2_grow adds one or more new resource groups and appends records describing those resource groups to the rindex; the preallocated space ensures that this is always possible. However, when a filesystem is completely full, gfs2_write_alloc_required will indicate that an additional allocation is required, and appending the next record to the rindex will fail even though space for that record has already been preallocated. To fix that, skip the incorrect optimization in gfs2_write_alloc_required, but for the rindex only. Other writes to preallocated space beyond the end of the file are still allowed to fail on completely full filesystems. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Bob Peterson <rpeterso@redhat.com>	2018-07-25 22:56:14 +02:00
Linus Torvalds	5c61ef1b7c	fscache fixes -----BEGIN PGP SIGNATURE----- iQIVAwUAW1h/7Pu3V2unywtrAQJFOg/+NQswGJcTsJGeTL8tW+8nGtJVeTP6XaVh xPEjdqTBmimt6ciaNP1LxLLt9jQ50S1f83rWGZeFBQoNgWinoe3VtzSVdlQKhZcT jC5LtFVlTxw5rrL4uCsJywLLjD0NeH41ISbvCStcyYJExOZr+f4/VJXKNcwKjAvf kD1xDGnVZsZiGLWFjwBVaPJwFigquoLEU564InMnZbvMW95uZOPGfnwxAGmKQX2n BV3WxVizCc0MwlHMJYjs0cVMZNviuC+qg7YBJIoio3+Dq8FIn7ISn98LbhCpG7mi FoiRi+7xs7VCGm9yqtkXL+euHcSzjnJPnlYxpU8xGqAay0qKxoHecZj2iMEX327K E4mujQ40oqkMLhwy3GhT9cIpvbbQPu7+kS+k9x7UqVnzlhsKEeMp7TEFqSEebO9H kIuvfRBD3uZY0B/loLCB3Cc/B9OoWAUi6IGBRclwS9+RUuBnZY/jb7iQsEvcOv9u 0EC0biSs1jizG1tLR0LmvjIyvS567t/DG/peLad1lOqPe6Up2mO4XIeyS9phwXAD ryupnKPr3tGRgvfJ4jLUZPC8/nrv5Fg9R0YhICEEBqhwKn1uTZgyc085d2EHOdQp fmfbXR/oz4TwjjwlgzrLMQbLB7GUpkgCFxsIPpEVeFH3RDbZNO1UbLovDMUuRaos lFWHzd4K4XA= =yuI9 -----END PGP SIGNATURE----- Merge tag 'fscache-fixes-20180725' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache/cachefiles fixes from David Howells: - Allow cancelled operations to be queued so they can be cleaned up. - Fix a refcounting bug in the monitoring of reads on backend files whereby a race can occur between monitor objects being listed for work, the work processing being queued and the work processor running and destroying the monitor objects. - Fix a ref overput in object attachment, whereby a tentatively considered object is put in error handling without first being 'got'. - Fix a missing clear of the CACHEFILES_OBJECT_ACTIVE flag whereby an assertion occurs when we retry because it seems the object is now active. - Wait rather BUG'ing on an object collision in the depths of cachefiles as the active object should be being cleaned up - also depends on the one above. * tag 'fscache-fixes-20180725' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: cachefiles: Wait rather than BUG'ing on "Unexpected object collision" cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag fscache: Fix reference overput in fscache_attach_object() error handling cachefiles: Fix refcounting bug in backing-file read monitoring fscache: Allow cancelled operations to be enqueued	2018-07-25 10:55:24 -07:00
Kiran Kumar Modukuri	c2412ac45a	cachefiles: Wait rather than BUG'ing on "Unexpected object collision" If we meet a conflicting object that is marked FSCACHE_OBJECT_IS_LIVE in the active object tree, we have been emitting a BUG after logging information about it and the new object. Instead, we should wait for the CACHEFILES_OBJECT_ACTIVE flag to be cleared on the old object (or return an error). The ACTIVE flag should be cleared after it has been removed from the active object tree. A timeout of 60s is used in the wait, so we shouldn't be able to get stuck there. Fixes: `9ae326a690` ("CacheFiles: A cache that backs onto a mounted filesystem") Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>	2018-07-25 14:49:00 +01:00
Kiran Kumar Modukuri	5ce83d4bb7	cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag In cachefiles_mark_object_active(), the new object is marked active and then we try to add it to the active object tree. If a conflicting object is already present, we want to wait for that to go away. After the wait, we go round again and try to re-mark the object as being active - but it's already marked active from the first time we went through and a BUG is issued. Fix this by clearing the CACHEFILES_OBJECT_ACTIVE flag before we try again. Analysis from Kiran Kumar Modukuri: [Impact] Oops during heavy NFS + FSCache + Cachefiles CacheFiles: Error: Overlong wait for old active object to go away. BUG: unable to handle kernel NULL pointer dereference at 0000000000000002 CacheFiles: Error: Object already active kernel BUG at fs/cachefiles/namei.c:163! [Cause] In a heavily loaded system with big files being read and truncated, an fscache object for a cookie is being dropped and a new object being looked. The new object being looked for has to wait for the old object to go away before the new object is moved to active state. [Fix] Clear the flag 'CACHEFILES_OBJECT_ACTIVE' for the new object when retrying the object lookup. [Testcase] Have run ~100 hours of NFS stress tests and have not seen this bug recur. [Regression Potential] - Limited to fscache/cachefiles. Fixes: `9ae326a690` ("CacheFiles: A cache that backs onto a mounted filesystem") Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>	2018-07-25 14:49:00 +01:00
Kiran Kumar Modukuri	f29507ce66	fscache: Fix reference overput in fscache_attach_object() error handling When a cookie is allocated that causes fscache_object structs to be allocated, those objects are initialised with the cookie pointer, but aren't blessed with a ref on that cookie unless the attachment is successfully completed in fscache_attach_object(). If attachment fails because the parent object was dying or there was a collision, fscache_attach_object() returns without incrementing the cookie counter - but upon failure of this function, the object is released which then puts the cookie, whether or not a ref was taken on the cookie. Fix this by taking a ref on the cookie when it is assigned in fscache_object_init(), even when we're creating a root object. Analysis from Kiran Kumar: This bug has been seen in 4.4.0-124-generic #148-Ubuntu kernel BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776277 fscache cookie ref count updated incorrectly during fscache object allocation resulting in following Oops. kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321! kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639! [Cause] Two threads are trying to do operate on a cookie and two objects. (1) One thread tries to unmount the filesystem and in process goes over a huge list of objects marking them dead and deleting the objects. cookie->usage is also decremented in following path: nfs_fscache_release_super_cookie -> __fscache_relinquish_cookie ->__fscache_cookie_put ->BUG_ON(atomic_read(&cookie->usage) <= 0); (2) A second thread tries to lookup an object for reading data in following path: fscache_alloc_object 1) cachefiles_alloc_object -> fscache_object_init -> assign cookie, but usage not bumped. 2) fscache_attach_object -> fails in cant_attach_object because the cookie's backing object or cookie's->parent object are going away 3) fscache_put_object -> cachefiles_put_object ->fscache_object_destroy ->fscache_cookie_put ->BUG_ON(atomic_read(&cookie->usage) <= 0); [NOTE from dhowells] It's unclear as to the circumstances in which (2) can take place, given that thread (1) is in nfs_kill_super(), however a conflicting NFS mount with slightly different parameters that creates a different superblock would do it. A backtrace from Kiran seems to show that this is a possibility: kernel BUG at/build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639! ... RIP: __fscache_cookie_put+0x3a/0x40 [fscache] Call Trace: __fscache_relinquish_cookie+0x87/0x120 [fscache] nfs_fscache_release_super_cookie+0x2d/0xb0 [nfs] nfs_kill_super+0x29/0x40 [nfs] deactivate_locked_super+0x48/0x80 deactivate_super+0x5c/0x60 cleanup_mnt+0x3f/0x90 __cleanup_mnt+0x12/0x20 task_work_run+0x86/0xb0 exit_to_usermode_loop+0xc2/0xd0 syscall_return_slowpath+0x4e/0x60 int_ret_from_sys_call+0x25/0x9f [Fix] Bump up the cookie usage in fscache_object_init, when it is first being assigned a cookie atomically such that the cookie is added and bumped up if its refcount is not zero. Remove the assignment in fscache_attach_object(). [Testcase] I have run ~100 hours of NFS stress tests and not seen this bug recur. [Regression Potential] - Limited to fscache/cachefiles. Fixes: `ccc4fc3d11` ("FS-Cache: Implement the cookie management part of the netfs API") Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>	2018-07-25 14:49:00 +01:00
Kiran Kumar Modukuri	934140ab02	cachefiles: Fix refcounting bug in backing-file read monitoring cachefiles_read_waiter() has the right to access a 'monitor' object by virtue of being called under the waitqueue lock for one of the pages in its purview. However, it has no ref on that monitor object or on the associated operation. What it is allowed to do is to move the monitor object to the operation's to_do list, but once it drops the work_lock, it's actually no longer permitted to access that object. However, it is trying to enqueue the retrieval operation for processing - but it can only do this via a pointer in the monitor object, something it shouldn't be doing. If it doesn't enqueue the operation, the operation may not get processed. If the order is flipped so that the enqueue is first, then it's possible for the work processor to look at the to_do list before the monitor is enqueued upon it. Fix this by getting a ref on the operation so that we can trust that it will still be there once we've added the monitor to the to_do list and dropped the work_lock. The op can then be enqueued after the lock is dropped. The bug can manifest in one of a couple of ways. The first manifestation looks like: FS-Cache: FS-Cache: Assertion failed FS-Cache: 6 == 5 is false ------------[ cut here ]------------ kernel BUG at fs/fscache/operation.c:494! RIP: 0010:fscache_put_operation+0x1e3/0x1f0 ... fscache_op_work_func+0x26/0x50 process_one_work+0x131/0x290 worker_thread+0x45/0x360 kthread+0xf8/0x130 ? create_worker+0x190/0x190 ? kthread_cancel_work_sync+0x10/0x10 ret_from_fork+0x1f/0x30 This is due to the operation being in the DEAD state (6) rather than INITIALISED, COMPLETE or CANCELLED (5) because it's already passed through fscache_put_operation(). The bug can also manifest like the following: kernel BUG at fs/fscache/operation.c:69! ... [exception RIP: fscache_enqueue_operation+246] ... #7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6 #8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48 #9 [ffff883fff083c48] __wake_up_common at ffffffff810af028 I'm not entirely certain as to which is line 69 in Lei's kernel, so I'm not entirely clear which assertion failed. Fixes: `9ae326a690` ("CacheFiles: A cache that backs onto a mounted filesystem") Reported-by: Lei Xue <carmark.dlut@gmail.com> Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Reported-by: Anthony DeRobertis <aderobertis@metrics.net> Reported-by: NeilBrown <neilb@suse.com> Reported-by: Daniel Axtens <dja@axtens.net> Reported-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Daniel Axtens <dja@axtens.net>	2018-07-25 14:49:00 +01:00
Kiran Kumar Modukuri	d0eb06afe7	fscache: Allow cancelled operations to be enqueued Alter the state-check assertion in fscache_enqueue_operation() to allow cancelled operations to be given processing time so they can be cleaned up. Also fix a debugging statement that was requiring such operations to have an object assigned. Fixes: `9ae326a690` ("CacheFiles: A cache that backs onto a mounted filesystem") Reported-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>	2018-07-25 14:31:20 +01:00
David S. Miller	19725496da	Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net	2018-07-24 19:21:58 -07:00
Bob Peterson	f6753df35c	GFS2: rgrp free blocks used incorrectly Before this patch, several functions in rgrp.c checked the value of rgd->rd_free_clone. That does not take into account blocks that were reserved by a multi-block reservation. This causes a problem when space gets tight in the file system. For example, when function gfs2_inplace_reserve checks to see if a rgrp has enough blocks to satisfy the request, it can accept a rgrp that it should reject because, although there are enough blocks to satisfy the request _now_, those blocks may be reserved for another running process. A second problem with this occurs when we've reserved the remaining blocks in an rgrp: function rg_mblk_search() can reject an rgrp improperly because it calculates: u32 free_blocks = rgd->rd_free_clone - rgd->rd_reserved; But rd_reserved includes blocks that the current process just reserved in its own call to inplace_reserve. For example, it can reserve the last 128 blocks of an rgrp, then reject that same rgrp because the above calculates out to free_blocks = 0; Consequences include, but are not limited to, (1) leaving holes, and thus increasing file system fragmentation, and (2) reporting file system is full long before it actually is. This patch introduces a new function, rgd_free, which returns the number of clone-free blocks (blocks that are truly free as opposed to blocks that are still being used because an unlinked file is still open) minus the number of blocks reserved by processes, but not counting the blocks we ourselves reserved (because obviously we need to allocate them). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2018-07-25 00:09:09 +02:00
Colin Ian King	d1b0cb933c	gfs2: remove redundant variable 'moved' Variable 'moved' s being assigned but is never used hence it is redundant and can be removed. This has been the case ever since commit `c752666c`. Cleans up clang warning: warning: variable 'moved' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2018-07-25 00:08:59 +02:00
Andreas Gruenbacher	f95cbb44ab	gfs2: use iomap_readpage for blocksize == PAGE_SIZE We only use iomap_readpage for pages that don't have buffer heads attached yet: iomap_readpage would otherwise read pages from disk that are marked buffer_uptodate() but not PageUptodate(). Those pages may actually contain data more recent than what's on disk. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Bob Peterson <rpeterso@redhat.com>	2018-07-25 00:08:49 +02:00
Andreas Gruenbacher	1d45bb7f9d	gfs2: Use iomap for stuffed direct I/O reads Remove the fallback code from direct to buffered I/O for stuffed reads. For stuffed writes, we must keep the fallback code: the deferred glock we are holding under direct I/O doesn't allow to write to the inode or change the file size. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Bob Peterson <rpeterso@redhat.com>	2018-07-25 00:08:40 +02:00
Andreas Gruenbacher	0ed91eca11	Merge branch 'iomap-4.19-merge' into linux-gfs2/for-next Merge xfs branch 'iomap-4.19-merge' into linux-gfs2/for-next. This brings in readpage and direct I/O support for inline data. The IOMAP_F_BUFFER_HEAD flag introduced in commit "iomap: add initial support for writes without buffer heads" needs to be set for gfs2 as well, so do that in the merge. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2018-07-25 00:08:20 +02:00
Andreas Gruenbacher	c25892827c	gfs2: fallocate_chunk: Always initialize struct iomap In fallocate_chunk, always initialize the iomap before calling gfs2_iomap_get_alloc: future changes could otherwise cause things like iomap.flags to leak across calls. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Bob Peterson <rpeterso@redhat.com>	2018-07-25 00:06:48 +02:00
Bart Van Assche	7393059506	fs/cifs: Simplify ib_post_(send\|recv\|srq_recv)() calls Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL as third argument to ib_post_(send\|recv\|srq_recv)(). Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-24 16:06:36 -06:00
Bob Peterson	4a7727725d	GFS2: Fix recovery issues for spectators This patch fixes a couple problems dealing with spectators who remain with gfs2 mounts after the last non-spectator node fails. Before this patch, spectator mounts would try to acquire the dlm's mounted lock EX as part of its normal recovery sequence. The mounted lock is only used to determine whether the node is the first mounter, the first node to mount the file system, for the purposes of file system recovery and journal replay. It's not necessary for spectators: they should never do journal recovery. If they acquire the lock it will prevent another "real" first-mounter from acquiring the lock in EX mode, which means it also cannot do journal recovery because it doesn't think it's the first node to mount the file system. This patch checks if the mounter is a spectator, and if so, avoids grabbing the mounted lock. This allows a secondary mounter who is really the first non-spectator mounter, to do journal recovery: since the spectator doesn't acquire the lock, it can grab it in EX mode, and therefore consider itself to be the first mounter both as a "real" first mount, and as a first-real-after-spectator. Note that the control lock still needs to be taken in PR mode in order to fetch the lvb value so it has the current status of all journal's recovery. This is used as it is today by a first mounter to replay the journals. For spectators, it's merely used to fetch the status bits. All recovery is bypassed and the node waits until recovery is completed by a non-spectator node. I also improved the cryptic message given by control_mount when a spectator is waiting for a non-spectator to perform recovery. It also fixes a problem in gfs2_recover_set whereby spectators were never queueing recovery work for their own journal. They cannot do recovery themselves, but they still need to queue the work so they can check the recovery bits and clear the DFL_BLOCK_LOCKS bit once the recovery happens on another node. When the work queue runs on a spectator, it bypasses most of the work so it won't print a bunch of annoying messages. All it will print is a bunch of messages that look like this until recovery completes on the non-spectator node: GFS2: fsid=mycluster:scratch.s: recover generation 3 jid 0 GFS2: fsid=mycluster:scratch.s: recover jid 0 result busy These continue every 1.5 seconds until the recovery is done by the non-spectator, at which time it says: GFS2: fsid=mycluster:scratch.s: recover generation 4 done Then it proceeds with its mount. If the file system is mounted in spectator node and the last remaining non-spectator is fenced, any IO to the file system is blocked by dlm and the spectator waits until recovery is performed by a non-spectator. If a spectator tries to mount the file system before any non-spectators, it blocks and repeatedly gives this kernel message: GFS2: fsid=mycluster:scratch: Recovery is required. Waiting for a non-spectator to mount. GFS2: fsid=mycluster:scratch: Recovery is required. Waiting for a non-spectator to mount. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2018-07-25 00:06:24 +02:00
Christoph Hellwig	076ff2f0b8	exofs: use bio_clone_fast in _write_mirror The mirroring code never changes the bio data or biovecs. This means we can reuse the biovec allocation easily instead of duplicating it. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by Boaz Harrosh <ooo@electrozaur.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-24 14:43:20 -06:00
Eric Sandeen	d4a34e1655	xfs: properly handle free inodes in extent hint validators When inodes are freed in xfs_ifree(), di_flags is cleared (so extent size hints are removed) but the actual extent size fields are left intact. This causes the extent hint validators to fail on freed inodes which once had extent size hints. This can be observed (for example) by running xfs/229 twice on a non-crc xfs filesystem, or presumably on V5 with ikeep. Fixes: `7d71a67` ("xfs: verify extent size hint is valid in inode verifier") Fixes: `02a0fda` ("xfs: verify COW extent size hint is valid in inode verifier") Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-24 11:34:52 -07:00
Andreas Gruenbacher	a3479c7fc0	Merge branch 'iomap-write' into linux-gfs2/for-next Pull in the gfs2 iomap-write changes: Tweak the existing code to properly support iomap write and eliminate an unnecessary special case in gfs2_block_map. Implement iomap write support for buffered and direct I/O. Simplify some of the existing code and eliminate code that is no longer used: gfs2: Remove gfs2_write_{begin,end} gfs2: iomap direct I/O support gfs2: gfs2_extent_length cleanup gfs2: iomap buffered write support gfs2: Further iomap cleanups This is based on the following changes on the xfs 'iomap-4.19-merge' branch: iomap: add private pointer to struct iomap iomap: add a page_done callback iomap: generic inline data handling iomap: complete partial direct I/O writes synchronously iomap: mark newly allocated buffer heads as new fs: factor out a __generic_write_end helper Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2018-07-24 20:02:40 +02:00
Souptick Joarder	109dbb1e6f	fs: gfs2: Adding new return type vm_fault_t Use new return type vm_fault_t for gfs2_page_mkwrite handler. see commit `1c8f422059` ("mm: change return type to vm_fault_t") for reference. Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2018-07-24 20:02:11 +02:00
Chengguang Xu	910f3d58d0	gfs2: using posix_acl_xattr_size instead of posix_acl_to_xattr It seems better to get size by calling posix_acl_xattr_size() instead of calling posix_acl_to_xattr() with NULL buffer argument. posix_acl_xattr_size() never returns 0, so remove the unnecessary check. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2018-07-24 20:02:11 +02:00
Bob Peterson	e79e0e1428	gfs2: Don't reject a supposedly full bitmap if we have blocks reserved Before this patch, you could get into situations like this: 1. Process 1 searches for X free blocks, finds them, makes a reservation 2. Process 2 searches for free blocks in the same rgrp, but now the bitmap is full because process 1's reservation is skipped over. So it marks the bitmap as GBF_FULL. 3. Process 1 tries to allocate blocks from its own reservation, but since the GBF_FULL bit is set, it skips over the rgrp and searches elsewhere, thus not using its own reservation. This patch adds an additional check to allow processes to use their own reservations. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2018-07-24 20:02:11 +02:00
Darrick J. Wong	f467cad95f	xfs: force summary counter recalc at next mount Use the "bad summary count" mount flag from the previous patch to skip writing the unmount record to force log recovery at the next mount, which will recalculate the summary counters for us. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-07-23 09:08:01 -07:00
Darrick J. Wong	53235f2215	xfs: refactor unmount record write Refactor the writing of the unmount record into a separate helper. No functionality changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-07-23 09:08:01 -07:00
Darrick J. Wong	2e9e6481e2	xfs: detect and fix bad summary counts at mount Filippo Giunchedi complained that xfs doesn't even perform basic sanity checks of the fs summary counters at mount time. Therefore, recalculate the summary counters from the AGFs after log recovery if the counts were bad (or we had to recover the fs). Enhance the recalculation routine to fail the mount entirely if the new values are also obviously incorrect. We use a mount state flag to record the "bad summary count" state so that the (subsequent) online fsck patches can detect subtlely incorrect counts and set the flag; clear it userspace asks for a repair; or force a recalculation at the next mount if nobody fixes it by unmount time. Reported-by: Filippo Giunchedi <fgiunchedi@wikimedia.org> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-07-23 09:08:01 -07:00
Darrick J. Wong	032d91f982	xfs: fix indentation and other whitespace problems in scrub/repair Now that we've shortened everything, fix up all the indentation and whitespace problems. There are no functional changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-07-23 09:08:01 -07:00
Darrick J. Wong	1d8a748a8a	xfs: shorten struct xfs_scrub_context to struct xfs_scrub Shorten the name of the online fsck context structure. Whitespace damage will be fixed by a subsequent patch. There are no functional changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-07-23 09:08:00 -07:00
Darrick J. Wong	b5e2196e9c	xfs: shorten xfs_repair_ prefix to xrep_ Shorten all the metadata repair xfs_repair_* symbols to xrep_. Whitespace damage will be fixed by a subsequent patch. There are no functional changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-07-23 09:08:00 -07:00
Darrick J. Wong	c517b3aa02	xfs: shorten xfs_scrub_ prefix Shorten all the metadata checking xfs_scrub_ prefixes to xchk_. After this, the only xfs_scrub* symbols are the ones that pertain to both scrub and repair. Whitespace damage will be fixed in a subsequent patch. There are no functional changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2018-07-23 09:08:00 -07:00
Darrick J. Wong	ef97ef26d2	xfs: clean up xfs_btree_del_cursor callers Less trivial cleanups of the error argument to xfs_btree_del_cursor; these require some minor code refactoring. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-07-23 09:08:00 -07:00
Darrick J. Wong	0b04b6b875	xfs: trivial xfs_btree_del_cursor cleanups The error argument to xfs_btree_del_cursor already understands the "nonzero for error" semantics, so remove pointless error testing in the callers and pass it directly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-07-23 09:08:00 -07:00
Darrick J. Wong	81b549aa62	xfs: return from _defer_finish with a clean transaction The following assertion was seen on generic/051: XFS: Assertion failed: tp->t_firstblock == NULLFSBLOCK, file: fs/xfs/libxfs5 ------------[ cut here ]------------ kernel BUG at fs/xfs/xfs_message.c:102! invalid opcode: 0000 [#1] SMP PTI CPU: 2 PID: 20757 Comm: fsstress Not tainted 4.18.0-rc4+ #3969 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/4 RIP: 0010:assfail+0x23/0x30 Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000 RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000 R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014 FS: 00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8d689c7000 CR3: 000000012ba6a000 CR4: 00000000000006e0 Call Trace: xfs_defer_init+0xff/0x160 xfs_reflink_remap_extent+0x31b/0xa00 xfs_reflink_remap_blocks+0xec/0x4a0 xfs_reflink_remap_range+0x3a1/0x650 xfs_file_dedupe_range+0x39/0x50 vfs_dedupe_file_range+0x218/0x260 do_vfs_ioctl+0x262/0x6a0 ? __se_sys_newfstat+0x3c/0x60 ksys_ioctl+0x35/0x60 __x64_sys_ioctl+0x11/0x20 do_syscall_64+0x4b/0x190 entry_SYSCALL_64_after_hwframe+0x49/0xbe The root cause of the assertion failure is that xfs_defer_finish doesn't roll the transaction after processing all the deferred items. Therefore it returns a dirty transaction to the caller, which leaves the caller at risk of exceeding the transaction reservation if it logs more items. Brian Foster's patchset to move the defer_ops firstblock into the transaction requires t_firstblock == NULLFSBLOCK upon defer_ops initialization, which is how this was noticed at all. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-07-23 09:08:00 -07:00
Darrick J. Wong	65cfcc3897	xfs: check leaf attribute block freemap in verifier Check the leaf attribute freemap when we're verifying the block. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2018-07-23 09:08:00 -07:00
Linus Torvalds	165ea0d1c2	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Fix several places that screw up cleanups after failures halfway through opening a file (one open-coding filp_clone_open() and getting it wrong, two misusing alloc_file()). That part is -stable fodder from the 'work.open' branch. And Christoph's regression fix for uapi breakage in aio series; include/uapi/linux/aio_abi.h shouldn't be pulling in the kernel definition of sigset_t, the reason for doing so in the first place had been bogus - there's no need to expose struct __aio_sigset in aio_abi.h at all" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: aio: don't expose __aio_sigset in uapi ocxlflash_getfile(): fix double-iput() on alloc_file() failures cxl_getfile(): fix double-iput() on alloc_file() failures drm_mode_create_lease_ioctl(): fix open-coded filp_clone_open()	2018-07-22 12:04:51 -07:00
Andy Shevchenko	c4326563d9	efivars: Call guid_parse() against guid_t type of variable uuid_le_to_bin() is deprecated API and take into consideration that variable, to where we store parsed data, is type of guid_t we switch to guid_parse() for sake of consistency. While here, add error checking to it. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Wunner <lukas@wunner.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20180720014726.24031-10-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-07-22 14:13:44 +02:00
Linus Torvalds	55b636b419	for-4.18-rc5-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAltSwkIACgkQxWXV+ddt WDtMBQ//UXMjHaXvFmC0SM6NczuYQR51hLYtJFIKig93XK5goVpUBTNbxO7LX/Tn 4zKoKyVhkW1884V6mRiC+G23QLbo0BQZA7DExfyJ3jylQdjZMBm+K+r19OtGQf5v CII7oUwni03KIiXIqiFAL5dLWebVQpG5EKJbh8GLZsmg6xNcyVaUqZ/fHXajbZiv ldEBtHBKIv7WWTJmylMBKMWnRz+jqU91fXPahoU6R5qivODrLt1o/PMuSjVNhaxe iDldHfdOaiQmLHB/1kOGyv492oW5mSSVNDE8LjEDZ61tDNlAcUyuKUWIRBxDEDtD 6D7rlVQXJ/N7sJ6+UYmJKsRpHL+NOkyzSZ0QEU/sm1Xpm8gkhHuuofRPrVCtd3l1 ZSbwvlrdyjigVEBfM3IbToQ/K6Rc1ZGId20OAs9PCQbb+mj9IxPIncZ7pI1c4hlh pPEjcYsp14JbCTjctFalcqTiFY5tHRQsx+GUFnDyOcdL7Mi+CoH+0Jy61Vgz9GQE 7s934cfEC0ot/f66kAL/PZzxUfC7TePqaa+sDfS5BIkJ4M6lPMxS5De5R4Z0+Nzr DXgQAlgXmxfRjpOYMTH9D0EDdSeJaNmVHgk7hFbiYk/KX3oyd4NmgI9Cfao8rQJv 2yd8wF2httfSJKD4b/Hv9r6Ho/Bw9PK59BvWOKYhSj6IGl32utw= =f7eB -----END PGP SIGNATURE----- Merge tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "A fix of a corruption regarding fsync and clone, under some very specific conditions explained in the patch. The fix is marked for stable 3.16+ so I'd like to get it merged now given the impact" * tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix file data corruption after cloning a range and fsync	2018-07-21 16:42:03 -07:00
Linus Torvalds	490fc05386	mm: make vm_area_alloc() initialize core fields Like vm_area_dup(), it initializes the anon_vma_chain head, and the basic mm pointer. The rest of the fields end up being different for different users, although the plan is to also initialize the 'vm_ops' field to a dummy entry. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-07-21 15:24:03 -07:00
Linus Torvalds	3928d4f5ee	mm: use helper functions for allocating and freeing vm_area structs The vm_area_struct is one of the most fundamental memory management objects, but the management of it is entirely open-coded evertwhere, ranging from allocation and freeing (using kmem_cache_[z]alloc and kmem_cache_free) to initializing all the fields. We want to unify this in order to end up having some unified initialization of the vmas, and the first step to this is to at least have basic allocation functions. Right now those functions are literally just wrappers around the kmem_cache_*() calls. This is a purely mechanical conversion: # new vma: kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc() # copy old vma kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old) # free vma kmem_cache_free(vm_area_cachep, vma) -> vm_area_free(vma) to the point where the old vma passed in to the vm_area_dup() function isn't even used yet (because I've left all the old manual initialization alone). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-07-21 13:48:51 -07:00
OGAWA Hirofumi	35033ab988	fat: fix memory allocation failure handling of match_strdup() In parse_options(), if match_strdup() failed, parse_options() leaves opts->iocharset in unexpected state (i.e. still pointing the freed string). And this can be the cause of double free. To fix, this initialize opts->iocharset always when freeing. Link: http://lkml.kernel.org/r/8736wp9dzc.fsf@mail.parknet.co.jp Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Reported-by: syzbot+90b8e10515ae88228a92@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-07-21 12:50:46 -07:00
Dmitry Torokhov	5f81880d52	sysfs, kobject: allow creating kobject belonging to arbitrary users Normally kobjects and their sysfs representation belong to global root, however it is not necessarily the case for objects in separate namespaces. For example, objects in separate network namespace logically belong to the container's root and not global root. This change lays groundwork for allowing network namespace objects ownership to be transferred to container's root user by defining get_ownership() callback in ktype structure and using it in sysfs code to retrieve desired uid/gid when creating sysfs objects for given kobject. Co-Developed-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-20 23:44:35 -07:00
Dmitry Torokhov	488dee96bb	kernfs: allow creating kernfs objects with arbitrary uid/gid This change allows creating kernfs files and directories with arbitrary uid/gid instead of always using GLOBAL_ROOT_UID/GID by extending kernfs_create_dir_ns() and kernfs_create_file_ns() with uid/gid arguments. The "simple" kernfs_create_file() and kernfs_create_dir() are left alone and always create objects belonging to the global root. When creating symlinks ownership (uid/gid) is taken from the target kernfs object. Co-Developed-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-20 23:44:35 -07:00
Al Viro	f2df5da662	fold generic_readlink() into its only caller Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-19 17:35:51 -04:00
Maciej W. Rozycki	2f819db565	binfmt_elf: Respect error return from `regset->active' The regset API documented in <linux/regset.h> defines -ENODEV as the result of the `->active' handler to be used where the feature requested is not available on the hardware found. However code handling core file note generation in `fill_thread_core_info' interpretes any non-zero result from the `->active' handler as the regset requested being active. Consequently processing continues (and hopefully gracefully fails later on) rather than being abandoned right away for the regset requested. Fix the problem then by making the code proceed only if a positive result is returned from the `->active' handler. Signed-off-by: Maciej W. Rozycki <macro@mips.com> Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: `4206d3aa19` ("elf core dump: notes user_regset") Patchwork: https://patchwork.linux-mips.org/patch/19332/ Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: James Hogan <jhogan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org	2018-07-19 13:46:34 -07:00
Filipe Manana	bd3599a0e1	Btrfs: fix file data corruption after cloning a range and fsync When we clone a range into a file we can end up dropping existing extent maps (or trimming them) and replacing them with new ones if the range to be cloned overlaps with a range in the destination inode. When that happens we add the new extent maps to the list of modified extents in the inode's extent map tree, so that a "fast" fsync (the flag BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps and log corresponding extent items. However, at the end of range cloning operation we do truncate all the pages in the affected range (in order to ensure future reads will not get stale data). Sometimes this truncation will release the corresponding extent maps besides the pages from the page cache. If this happens, then a "fast" fsync operation will miss logging some extent items, because it relies exclusively on the extent maps being present in the inode's extent tree, leading to data loss/corruption if the fsync ends up using the same transaction used by the clone operation (that transaction was not committed in the meanwhile). An extent map is released through the callback btrfs_invalidatepage(), which gets called by truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The later ends up calling try_release_extent_mapping() which will release the extent map if some conditions are met, like the file size being greater than 16Mb, gfp flags allow blocking and the range not being locked (which is the case during the clone operation) nor being the extent map flagged as pinned (also the case for cloning). The following example, turned into a test for fstests, reproduces the issue: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo $ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar $ xfs_io -c "fsync" /mnt/bar # reflink destination offset corresponds to the size of file bar, # 2728Kb minus 4Kb. $ xfs_io -c ""reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" /mnt/bar $ xfs_io -c "fsync" /mnt/bar $ md5sum /mnt/bar 95a95813a8c2abc9aa75a6c2914a077e /mnt/bar <power fail> $ mount /dev/sdb /mnt $ md5sum /mnt/bar 207fd8d0b161be8a84b945f0df8d5f8d /mnt/bar # digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the # power failure In the above example, the destination offset of the clone operation corresponds to the size of the "bar" file minus 4Kb. So during the clone operation, the extent map covering the range from 2572Kb to 2728Kb gets trimmed so that it ends at offset 2724Kb, and a new extent map covering the range from 2724Kb to 11724Kb is created. So at the end of the clone operation when we ask to truncate the pages in the range from 2724Kb to 2724Kb + 15908Kb, the page invalidation callback ends up removing the new extent map (through try_release_extent_mapping()) when the page at offset 2724Kb is passed to that callback. Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent map is removed at try_release_extent_mapping(), forcing the next fsync to search for modified extents in the fs/subvolume tree instead of relying on the presence of extent maps in memory. This way we can continue doing a "fast" fsync if the destination range of a clone operation does not overlap with an existing range or if any of the criteria necessary to remove an extent map at try_release_extent_mapping() is not met (file size not bigger then 16Mb or gfp flags do not allow blocking). CC: stable@vger.kernel.org # 3.16+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-07-19 15:36:31 +02:00
Linus Torvalds	04a1320651	for-4.18-rc5-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAltPMI0ACgkQxWXV+ddt WDvmXw//fyV+2hoARngzjd+4o32YHfxdf+Xv4XCnMsZVKOHKkqX8qrmMNyX0sd4w 5NwUZpv/mZ4LHnm4M+EMGJWjXL/oXkLGrDzndninNC+u7GlFVieZ/aF5D96z6rOm p45wGETYvAbZI7XZ3dLebpIDqr+eXOhx3lpJTAKY5sfTIwzwJ+KC5vFYdt+Rz4cr cbjwHhRUsRfu1I0SSjUVFIC5frtegIzbDgjWNiLLO44ozbDAH3j1SufOgNLb5GFM n+eh0xIHDNLOrH3aVKO19zk9NigVBu96/FJnIz0+Jzs67hifksfZWVDV5vKetUxA M46aqtTrSVb/NJ/RHkQkyWiJjZqioXXx+KsZjdU63fyv4iu0+o2HV0uY/Pifm+X/ fCS7xbQOhWJySQ+6mAjxXB9eo0RqO+RIGGIV9gJWZKt3S3DvAUmvd980jeHUtXRB VwMwmnvqvYaGWLWmaTRm1mjdmhCX2JdNN2RMmVN36tGfed0uopIFeax2rtWJ4153 V+8eZWaLkvvT3iGu+XLUhEfv3UCUy7N1LDk8toe7Xp+qIMvWus3GIsKAUCmJJ3b+ sGmbYSgn5v8TR65m5QO4/ZWmt4/bi/2Usd6Cq3vd0Op08kTWBTxjdelAVm+dlEYb sZLIMrxPg8ogEw8qX4GxROa8/1z9F/62RSmHfk4W7InY2AMJJAg= =Ga4m -----END PGP SIGNATURE----- Merge tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Three regression fixes. They're few-liners and fixing some corner cases missed in the origial patches" * tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block() btrfs: fix use-after-free of cmp workspace pages btrfs: restore uuid_mutex in btrfs_open_devices	2018-07-18 11:13:25 -07:00
Michael Callahan	dbae2c5513	block: Define and use STAT_READ and STAT_WRITE Add defines for STAT_READ and STAT_WRITE for indexing the partition stat entries. This clarifies some fs/ code which has hardcoded 1 for STAT_WRITE and will make it easier to extend the stats with additional fields. tj: Refreshed on top of v4.17. Signed-off-by: Michael Callahan <michaelcallahan@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-18 08:44:18 -06:00
Tejun Heo	3f289dcb4b	block: make bdev_ops->rw_page() take a REQ_OP instead of bool `c11f0c0b5b` ("block/mm: make bdev_ops->rw_page() take a bool for read/write") replaced @op with boolean @is_write, which limited the amount of information going into ->rw_page() and more importantly page_endio(), which removed the need to expose block internals to mm. Unfortunately, we want to track discards separately and @is_write isn't enough information. This patch updates bdev_ops->rw_page() to take REQ_OP instead but leaves page_endio() to take bool @is_write. This allows the block part of operations to have enough information while not leaking it to mm. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Mike Christie <mchristi@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-18 08:44:14 -06:00
Arnd Bergmann	5f7a01e222	jffs2: use unsigned 32-bit timstamps consistently Most users of jffs2 are 32-bit systems that traditionally only support timestamps using a 32-bit signed time_t, in the range from years 1902 to 2038. On 64-bit systems, jffs2 however interpreted the same timestamps as unsigned values, reading back negative times (before 1970) as times between 2038 and 2106. Now that Linux supports 64-bit inode timestamps even on 32-bit systems, let's use the second interpretation everywhere to allow jffs2 to be used on 32-bit systems beyond 2038 without a fundamental change to the inode format. This has a slight risk of regressions, when existing files with timestamps before 1970 are present in file system images and are now interpreted as future time stamps. I considered moving the wraparound point a bit, e.g. to 1960, in order to deal with timestamps that ended up on Dec 31, 1969 due to incorrect timezone handling. However, this would complicate the implementation unnecessarily, so I went with the simplest possible method of extending the timestamps. Writing files with timestamps before 1970 or after 2106 now results in those times being clamped in the file system. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>	2018-07-18 16:44:01 +02:00
Arnd Bergmann	c4592b9c37	jffs2: use 64-bit intermediate timestamps The VFS now uses timespec64 timestamps consistently, but jffs2 still converts them to 32-bit numbers on the storage medium. As the helper functions for the conversion (get_seconds() and timespec_to_timespec64()) are now deprecated, let's change them over to the more modern replacements. This keeps the traditional interpretation of those values, where the on-disk 32-bit numbers are taken to be negative numbers, i.e. dates before 1970, on 32-bit machines, but future numbers past 2038 on 64-bit machines. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>	2018-07-18 16:43:58 +02:00
Christoph Hellwig	9ba546c019	aio: don't expose __aio_sigset in uapi glibc uses a different defintion of sigset_t than the kernel does, and the current version would pull in both. To fix this just do not expose the type at all - this somewhat mirrors pselect() where we do not even have a type for the magic sigmask argument, but just use pointer arithmetics. Fixes: `7a074e96` ("aio: implement io_pgetevents") Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Adrian Reber <adrian@lisas.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-17 23:26:58 -04:00
Carlos Maiolino	5089eafffb	libxfs: Fix a couple of sparse complaintis No significant changes, just silence a couple of sparse errors. Using cpu_to_be32(NULLAGINO), the NULLAGINO constant will be encoded in BE as a constant, avoiding a BE -> CPU conversion every iteraction of the loop, if be32_to_cpu(agi->agi_unlinked[i]) was used instead. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-17 14:25:58 -07:00
Gustavo A. R. Silva	e4e542a683	xfs: use swap macro in xfs_dir2_leafn_rebalance Make use of the swap macro and remove unnecessary variable tmp. This makes the code easier to read and maintain. Also, slightly refactor some code. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-17 14:25:57 -07:00
Gustavo A. R. Silva	897992b7e3	xfs_bmap_util: use swap macro Make use of the swap macro and remove some unnecessary variables. This makes the code easier to read and maintain. Also, reduces the stack usage. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-17 14:25:57 -07:00
Gustavo A. R. Silva	1d5bebbafc	xfs_attr_leaf: use swap macro in xfs_attr3_leaf_rebalance Make use of the swap macro and remove some unnecessary variables. This makes the code easier to read and maintain. Also, reduces the stack usage. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-17 14:25:57 -07:00
Darrick J. Wong	fa248de98a	xfs: don't assume a left rmap when allocating a new rmap The original rmap code assumed that there would always be at least one rmap in the rmapbt (the AG sb/agf/agi) and so errored out if it didn't find one. This assumption isn't true for the rmapbt repair function (and it won't be true for realtime rmap either), so remove the check and just deal with the situation. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>	2018-07-17 14:25:57 -07:00
Mike Christie	cc57c07343	configfs: fix registered group removal This patch fixes a bug where configfs_register_group had added a group in a tree, and userspace has done a rmdir on a dir somewhere above that group and we hit a kernel crash. The problem is configfs_rmdir will detach everything under it and unlink groups on the default_groups list. It will not unlink groups added with configfs_register_group so when configfs_unregister_group is called to drop its references to the group/items we crash when we try to access the freed dentrys. The patch just adds a check for if a rmdir has been done above us and if so just does the unlink part of unregistration. Sorry if you are getting this multiple times. I thouhgt I sent this to some of you and lkml, but I do not see it. Signed-off-by: Mike Christie <mchristi@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-07-17 06:14:07 -07:00
Qu Wenruo	665d4953cd	btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block() In commit `ac0b4145d6` ("btrfs: scrub: Don't use inode pages for device replace") we removed the branch of copy_nocow_pages() to avoid corruption for compressed nodatasum extents. However above commit only solves the problem in scrub_extent(), if during scrub_pages() we failed to read some pages, sctx->no_io_error_seen will be non-zero and we go to fixup function scrub_handle_errored_block(). In scrub_handle_errored_block(), for sctx without csum (no matter if we're doing replace or scrub) we go to scrub_fixup_nodatasum() routine, which does the similar thing with copy_nocow_pages(), but does it without the extra check in copy_nocow_pages() routine. So for test cases like btrfs/100, where we emulate read errors during replace/scrub, we could corrupt compressed extent data again. This patch will fix it just by avoiding any "optimization" for nodatasum, just falls back to the normal fixup routine by try read from any good copy. This also solves WARN_ON() or dead lock caused by lame backref iteration in scrub_fixup_nodatasum() routine. The deadlock or WARN_ON() won't be triggered before commit `ac0b4145d6` ("btrfs: scrub: Don't use inode pages for device replace") since copy_nocow_pages() have better locking and extra check for data extent, and it's already doing the fixup work by try to read data from any good copy, so it won't go scrub_fixup_nodatasum() anyway. This patch disables the faulty code and will be removed completely in a followup patch. Fixes: `ac0b4145d6` ("btrfs: scrub: Don't use inode pages for device replace") Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-07-17 13:56:30 +02:00
Ingo Molnar	52b544bd38	Linux 4.18-rc5 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltLpVUeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGWisH/ikONMwV7OrSk36Y 5rxzTFUoBk0Qffct88gtSNuRVCxaVb1ofCndvFJE6A6HfJkWpbBzH6eq90aakmJi f7uFcu4YmsQpeQaf9lpftWmY2vDf2fIadVTV0RnSMXks57wMax1cpBe7LJGpz13e f+g5XRVs1MdlZVtr6tG2SU3Y5AqVVVsYe/0DBPonEqeh9/JJbPFCuNkFOxxzAqPu VTnjyoOqG8qtZzjklNtR5rZn0Gv592tWX36eiWTQdThNmVFkGEAJwsHCQlY4OQYK 61QN4UhOHiu8e1ZuGDNEDhNVRnKtaaYUPFeWL1wLRW73ul4P3ZkpvpS8QTMwcFJI JjzNOkI= =ckcO -----END PGP SIGNATURE----- Merge tag 'v4.18-rc5' into locking/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-07-17 09:27:43 +02:00
Eric Biggers	fe10e398e8	reiserfs: fix buffer overflow with long warning messages ReiserFS prepares log messages into a 1024-byte buffer with no bounds checks. Long messages, such as the "unknown mount option" warning when userspace passes a crafted mount options string, overflow this buffer. This causes KASAN to report a global-out-of-bounds write. Fix it by truncating messages to the buffer size. Link: http://lkml.kernel.org/r/20180707203621.30922-1-ebiggers3@gmail.com Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: syzbot+b890b3335a4d8c608963@syzkaller.appspotmail.com Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-07-14 11:11:10 -07:00
Oscar Salvador	24962af7e1	fs, elf: make sure to page align bss in load_elf_library The current code does not make sure to page align bss before calling vm_brk(), and this can lead to a VM_BUG_ON() in __mm_populate() due to the requested lenght not being correctly aligned. Let us make sure to align it properly. Kees: only applicable to CONFIG_USELIB kernels: 32-bit and configured for libc5. Link: http://lkml.kernel.org/r/20180705145539.9627-1-osalvador@techadventures.net Signed-off-by: Oscar Salvador <osalvador@suse.de> Reported-by: syzbot+5dcb560fe12aa5091c06@syzkaller.appspotmail.com Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Acked-by: Kees Cook <keescook@chromium.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-07-14 11:11:10 -07:00
Tomas Bortoli	02f51d4593	autofs: fix slab out of bounds read in getname_kernel() The autofs subsystem does not check that the "path" parameter is present for all cases where it is required when it is passed in via the "param" struct. In particular it isn't checked for the AUTOFS_DEV_IOCTL_OPENMOUNT_CMD ioctl command. To solve it, modify validate_dev_ioctl(function to check that a path has been provided for ioctl commands that require it. Link: http://lkml.kernel.org/r/153060031527.26631.18306637892746301555.stgit@pluto.themaw.net Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com> Signed-off-by: Ian Kent <raven@themaw.net> Reported-by: syzbot+60c837b428dc84e83a93@syzkaller.appspotmail.com Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-07-14 11:11:09 -07:00
Vlastimil Babka	e70cc2bd57	fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps* Thomas reports: "While looking around in /proc on my v4.14.52 system I noticed that all processes got a lot of "Locked" memory in /proc/*/smaps. A lot more memory than a regular user can usually lock with mlock(). Commit `493b0e9d94` (in v4.14-rc1) seems to have changed the behavior of "Locked". Before that commit the code was like this. Notice the VM_LOCKED check. (vma->vm_flags & VM_LOCKED) ? (unsigned long)(mss.pss >> (10 + PSS_SHIFT)) : 0); After that commit Locked is now the same as Pss: (unsigned long)(mss->pss >> (10 + PSS_SHIFT))); This looks like a mistake." Indeed, the commit has added mss->pss_locked with the correct value that depends on VM_LOCKED, but forgot to actually use it. Fix it. Link: http://lkml.kernel.org/r/ebf6c7fb-fec3-6a26-544f-710ed193c154@suse.cz Fixes: `493b0e9d94` ("mm: add /proc/pid/smaps_rollup") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Colascione <dancol@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-07-14 11:11:09 -07:00
Naohiro Aota	97b191702b	btrfs: fix use-after-free of cmp workspace pages btrfs_cmp_data_free() puts cmp's src_pages and dst_pages, but leaves their page address intact. Now, if you hit "goto again" in btrfs_extent_same_range() and hit some error in btrfs_cmp_data_prepare(), you'll try to unlock/put already put pages. This is simple fix to reset the address to avoid use-after-free. Fixes: `67b07bd4be` ("Btrfs: reuse cmp workspace in EXTENT_SAME ioctl") Signed-off-by: Naohiro Aota <naota@elisp.net> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-07-13 17:31:35 +02:00
David Sterba	20c5bbc640	btrfs: restore uuid_mutex in btrfs_open_devices Commit `542c5908ab` ("btrfs: replace uuid_mutex by device_list_mutex in btrfs_open_devices") switched to device_list_mutex as we need that for the device list traversal, but we also need uuid_mutex to protect access to fs_devices::opened to be consistent with other users of that. Fixes: `542c5908ab` ("btrfs: replace uuid_mutex by device_list_mutex in btrfs_open_devices") Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-07-13 14:55:46 +02:00
Theodore Ts'o	8d5a803c6a	ext4: check for allocation block validity with block group locked With commit 044e6e3d74a3: "ext4: don't update checksum of new initialized bitmaps" the buffer valid bit will get set without actually setting up the checksum for the allocation bitmap, since the checksum will get calculated once we actually allocate an inode or block. If we are doing this, then we need to (re-)check the verified bit after we take the block group lock. Otherwise, we could race with another process reading and verifying the bitmap, which would then complain about the checksum being invalid. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1780137 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2018-07-12 19:08:05 -04:00
Thomas Gleixner	c6bb11147e	Merge branch 'fortglx/4.19/time' of https://git.linaro.org/people/john.stultz/linux into timers/core Pull timekeeping updates from John Stultz: - Make the timekeeping update more precise when NTP frequency is set directly by updating the multiplier. - Adjust selftests	2018-07-12 22:19:58 +02:00
Al Viro	5f336e722c	few more cleanups of link_path_walk() callers Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:31 -04:00
Al Viro	9b5858e99a	allow link_path_walk() to take ERR_PTR() There is a check for IS_ERR(name) immediately upstream of each call of link_path_walk(name, nd), with positives treated as if link_path_walk() failed with PTR_ERR(name). Taking that check into link_path_walk() itself simplifies things nicely. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:30 -04:00
Al Viro	edc2b1da77	make path_init() unconditionally paired with terminate_walk() including the failure exits Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:30 -04:00
Al Viro	ee1904ba44	make alloc_file() static Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:29 -04:00
Al Viro	183266f26f	new helper: alloc_file_clone() alloc_file_clone(old_file, mode, ops): create a new struct file with ->f_path equal to that of old_file. pipe converted. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:28 -04:00
Al Viro	152b6372c9	create_pipe_files(): switch the first allocation to alloc_file_pseudo() Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:27 -04:00
Al Viro	52c91f8b3b	anon_inode_getfile(): switch to alloc_file_pseudo() Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:27 -04:00
Al Viro	e68375c850	hugetlb_file_setup(): switch to alloc_file_pseudo() Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:26 -04:00
Al Viro	d93aa9d82a	new wrapper: alloc_file_pseudo() takes inode, vfsmount, name, O_... flags and file_operations and either returns a new struct file (in which case inode reference we held is consumed) or returns ERR_PTR(), in which case no refcounts are altered. converted aio_private_file() and sock_alloc_file() to it Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:23 -04:00
Al Viro	00a07c1591	switch atomic_open() and lookup_open() to returning 0 in all success cases caller can tell "opened" from "open it yourself" by looking at ->f_mode. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:22 -04:00
Al Viro	64e1ac4d46	->atomic_open(): return 0 in all success cases FMODE_OPENED can be used to distingusish "successful open" from the "called finish_no_open(), do it yourself" cases. Since finish_no_open() has been adjusted, no changes in the instances were actually needed. The caller has been adjusted. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:21 -04:00
Al Viro	3ec2eef116	get rid of 'opened' in path_openat() and the helpers downstream unused now Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:21 -04:00
Al Viro	44907d7900	get rid of 'opened' argument of ->atomic_open() - part 3 now it can be done... Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:20 -04:00
Al Viro	b452a458ca	getting rid of 'opened' argument of ->atomic_open() - part 2 __gfs2_lookup(), gfs2_create_inode(), nfs_finish_open() and fuse_create_open() don't need 'opened' anymore. Get rid of that argument in those. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:20 -04:00
Al Viro	be12af3ef5	getting rid of 'opened' argument of ->atomic_open() - part 1 'opened' argument of finish_open() is unused. Kill it. Signed-off-by Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:19 -04:00
Al Viro	6035a27b25	IMA: don't propagate opened through the entire thing just check ->f_mode in ima_appraise_measurement() Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:19 -04:00
Al Viro	73a09dd943	introduce FMODE_CREATED and switch to it Parallel to FILE_CREATED, goes into ->f_mode instead of *opened. NFS is a bit of a wart here - it doesn't have file at the point where FILE_CREATED used to be set, so we need to propagate it there (for now). IMA is another one (here and everywhere)... Note that this needs do_dentry_open() to leave old bits in ->f_mode alone - we want it to preserve FMODE_CREATED if it had been already set (no other bit can be there). Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:18 -04:00
Al Viro	aad888f828	switch all remaining checks for FILE_OPENED to FMODE_OPENED ... and don't bother with setting FILE_OPENED at all. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:18 -04:00
Al Viro	69527c554f	now we can fold open_check_o_direct() into do_dentry_open() These checks are better off in do_dentry_open(); the reason we couldn't put them there used to be that callers couldn't tell what kind of cleanup would do_dentry_open() failure call for. Now that we have FMODE_OPENED, cleanup is the same in all cases - it's simply fput(). So let's fold that into do_dentry_open(), as Christoph's patch tried to. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:17 -04:00
Al Viro	7c1c01ec20	lift fput() on late failures into path_openat() Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:17 -04:00
Al Viro	4d27f3266f	fold put_filp() into fput() Just check FMODE_OPENED in __fput() and be done with that... Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:16 -04:00
Al Viro	f5d11409e6	introduce FMODE_OPENED basically, "is that instance set up enough for regular fput(), or do we want put_filp() for that one". NOTE: the only alloc_file() caller that could be followed by put_filp() is in arch/ia64/kernel/perfmon.c, which is (Kconfig-level) broken. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:16 -04:00
Al Viro	e3f20ae210	security_file_open(): lose cred argument Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:15 -04:00
Al Viro	ae2bb293a3	get rid of cred argument of vfs_open() and do_dentry_open() always equal to ->f_cred Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:14 -04:00
Al Viro	ea73ea7279	pass ->f_flags value to alloc_empty_file() ... and have it set the f_flags-derived part of ->f_mode. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:13 -04:00
Al Viro	6de37b6dc0	pass creds to get_empty_filp(), make sure dentry_open() passes the right creds ... and rename get_empty_filp() to alloc_empty_file(). dentry_open() gets creds as argument, but the only thing that sees those is security_file_open() - file->f_cred still ends up with current_cred(). For almost all callers it's the same thing, but there are several broken cases. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:04:13 -04:00
Al Viro	c9c554f214	alloc_file(): switch to passing O_... flags instead of FMODE_... mode ... so that it could set both ->f_flags and ->f_mode, without callers having to set ->f_flags manually. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-07-12 10:02:57 -04:00
Mark Rutland	9b54bf9d6a	kernel: add kcompat_sys_{f,}statfs64() Using this helper allows us to avoid the in-kernel calls to the compat_sys_{f,}statfs64() sycalls, as are necessary for parameter mangling in arm64's compat handling. Following the example of ksys_* functions, kcompat_sys_* functions are intended to be a drop-in replacement for their compat_sys_* counterparts, with the same calling convention. This is necessary to enable conversion of arm64's syscall handling to use pt_regs wrappers. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-07-12 14:49:48 +01:00
Carlos Maiolino	efe8032773	xfs: Initialize variables in xfs_alloc_get_rec before using them Make sure we initialize bno and len, before jumping to out_bad_rec label, and risk calling xfs_warn() with uninitialized variables. Coverity: 100898 Coverity: 1437081 Coverity: 1437129 Coverity: 1437191 Coverity: 1437201 Coverity: 1437212 Coverity: 1437341 Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:36 -07:00
Eric Sandeen	a4722a643f	xfs: remove unused iolock arg from xfs_break_dax_layouts Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:36 -07:00
Brian Foster	bb00b6f1e2	xfs: kill __xfs_buf_submit_common() Now that there is only one caller, fold the common submission helper into __xfs_buf_submit(). Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:35 -07:00
Brian Foster	6af88cda00	xfs: combine [a]sync buffer submission apis The buffer I/O submission path consists of separate function calls per type. The buffer I/O type is already controlled via buffer state (XBF_ASYNC), however, so there is no real need for separate submission functions. Combine the buffer submission functions into a single function that processes the buffer appropriately based on XBF_ASYNC. Retain an internal helper with a conditional wait parameter to continue to support batched !XBF_ASYNC submission/completion required by delwri queues. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:35 -07:00
Brian Foster	e339dd8d8b	xfs: use sync buffer I/O for sync delwri queue submission If a delwri queue occurs of a buffer that sits on a delwri queue wait list, the queue sets _XBF_DELWRI_Q without changing the state of ->b_list. This occurs, for example, if another thread beats the current delwri waiter thread to the buffer lock after I/O completion. Once the waiter acquires the lock, it removes the buffer from the wait list and leaves a buffer with _XBF_DELWRI_Q set but not populated on a list. This results in a lost buffer submission and in turn can result in assert failures due to _XBF_DELWRI_Q being set on buffer reclaim or filesystem lockups if the buffer happens to cover an item in the AIL. This problem has been reproduced by repeated iterations of xfs/305 on high CPU count (28xcpu) systems with limited memory (~1GB). Dirty dquot reclaim races with an xfsaild push of a separate dquot backed by the same buffer such that the buffer sits on the reclaim wait list at the time xfsaild attempts to queue it. Since the latter dquot has been flush locked but the underlying buffer not submitted for I/O, the dquot pins the AIL and causes the filesystem to livelock. This race is essentially made possible by the buffer lock cycle involved with waiting on a synchronous delwri queue submission. Close the race by using synchronous buffer I/O for respective delwri queue submission. This means the buffer remains locked across the I/O and so is inaccessible from other contexts while in the intermediate wait list state. The sync buffer I/O wait mechanism is factored into a helper such that sync delwri buffer submission and serialization are batched operations. Designed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:34 -07:00
Brian Foster	eaebb515f1	xfs: refactor buffer submission into a common helper Sync and async buffer submission both do generally similar things with a couple odd exceptions. Refactor the core buffer submission code into a common helper to isolate buffer submission from completion handling of synchronous buffer I/O. This patch does not change behavior. It is a step towards support for using synchronous buffer I/O via synchronous delwri queue submission. Designed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:34 -07:00
Brian Foster	5fdd97944e	xfs: remove xfs_defer_init() firstblock param All but one caller of xfs_defer_init() passes in the ->t_firstblock of the associated transaction. The one outlier is xlog_recover_process_intents(), which simply passes a dummy value because a valid pointer is required. This firstblock variable can simply be removed. At this point we could remove the xfs_defer_init() firstblock parameter and initialize ->t_firstblock directly. Even that is not necessary, however, because ->t_firstblock is automatically reinitialized in the new transaction on a transaction roll. Since xfs_defer_init() should never occur more than once on a particular transaction (since the corresponding finish will roll it), replace the reinit from xfs_defer_init() with an assert that verifies the transaction has a NULLFSBLOCK firstblock. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:33 -07:00
Brian Foster	9c3bf5da80	xfs: use ->t_firstblock in inode inactivate Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:32 -07:00
Brian Foster	f537538921	xfs: use ->t_firstblock in extent swap Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:32 -07:00
Brian Foster	381d592848	xfs: use ->t_firstblock in reflink cow block cancel Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:31 -07:00
Brian Foster	fb91f4b5d6	xfs: replace no-op firstblock init with ->t_firstblock xfs_refcount_recover_cow_leftovers() has no need for a firstblock variable and so passes an unrelated xfs_fsblock_t to xfs_defer_init() to avoid declaring one. Replace this no-op initialization with ->t_firstblock. This will be optimized away by the removal of the xfs_defer_init() firstblock param. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:31 -07:00
Brian Foster	058529c5f5	xfs: use ->t_firstblock in dq alloc Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:30 -07:00
Brian Foster	64396ff2c2	xfs: remove xfs_alloc_arg firstblock field The xfs_alloc_arg.firstblock field is used to control the starting agno for an allocation. The structure already carries a pointer to the transaction, which carries the current firstblock value. Remove the field and access ->t_firstblock directly in the allocation code. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:30 -07:00
Brian Foster	cf612de732	xfs: remove xfs_btree_cur private firstblock field The bmbt cursor private structure has a firstblock field that is used to maintain locking order on bmbt allocations. The field holds an actual firstblock value (as opposed to a pointer), so it is initialized on cursor creation, updated on allocation and then the value is transferred back to the source before the cursor is destroyed. This value is always transferred from and back to the ->t_firstblock field. Since xfs_btree_cur already carries a reference to the transaction, we can remove this field from xfs_btree_cur and the associated copying. The bmbt allocations will update the value in the transaction directly. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:29 -07:00
Brian Foster	280253d213	xfs: remove bmap format helpers firstblock params The bmap format helpers receive firstblock via ->t_firstblock. Drop the param and access it directly. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:29 -07:00
Brian Foster	92f9da30f5	xfs: remove bmap extent add helper firstblock params The add extent helpers all receive firstblock via ->t_firstblock. Drop the parameter and access it directly. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:28 -07:00
Brian Foster	94c07b4dba	xfs: remove xfs_bmalloca firstblock field The xfs_bmalloca.firstblock field carries the firstblock value from the transaction into the bmap infrastructure. It's initialized in one place from ->t_firstblock, so drop the field and access ->t_firstblock directly throughout the bmap code. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:28 -07:00
Brian Foster	4b77a088d7	xfs: use ->t_firstblock in bmap extent split Also remove the unnecessary xfs_bmap_split_extent_at() parameter. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:27 -07:00
Brian Foster	333f950c89	xfs: remove bmap insert/collapse firstblock param The only callers pass ->t_firstblock. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:27 -07:00
Brian Foster	2af5284253	xfs: remove xfs_bunmapi() firstblock param All callers pass ->t_firstblock from the current transaction. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:25 -07:00
Brian Foster	a7beabeae2	xfs: remove xfs_bmapi_write() firstblock param All callers pass ->t_firstblock from the current transaction. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:25 -07:00
Brian Foster	d0a9d79572	xfs: use ->t_firstblock in insert/collapse range Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:24 -07:00
Brian Foster	580c4ff948	xfs: use ->t_firstblock in xfs_bmapi_remap() Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:24 -07:00
Brian Foster	372837978d	xfs: use ->t_firstblock for all xfs_bunmapi() callers Convert all xfs_bunmapi() callers to ->t_firstblock. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:23 -07:00
Brian Foster	650919f131	xfs: use ->t_firstblock for all xfs_bmapi_write() callers Convert all xfs_bmapi_write() users to ->t_firstblock. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:23 -07:00
Brian Foster	766139032f	xfs: use ->t_firstblock in xattr ops Similar to the dirops code, the xattr code uses an on-stack firstblock variable for the various operations. This code rolls the underlying transaction in various places, however, which means we cannot simply replace the local firstblock vars with ->t_firstblock. Doing so (without further changes) would invalidate the memory pointed to by xfs_da_args.firstblock as soon as the first transaction rolls. To avoid this problem, remove xfs_da_args.firstblock and replace all such accesses with ->t_firstblock at the same time. This ensures that accesses to the current firstblock always occur through the current transaction rather than a potentially invalid xfs_da_args pointer. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:22 -07:00
Brian Foster	825d75cd8c	xfs: use ->t_firstblock in attrfork add Note that this codepath is a user of struct xfs_da_args. Switch it over to ->t_firstblock in preparation to remove xfs_da_args.firstblock. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:21 -07:00
Brian Foster	381eee69f8	xfs: remove firstblock param from xfs dir ops All callers of the xfs_dir_*() functions pass ->t_firstblock as the firstblock parameter. Drop the parameter and access ->t_firstblock directly. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:21 -07:00
Brian Foster	f16dea54b7	xfs: use ->t_firstblock in dir ops Callers of the xfs_dir_() functions currently pass an on-stack firstblock variable. While the dirops infrastructure carries a pointer to this variable, it never rolls the transaction and so it is safe to use ->t_firstblock instead. Fix up the various xfs_dir_() callers to use ->t_firstblock. Also remove the unnecessary parameter for xfs_cross_rename(). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:20 -07:00
Brian Foster	bba59c5e4b	xfs: add firstblock field to xfs_trans A firstblock var is typically allocated and initialized along with xfs_defer_ops structures and passed around independent from the associated transaction. To facilitate combining the two, add an optional ->t_firstblock field to xfs_trans that can be used in place of an on-stack variable. The firstblock value follows the lifetime of the transaction, so initialize it on allocation and when a transaction rolls. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:20 -07:00
Brian Foster	3ae2d89174	xfs: allow null firstblock in xfs_bmapi_write() when tp is null xfs_bmapi_write() always expects a valid firstblock pointer. It immediately dereferences the pointer to help determine how to initialize the bma.minleft field. The remaining accesses are related to modifying btree format forks, which is only relevant for !COW fork callers. The reflink code passes a NULL transaction to xfs_bmapi_write() in a couple places that do COW fork unwritten conversion. The purpose of the firstblock field is to track the first block allocation in the current transaction, so technically firstblock should not be required for these callers either. Tweak xfs_bmapi_write() to initialize the bma correctly without accessing the firstblock pointer if no transaction is provided in the first place. Update the reflink callers to pass NULL instead of otherwise unused firstblock references. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:19 -07:00
Brian Foster	bcd2c9f335	xfs: refactor dfops init to attach to transaction Most callers of xfs_defer_init() immediately attach the dfops structure to a transaction. Add a transaction parameter to eliminate much of this boilerplate code. This also helps self-document the fact that many codepaths now expect a dfops pointer implicitly via xfs_trans->t_dfops. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:19 -07:00
Brian Foster	d5669ed581	xfs: use ->t_dfops in reflink cow recover path Use ->t_dfops of the leftover COW reservation cleanup transaction. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:18 -07:00
Brian Foster	27356a063a	xfs: use ->t_dfops in cancel cow blocks operation Use ->t_dfops of the transaction from the caller. Reset it before we return to avoid leaks of local stack memory. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:18 -07:00
Brian Foster	7a7943c7e0	xfs: use ->t_dfops for rmap extent swap operations xfs_swap_extent_rmap() uses a local dfops instance with a transaction from the caller. Since there is only one caller, pull the dfops structure into the caller and attach it to the transaction. This avoids the need to clear ->t_dfops to prevent invalid stack memory access. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:17 -07:00
Brian Foster	ed7ef8e55c	xfs: remove unused btree cursor bc_private.a.dfops field The xfs_btree_cur.bc_private.a.dfops field is only ever initialized by the refcountbt cursor init function. The only caller of that function with a non-NULL dfops is from deferred completion context, which already has attached to ->t_dfops. In addition to that, the only actual reference of a.dfops is the cursor duplication function, which means the field is effectively unused. Remove the dfops field from the bc_private.a union. Any future users can acquire the dfops from the transaction. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:17 -07:00
Brian Foster	42b394a925	xfs: remove xfs_btree_cur bmbt dfops field All assignments of xfs_btree_cur.bc_private.b.dfops originate from ->t_dfops. Replace accesses of the former with the latter and remove the unnecessary field. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:16 -07:00
Brian Foster	81ba8f3e94	xfs: remove dfops param from internal bmap extent helpers All callers of the various bmap extent helpers now use ->t_dfops. Remove the unnecessary dfops params and access ->t_dfops directly. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:16 -07:00
Brian Foster	f4a9cf97fa	xfs: use ->t_dfops for collapse/insert range operations Use ->t_dfops for the collapse and insert range transactions. These are the only callers of the respective bmap helpers, so replace the unnecessary dfops parameters with direct accesses to ->t_dfops. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:15 -07:00
Brian Foster	3e3673e302	xfs: remove struct xfs_bmalloca dfops field Now that bma.dfops is only assigned from ->t_dfops, replace all accesses to the former with the latter and remove the unnecessary field. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:14 -07:00
Brian Foster	ff3edf255d	xfs: remove xfs_bmapi_remap() dfops param All xfs_bmapi_remap() callers already use ->t_dfops. Note that deferred completion context unconditionally sets ->t_dfops if it hasn't already been set by the caller. Remove the unnecessary parameter and access ->t_dfops directly. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:14 -07:00
Brian Foster	ccd9d91148	xfs: remove xfs_bunmapi() dfops param Now that all xfs_bunmapi() callers use ->t_dfops, remove the unnecessary parameter and access ->t_dfops directly. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:13 -07:00
Brian Foster	4bcfa613a0	xfs: use ->t_dfops for all xfs_bunmapi() callers Use ->t_dfops for all remaining xfs_bunmapi() callers. This prepares the latter to no longer require a dfops parameter. Note that xfs_itruncate_extents_flags() associates a local dfops with a transaction provided from the caller. Since there are multiple callers, set and reset ->t_dfops before the function returns to avoid exposure of stack memory to the caller. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:13 -07:00
Brian Foster	6e702a5dcb	xfs: remove xfs_bmapi_write() dfops param Now that all callers use ->t_dfops, the xfs_bmapi_write() dfops parameter is no longer necessary. Remove it and access ->t_dfops directly. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:12 -07:00
Brian Foster	175d1a013e	xfs: use ->t_dfops for all xfs_bmapi_write() callers Attach ->t_dfops for all remaining callers of xfs_bmapi_write(). This prepares the latter to no longer require a separate dfops parameter. Note that xfs_symlink() already uses ->t_dfops. Fix up the local references for consistency. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:12 -07:00
Brian Foster	2ba1372125	xfs: use ->t_dfops in dqalloc transaction xfs_dquot_disk_alloc() receives a transaction from the caller and passes a local dfops along to xfs_bmapi_write(). If we attach this dfops to the transaction, we have to make sure to clear it before returning to avoid invalid access of stack memory. Since xfs_qm_dqread_alloc() is the only caller, pull dfops into the caller and attach it to the transaction to eliminate this pattern entirely. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:11 -07:00
Brian Foster	32a9b7c65c	xfs: replace xfs_da_args->dfops accesses with ->t_dfops and remove Now that xfs_da_args->dfops is always assigned from a ->t_dfops pointer (or one that is immediately attached), replace all downstream accesses of the former with the latter and remove the field from struct xfs_da_args. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:11 -07:00
Brian Foster	d76e6ce8ed	xfs: use ->t_dfops in extent split tx and remove param Attach the local dfops to ->t_dfops of the extent split transaction. Since this is the only caller of xfs_bmap_split_extent_at(), remove the dfops parameter as well. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-07-11 22:26:10 -07:00

... 5 6 7 8 9 ...

55278 Commits