platform_kernel-5.15

Commit Graph

Author	SHA1	Message	Date
Greg Kroah-Hartman	61abfd4773	Merge 5.15.31 into android13-5.15 Changes in 5.15.31 crypto: qcom-rng - ensure buffer for generate is completely filled ocfs2: fix crash when initialize filecheck kobj fails mm: swap: get rid of livelock in swapin readahead block: release rq qos structures for queue without disk drm/mgag200: Fix PLL setup for g200wb and g200ew efi: fix return value of __setup handlers alx: acquire mutex for alx_reinit in alx_change_mtu vsock: each transport cycles only on its own sockets esp6: fix check on ipv6_skip_exthdr's return value net: phy: marvell: Fix invalid comparison in the resume and suspend functions net/packet: fix slab-out-of-bounds access in packet_recvmsg() atm: eni: Add check for dma_map_single iavf: Fix double free in iavf_reset_task hv_netvsc: Add check for kvmalloc_array drm/imx: parallel-display: Remove bus flags check in imx_pd_bridge_atomic_check() drm/panel: simple: Fix Innolux G070Y2-L01 BPP settings net: handle ARPHRD_PIMREG in dev_is_mac_header_xmit() drm: Don't make DRM_PANEL_BRIDGE dependent on DRM_KMS_HELPERS net: dsa: Add missing of_node_put() in dsa_port_parse_of net: phy: mscc: Add MODULE_FIRMWARE macros bnx2x: fix built-in kernel driver load failure net: bcmgenet: skip invalid partial checksums net: mscc: ocelot: fix backwards compatibility with single-chain tc-flower offload iavf: Fix hang during reboot/shutdown arm64: fix clang warning about TRAMP_VALIAS usb: gadget: rndis: prevent integer overflow in rndis_set_response() usb: gadget: Fix use-after-free bug by not setting udc->dev.driver usb: usbtmc: Fix bug in pipe direction for control transfers scsi: mpt3sas: Page fault in reply q processing Input: aiptek - properly check endpoint type perf symbols: Fix symbol size calculation condition btrfs: skip reserved bytes warning on unmount after log cleanup failure Linux 5.15.31 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Iea69c3aeae614eb6b871993dc29fc1010c064f24	2022-03-24 13:16:37 +01:00
Jaegeuk Kim	f4210b9427	Merge remote-tracking branch 'aosp/upstream-f2fs-stable-linux-5.15.y' into android13-5.15 * aosp/upstream-f2fs-stable-linux-5.15.y: fscrypt: update documentation for direct I/O support f2fs: support direct I/O with fscrypt using blk-crypto ext4: support direct I/O with fscrypt using blk-crypto iomap: support direct I/O with fscrypt using blk-crypto fscrypt: add functions for direct I/O support f2fs: fix to do sanity check on .cp_pack_total_block_count f2fs: make gc_urgent and gc_segment_mode sysfs node readable f2fs: use aggressive GC policy during f2fs_disable_checkpoint() f2fs: fix compressed file start atomic write may cause data corruption f2fs: initialize sbi->gc_mode explicitly f2fs: introduce gc_urgent_mid mode f2fs: compress: fix to print raw data size in error path of lz4 decompression f2fs: remove redundant parameter judgment f2fs: use spin_lock to avoid hang f2fs: don't get FREEZE lock in f2fs_evict_inode in frozen fs f2fs: remove unnecessary read for F2FS_FITS_IN_INODE f2fs: introduce F2FS_UNFAIR_RWSEM to support unfair rwsem f2fs: avoid an infinite loop in f2fs_sync_dirty_inodes f2fs: fix to do sanity check on curseg->alloc_type f2fs: fix to avoid potential deadlock f2fs: quota: fix loop condition at f2fs_quota_sync() f2fs: Restore rwsem lockdep support f2fs: fix missing free nid in f2fs_handle_failed_inode f2fs: support idmapped mounts f2fs: add a way to limit roll forward recovery time f2fs: introduce F2FS_IPU_HONOR_OPU_WRITE ipu policy f2fs: adjust readahead block number during recovery f2fs: fix to unlock page correctly in error path of is_alive() f2fs: expose discard related parameters in sysfs f2fs: move discard parameters into discard_cmd_control fs: handle circular mappings correctly f2fs: fix to enable ATGC correctly via gc_idle sysfs interface f2fs: move f2fs to use reader-unfair rwsems Bug: 216636351 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@google.com> Change-Id: I53cc37765ba69df2a9b7b9c070e4938822354f05	2022-03-24 00:50:44 +00:00
Michel Lespinasse	6febc3942c	BACKPORT: FROMLIST: f2fs: implement speculative fault handling We just need to make sure f2fs_filemap_fault() doesn't block in the speculative case as it is called with an rcu read lock held. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20210407014502.24091-33-michel@lespinasse.org/ Conflicts: fs/f2fs/file.c 1. The change in f2fs_filemap_fault is not needed since i_mmap_sem is not used anymore. Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: If7a46e131ee38ca02a4c5b8a76ab4eb742acbe95	2022-03-23 11:32:19 -07:00
Michel Lespinasse	a21ca34904	BACKPORT: FROMLIST: ext4: implement speculative fault handling We just need to make sure ext4_filemap_fault() doesn't block in the speculative case as it is called with an rcu read lock held. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20210407014502.24091-32-michel@lespinasse.org/ Conflicts: fs/ext4/inode.c 1. The change in fs/ext4/inode.c is not needed since i_mmap_sem is not used anymore. Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Idafc81074cf7f4b31985bdb24e0cc1597c91b875	2022-03-23 11:32:19 -07:00
Michel Lespinasse	a2138fee6c	FROMLIST: fs: list file types that support speculative faults. Add a speculative field to the vm_operations_struct, which indicates if the associated file type supports speculative faults. Initially this is set for files that implement fault() with filemap_fault(). Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20210407014502.24091-30-michel@lespinasse.org/ Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ic92efdf13283c45e7da7bf703f4f85f8b392ba69	2022-03-23 11:32:19 -07:00
Michel Lespinasse	7045d2d838	FROMLIST: mm: implement speculative handling in do_fault_around() Call the vm_ops->map_pages method within an rcu read locked section. In the speculative case, verify the mmap sequence lock at the start of the section. A match guarantees that the original vma is still valid at that time, and that the associated vma->vm_file stays valid while the vm_ops->map_pages() method is running. Do not test vmf->pmd in the speculative case - we only speculate when a page table already exists, and this saves us from having to handle synchronization around the vmf->pmd read. Change xfs_filemap_map_pages() to account for the fact that it can not block anymore, as it is now running within an rcu read lock. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20210407014502.24091-28-michel@lespinasse.org/ Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Id771c1e6fa9b883595a48d4df63f448a05916eda	2022-03-23 11:32:19 -07:00
Michel Lespinasse	48e35d053f	FROMLIST: mm: rcu safe vma->vm_file freeing Defer freeing of vma->vm_file when freeing vmas. This is to allow speculative page faults in the mapped file case. Signed-off-by: Michel Lespinasse <michel@lespinasse.org> Link: https://lore.kernel.org/all/20210407014502.24091-24-michel@lespinasse.org/ Bug: 161210518 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ic766bc2086db82eae9f3aadf0f23dd743be1c464	2022-03-23 11:32:18 -07:00
Filipe Manana	4c5d94990f	btrfs: skip reserved bytes warning on unmount after log cleanup failure commit 40cdc509877bacb438213b83c7541c5e24a1d9ec upstream. After the recent changes made by commit c2e39305299f01 ("btrfs: clear extent buffer uptodate when we fail to write it") and its followup fix, commit 651740a5024117 ("btrfs: check WRITE_ERR when trying to read an extent buffer"), we can now end up not cleaning up space reservations of log tree extent buffers after a transaction abort happens, as well as not cleaning up still dirty extent buffers. This happens because if writeback for a log tree extent buffer failed, then we have cleared the bit EXTENT_BUFFER_UPTODATE from the extent buffer and we have also set the bit EXTENT_BUFFER_WRITE_ERR on it. Later on, when trying to free the log tree with free_log_tree(), which iterates over the tree, we can end up getting an -EIO error when trying to read a node or a leaf, since read_extent_buffer_pages() returns -EIO if an extent buffer does not have EXTENT_BUFFER_UPTODATE set and has the EXTENT_BUFFER_WRITE_ERR bit set. Getting that -EIO means that we return immediately as we can not iterate over the entire tree. In that case we never update the reserved space for an extent buffer in the respective block group and space_info object. When this happens we get the following traces when unmounting the fs: [174957.284509] BTRFS: error (device dm-0) in cleanup_transaction:1913: errno=-5 IO failure [174957.286497] BTRFS: error (device dm-0) in free_log_tree:3420: errno=-5 IO failure [174957.399379] ------------[ cut here ]------------ [174957.402497] WARNING: CPU: 2 PID: 3206883 at fs/btrfs/block-group.c:127 btrfs_put_block_group+0x77/0xb0 [btrfs] [174957.407523] Modules linked in: btrfs overlay dm_zero (...) 
[174957.424917] CPU: 2 PID: 3206883 Comm: umount Tainted: G W 5.16.0-rc5-btrfs-next-109 #1 [174957.426689] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [174957.428716] RIP: 0010:btrfs_put_block_group+0x77/0xb0 [btrfs] [174957.429717] Code: 21 48 8b bd (...) [174957.432867] RSP: 0018:ffffb70d41cffdd0 EFLAGS: 00010206 [174957.433632] RAX: 0000000000000001 RBX: ffff8b09c3848000 RCX: ffff8b0758edd1c8 [174957.434689] RDX: 0000000000000001 RSI: ffffffffc0b467e7 RDI: ffff8b0758edd000 [174957.436068] RBP: ffff8b0758edd000 R08: 0000000000000000 R09: 0000000000000000 [174957.437114] R10: 0000000000000246 R11: 0000000000000000 R12: ffff8b09c3848148 [174957.438140] R13: ffff8b09c3848198 R14: ffff8b0758edd188 R15: dead000000000100 [174957.439317] FS: 00007f328fb82800(0000) GS:ffff8b0a2d200000(0000) knlGS:0000000000000000 [174957.440402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [174957.441164] CR2: 00007fff13563e98 CR3: 0000000404f4e005 CR4: 0000000000370ee0 [174957.442117] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [174957.443076] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [174957.443948] Call Trace: [174957.444264] <TASK> [174957.444538] btrfs_free_block_groups+0x255/0x3c0 [btrfs] [174957.445238] close_ctree+0x301/0x357 [btrfs] [174957.445803] ? call_rcu+0x16c/0x290 [174957.446250] generic_shutdown_super+0x74/0x120 [174957.446832] kill_anon_super+0x14/0x30 [174957.447305] btrfs_kill_super+0x12/0x20 [btrfs] [174957.447890] deactivate_locked_super+0x31/0xa0 [174957.448440] cleanup_mnt+0x147/0x1c0 [174957.448888] task_work_run+0x5c/0xa0 [174957.449336] exit_to_user_mode_prepare+0x1e5/0x1f0 [174957.449934] syscall_exit_to_user_mode+0x16/0x40 [174957.450512] do_syscall_64+0x48/0xc0 [174957.450980] entry_SYSCALL_64_after_hwframe+0x44/0xae [174957.451605] RIP: 0033:0x7f328fdc4a97 [174957.452059] Code: 03 0c 00 f7 (...) 
[174957.454320] RSP: 002b:00007fff13564ec8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [174957.455262] RAX: 0000000000000000 RBX: 00007f328feea264 RCX: 00007f328fdc4a97 [174957.456131] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000560b8ae51dd0 [174957.457118] RBP: 0000560b8ae51ba0 R08: 0000000000000000 R09: 00007fff13563c40 [174957.458005] R10: 00007f328fe49fc0 R11: 0000000000000246 R12: 0000000000000000 [174957.459113] R13: 0000560b8ae51dd0 R14: 0000560b8ae51cb0 R15: 0000000000000000 [174957.460193] </TASK> [174957.460534] irq event stamp: 0 [174957.461003] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [174957.461947] hardirqs last disabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040 [174957.463147] softirqs last enabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040 [174957.465116] softirqs last disabled at (0): [<0000000000000000>] 0x0 [174957.466323] ---[ end trace bc7ee0c490bce3af ]--- [174957.467282] ------------[ cut here ]------------ [174957.468184] WARNING: CPU: 2 PID: 3206883 at fs/btrfs/block-group.c:3976 btrfs_free_block_groups+0x330/0x3c0 [btrfs] [174957.470066] Modules linked in: btrfs overlay dm_zero (...) [174957.483137] CPU: 2 PID: 3206883 Comm: umount Tainted: G W 5.16.0-rc5-btrfs-next-109 #1 [174957.484691] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [174957.486853] RIP: 0010:btrfs_free_block_groups+0x330/0x3c0 [btrfs] [174957.488050] Code: 00 00 00 ad de (...) 
[174957.491479] RSP: 0018:ffffb70d41cffde0 EFLAGS: 00010206 [174957.492520] RAX: ffff8b08d79310b0 RBX: ffff8b09c3848000 RCX: 0000000000000000 [174957.493868] RDX: 0000000000000001 RSI: fffff443055ee600 RDI: ffffffffb1131846 [174957.495183] RBP: ffff8b08d79310b0 R08: 0000000000000000 R09: 0000000000000000 [174957.496580] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8b08d7931000 [174957.498027] R13: ffff8b09c38492b0 R14: dead000000000122 R15: dead000000000100 [174957.499438] FS: 00007f328fb82800(0000) GS:ffff8b0a2d200000(0000) knlGS:0000000000000000 [174957.500990] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [174957.502117] CR2: 00007fff13563e98 CR3: 0000000404f4e005 CR4: 0000000000370ee0 [174957.503513] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [174957.504864] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [174957.506167] Call Trace: [174957.506654] <TASK> [174957.507047] close_ctree+0x301/0x357 [btrfs] [174957.507867] ? call_rcu+0x16c/0x290 [174957.508567] generic_shutdown_super+0x74/0x120 [174957.509447] kill_anon_super+0x14/0x30 [174957.510194] btrfs_kill_super+0x12/0x20 [btrfs] [174957.511123] deactivate_locked_super+0x31/0xa0 [174957.511976] cleanup_mnt+0x147/0x1c0 [174957.512610] task_work_run+0x5c/0xa0 [174957.513309] exit_to_user_mode_prepare+0x1e5/0x1f0 [174957.514231] syscall_exit_to_user_mode+0x16/0x40 [174957.515069] do_syscall_64+0x48/0xc0 [174957.515718] entry_SYSCALL_64_after_hwframe+0x44/0xae [174957.516688] RIP: 0033:0x7f328fdc4a97 [174957.517413] Code: 03 0c 00 f7 d8 (...) 
[174957.521052] RSP: 002b:00007fff13564ec8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [174957.522514] RAX: 0000000000000000 RBX: 00007f328feea264 RCX: 00007f328fdc4a97 [174957.523950] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000560b8ae51dd0 [174957.525375] RBP: 0000560b8ae51ba0 R08: 0000000000000000 R09: 00007fff13563c40 [174957.526763] R10: 00007f328fe49fc0 R11: 0000000000000246 R12: 0000000000000000 [174957.528058] R13: 0000560b8ae51dd0 R14: 0000560b8ae51cb0 R15: 0000000000000000 [174957.529404] </TASK> [174957.529843] irq event stamp: 0 [174957.530256] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [174957.531061] hardirqs last disabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040 [174957.532075] softirqs last enabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040 [174957.533083] softirqs last disabled at (0): [<0000000000000000>] 0x0 [174957.533865] ---[ end trace bc7ee0c490bce3b0 ]--- [174957.534452] BTRFS info (device dm-0): space_info 4 has 1070841856 free, is not full [174957.535404] BTRFS info (device dm-0): space_info total=1073741824, used=2785280, pinned=0, reserved=49152, may_use=0, readonly=65536 zone_unusable=0 [174957.537029] BTRFS info (device dm-0): global_block_rsv: size 0 reserved 0 [174957.537859] BTRFS info (device dm-0): trans_block_rsv: size 0 reserved 0 [174957.538697] BTRFS info (device dm-0): chunk_block_rsv: size 0 reserved 0 [174957.539552] BTRFS info (device dm-0): delayed_block_rsv: size 0 reserved 0 [174957.540403] BTRFS info (device dm-0): delayed_refs_rsv: size 0 reserved 0 This also means that in case we have log tree extent buffers that are still dirty, we can end up not cleaning them up in case we find an extent buffer with EXTENT_BUFFER_WRITE_ERR set on it, as in that case we have no way for iterating over the rest of the tree. This issue is very often triggered with test cases generic/475 and generic/648 from fstests. 
The issue could almost be fixed by iterating over the io tree attached to each log root which keeps track of the range of allocated extent buffers, log_root->dirty_log_pages, however that does not work and has some inconveniences: 1) After we sync the log, we clear the range of the extent buffers from the io tree, so we can't find them after writeback. We could keep the ranges in the io tree, with a separate bit to signal they represent extent buffers already written, but that means we need to hold on to more memory until the transaction commits. How much more memory is used depends a lot on whether we are able to allocate contiguous extent buffers on disk (and how often) for a log tree - if we are able to, then a single extent state record can represent multiple extent buffers, otherwise we need multiple extent state record structures to track each extent buffer. In fact, my earlier approach did that: https://lore.kernel.org/linux-btrfs/3aae7c6728257c7ce2279d6660ee2797e5e34bbd.1641300250.git.fdmanana@suse.com/ However that can cause a very significant negative impact on performance, not only due to the extra memory usage but also because we get a larger and deeper dirty_log_pages io tree. We got a report that, on beefy machines at least, we can get such performance drop with fsmark for example: https://lore.kernel.org/linux-btrfs/20220117082426.GE32491@xsang-OptiPlex-9020/ 2) We would be doing it only to deal with an unexpected and exceptional case, which is basically failure to read an extent buffer from disk due to IO failures.
On a healthy system we don't expect transaction aborts to happen after all; 3) Instead of relying on iterating the log tree or tracking the ranges of extent buffers in the dirty_log_pages io tree, using the radix tree that tracks extent buffers (fs_info->buffer_radix) to find all log tree extent buffers is not reliable either, because after writeback of an extent buffer it can be evicted from memory by the release page callback of the btree inode (btree_releasepage()). Since there's no way to be able to properly cleanup a log tree without being able to read its extent buffers from disk and without using more memory to track the logical ranges of the allocated extent buffers do the following: 1) When we fail to cleanup a log tree, setup a flag that indicates that failure; 2) Trigger writeback of all log tree extent buffers that are still dirty, and wait for the writeback to complete. This is just to cleanup their state, page states, page leaks, etc; 3) When unmounting the fs, ignore if the number of bytes reserved in a block group and in a space_info is not 0 if, and only if, we failed to cleanup a log tree. Also ignore only for metadata block groups and the metadata space_info object. This is far from a perfect solution, but it serves to silence test failures such as those from generic/475 and generic/648. However having a non-zero value for the reserved bytes counters on unmount after a transaction abort, is not such a terrible thing and it's completely harmless, it does not affect the filesystem integrity in any way. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-23 09:16:43 +01:00
Joseph Qi	b786b64dcb	ocfs2: fix crash when initialize filecheck kobj fails commit 7b0b1332cfdb94489836b67d088a779699f8e47e upstream. Once s_root is set, generic_shutdown_super() will be called if fill_super() fails. That means, we will call ocfs2_dismount_volume() twice in such case, which can lead to kernel crash. Fix this issue by initializing filecheck kobj before setting s_root. Link: https://lkml.kernel.org/r/20220310081930.86305-1-joseph.qi@linux.alibaba.com Fixes: `5f483c4abb` ("ocfs2: add kobject for online file check") Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-23 09:16:40 +01:00
Eric Biggers	95f58823a6	f2fs: support direct I/O with fscrypt using blk-crypto Encrypted files traditionally haven't supported DIO, due to the need to encrypt/decrypt the data. However, when the encryption is implemented using inline encryption (blk-crypto) instead of the traditional filesystem-layer encryption, it is straightforward to support DIO. Therefore, make f2fs support DIO on files that are using inline encryption. Since f2fs uses iomap for DIO, and fscrypt support was already added to iomap DIO, this just requires two small changes: - Let DIO proceed when supported, by checking fscrypt_dio_supported() instead of assuming that encrypted files never support DIO. - In f2fs_iomap_begin(), use fscrypt_limit_io_blocks() to limit the length of the mapping in the rare case where a DUN discontiguity occurs in the middle of an extent. The iomap DIO implementation requires this, since it assumes that it can submit a bio covering (up to) the whole mapping, without checking fscrypt constraints itself. Co-developed-by: Satya Tangirala <satyat@google.com> Signed-off-by: Satya Tangirala <satyat@google.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Link: https://lore.kernel.org/r/20220128233940.79464-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2022-03-21 13:53:37 -07:00
Eric Biggers	895c7aafed	ext4: support direct I/O with fscrypt using blk-crypto Encrypted files traditionally haven't supported DIO, due to the need to encrypt/decrypt the data. However, when the encryption is implemented using inline encryption (blk-crypto) instead of the traditional filesystem-layer encryption, it is straightforward to support DIO. Therefore, make ext4 support DIO on files that are using inline encryption. Since ext4 uses iomap for DIO, and fscrypt support was already added to iomap DIO, this just requires two small changes: - Let DIO proceed when supported, by checking fscrypt_dio_supported() instead of assuming that encrypted files never support DIO. - In ext4_iomap_begin(), use fscrypt_limit_io_blocks() to limit the length of the mapping in the rare case where a DUN discontiguity occurs in the middle of an extent. The iomap DIO implementation requires this, since it assumes that it can submit a bio covering (up to) the whole mapping, without checking fscrypt constraints itself. Co-developed-by: Satya Tangirala <satyat@google.com> Signed-off-by: Satya Tangirala <satyat@google.com> Acked-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org> Link: https://lore.kernel.org/r/20220128233940.79464-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2022-03-21 13:53:21 -07:00
Eric Biggers	44373e0889	iomap: support direct I/O with fscrypt using blk-crypto Encrypted files traditionally haven't supported DIO, due to the need to encrypt/decrypt the data. However, when the encryption is implemented using inline encryption (blk-crypto) instead of the traditional filesystem-layer encryption, it is straightforward to support DIO. Add support for this to the iomap DIO implementation by calling fscrypt_set_bio_crypt_ctx() to set encryption contexts on the bios. Don't check for the rare case where a DUN (crypto data unit number) discontiguity creates a boundary that bios must not cross. Instead, filesystems are expected to handle this in ->iomap_begin() by limiting the length of the mapping so that iomap doesn't have to worry about it. Co-developed-by: Satya Tangirala <satyat@google.com> Signed-off-by: Satya Tangirala <satyat@google.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220128233940.79464-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2022-03-21 13:51:22 -07:00
Eric Biggers	aa132b78b3	fscrypt: add functions for direct I/O support Encrypted files traditionally haven't supported DIO, due to the need to encrypt/decrypt the data. However, when the encryption is implemented using inline encryption (blk-crypto) instead of the traditional filesystem-layer encryption, it is straightforward to support DIO. In preparation for supporting this, add the following functions: - fscrypt_dio_supported() checks whether a DIO request is supported as far as encryption is concerned. Encrypted files will only support DIO when inline encryption is used and the I/O request is properly aligned; this function checks these preconditions. - fscrypt_limit_io_blocks() limits the length of a bio to avoid crossing a place in the file that a bio with an encryption context cannot cross due to a DUN discontiguity. This function is needed by filesystems that use the iomap DIO implementation (which operates directly on logical ranges, so it won't use fscrypt_mergeable_bio()) and that support FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32. Co-developed-by: Satya Tangirala <satyat@google.com> Signed-off-by: Satya Tangirala <satyat@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220128233940.79464-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2022-03-21 13:50:20 -07:00
Tadeusz Struk	fdf79bad05	ANDROID: incremental-fs: populate userns before calling vfs_rename The old and new mount user name spaces need to be populated before calling vfs_rename(). Otherwise vfs_rename will try to dereference a null ptr and segfault. Bug: 211066171 Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Change-Id: I3656073581218107fc3b1a52ebe7bcfd81a10fc2	2022-03-21 20:23:31 +00:00
Chao Yu	58b6425d90	f2fs: fix to do sanity check on .cp_pack_total_block_count As bughunter reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215709 f2fs may hang when mounting a fuzzed image, the dmesg shows as below: __filemap_get_folio+0x3a9/0x590 pagecache_get_page+0x18/0x60 __get_meta_page+0x95/0x460 [f2fs] get_checkpoint_version+0x2a/0x1e0 [f2fs] validate_checkpoint+0x8e/0x2a0 [f2fs] f2fs_get_valid_checkpoint+0xd0/0x620 [f2fs] f2fs_fill_super+0xc01/0x1d40 [f2fs] mount_bdev+0x18a/0x1c0 f2fs_mount+0x15/0x20 [f2fs] legacy_get_tree+0x28/0x50 vfs_get_tree+0x27/0xc0 path_mount+0x480/0xaa0 do_mount+0x7c/0xa0 __x64_sys_mount+0x8b/0xe0 do_syscall_64+0x38/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae The root cause is that the cp_pack_total_block_count field in the checkpoint was fuzzed to one; as calculated, the two cp pack blocks are then located at the same block address, so reading the latter cp pack block blocks on the page lock, which is already held from reading the previous cp pack block. Fix it by adding a sanity check for cp_pack_total_block_count. Cc: stable@vger.kernel.org Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-03-21 09:19:56 -07:00
Daeho Jeong	9c53c520f0	f2fs: make gc_urgent and gc_segment_mode sysfs node readable Changed a way of showing values of them to use strings. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-03-21 09:18:39 -07:00
Chao Yu	e8bde3f715	f2fs: use aggressive GC policy during f2fs_disable_checkpoint() Let's enable GC_URGENT_HIGH mode during f2fs_disable_checkpoint(), so that we can use SSR allocator for GCed data/node persistence, it can improve the performance due to it avoiding migration of data/node locates in selected target segment of SSR allocator. Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-03-18 09:22:30 -07:00
Fengnan Chang	1d838d8e1b	f2fs: fix compressed file start atomic write may cause data corruption When a compressed file has blocks, f2fs_ioc_start_atomic_write will succeed, but the compressed flag will remain set in the inode. Writing a partial compressed cluster and then committing the atomic write will cause data corruption. This is the reproduction process: Step 1: create a compressed file, write 64K of data, call fsync(); the blocks are then written as a compressed cluster. Step 2: ioctl(F2FS_IOC_START_ATOMIC_WRITE) --- this should fail, but does not. write page 0 and page 3. ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE) -- pages 0 and 3 are written as in a normal file. Step 3: drop cache. read page 0-4 -- Since page 0 has a valid block address, it is read as a non-compressed cluster, and pages 1 and 2 will be filled with compressed data or zero. The root cause is that, after commit `7eab7a6968` ("f2fs: compress: remove unneeded read when rewrite whole cluster"), in step 2, f2fs_write_begin() only sets the target page dirty, and in f2fs_commit_inmem_pages(), we will write partial raw pages into the compressed cluster, resulting in a corrupted compressed cluster layout. Fixes: `4c8ff7095b` ("f2fs: support data compression") Fixes: `7eab7a6968` ("f2fs: compress: remove unneeded read when rewrite whole cluster") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Fengnan Chang <changfengnan@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-03-18 09:22:30 -07:00
Greg Kroah-Hartman	28b046777f	Merge 5.15.29 into android-5.15 Changes in 5.15.29 arm64: dts: qcom: sm8350: Describe GCC dependency clocks arm64: dts: qcom: sm8350: Correct UFS symbol clocks HID: elo: Revert USB reference counting HID: hid-thrustmaster: fix OOB read in thrustmaster_interrupts ARM: boot: dts: bcm2711: Fix HVS register range clk: qcom: gdsc: Add support to update GDSC transition delay clk: qcom: dispcc: Update the transition delay for MDSS GDSC HID: vivaldi: fix sysfs attributes leak arm64: dts: armada-3720-turris-mox: Add missing ethernet0 alias tipc: fix kernel panic when enabling bearer vdpa/mlx5: add validation for VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command vduse: Fix returning wrong type in vduse_domain_alloc_iova() net: phy: meson-gxl: fix interrupt handling in forced mode mISDN: Fix memory leak in dsp_pipeline_build() vhost: fix hung thread due to erroneous iotlb entries virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zero vdpa: fix use-after-free on vp_vdpa_remove isdn: hfcpci: check the return value of dma_set_mask() in setup_hw() net: qlogic: check the return value of dma_alloc_coherent() in qed_vf_hw_prepare() esp: Fix possible buffer overflow in ESP transformation esp: Fix BEET mode inter address family tunneling on GSO qed: return status of qed_iov_get_link smsc95xx: Ignore -ENODEV errors when device is unplugged gpiolib: acpi: Convert ACPI value of debounce to microseconds drm/sun4i: mixer: Fix P010 and P210 format numbers net: dsa: mt7530: fix incorrect test in mt753x_phylink_validate() ARM: dts: aspeed: Fix AST2600 quad spi group iavf: Fix handling of vlan strip virtual channel messages i40e: stop disabling VFs due to PF error responses ice: stop disabling VFs due to PF error responses ice: Fix error with handling of bonding MTU ice: Don't use GFP_KERNEL in atomic context ice: Fix curr_link_speed advertised speed ethernet: Fix error handling in xemaclite_of_probe tipc: fix incorrect order of state message data sanity 
check net: ethernet: ti: cpts: Handle error for clk_enable net: ethernet: lpc_eth: Handle error for clk_enable net: marvell: prestera: Add missing of_node_put() in prestera_switch_set_base_mac_addr ax25: Fix NULL pointer dereference in ax25_kill_by_device net/mlx5: Fix size field in bufferx_reg struct net/mlx5: Fix a race on command flush flow net/mlx5e: Lag, Only handle events from highest priority multipath entry NFC: port100: fix use-after-free in port100_send_complete selftests: pmtu.sh: Kill tcpdump processes launched by subshell. selftests: pmtu.sh: Kill nettest processes launched in subshell. gpio: ts4900: Do not set DAT and OE together gianfar: ethtool: Fix refcount leak in gfar_get_ts_info net: phy: DP83822: clear MISR2 register to disable interrupts sctp: fix kernel-infoleak for SCTP sockets net: bcmgenet: Don't claim WOL when its not available net: phy: meson-gxl: improve link-up behavior selftests/bpf: Add test for bpf_timer overwriting crash swiotlb: fix info leak with DMA_FROM_DEVICE usb: dwc3: pci: add support for the Intel Raptor Lake-S pinctrl: tigerlake: Revert "Add Alder Lake-M ACPI ID" KVM: Fix lockdep false negative during host resume kvm: x86: Disable KVM_HC_CLOCK_PAIRING if tsc is in always catchup mode spi: rockchip: Fix error in getting num-cs property spi: rockchip: terminate dma transmission when slave abort drm/vc4: hdmi: Unregister codec device on unbind x86/kvm: Don't use pv tlb/ipi/sched_yield if on 1 vCPU net-sysfs: add check for netdevice being present to speed_show hwmon: (pmbus) Clear pmbus fault/warning bits after read PCI: Mark all AMD Navi10 and Navi14 GPU ATS as broken gpio: Return EPROBE_DEFER if gc->to_irq is NULL drm/amdgpu: bypass tiling flag check in virtual display case (v2) Revert "xen-netback: remove 'hotplug-status' once it has served its purpose" Revert "xen-netback: Check for hotplug-status existence before watching" ipv6: prevent a possible race condition with lifetimes tracing: Ensure trace buffer is at least 4096 
bytes large tracing/osnoise: Make osnoise_main to sleep for microseconds selftest/vm: fix map_fixed_noreplace test failure selftests/memfd: clean up mapping in mfd_fail_write ARM: Spectre-BHB: provide empty stub for non-config fuse: fix fileattr op failure fuse: fix pipe buffer lifetime for direct_io staging: rtl8723bs: Fix access-point mode deadlock staging: gdm724x: fix use after free in gdm_lte_rx() net: macb: Fix lost RX packet wakeup race in NAPI receive riscv: alternative only works on !XIP_KERNEL mmc: meson: Fix usage of meson_mmc_post_req() riscv: Fix auipc+jalr relocation range checks tracing/osnoise: Force quiescent states while tracing arm64: dts: marvell: armada-37xx: Remap IO space to bus address 0x0 arm64: Ensure execute-only permissions are not allowed without EPAN arm64: kasan: fix include error in MTE functions swiotlb: rework "fix info leak with DMA_FROM_DEVICE" KVM: x86/mmu: kvm_faultin_pfn has to return false if pfh is returned virtio: unexport virtio_finalize_features virtio: acknowledge all features before access net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE ARM: fix Thumb2 regression with Spectre BHB watch_queue: Fix filter limit check watch_queue, pipe: Free watchqueue state after clearing pipe ring watch_queue: Fix to release page in ->release() watch_queue: Fix to always request a pow-of-2 pipe ring size watch_queue: Fix the alloc bitmap size to reflect notes allocated watch_queue: Free the alloc bitmap when the watch_queue is torn down watch_queue: Fix lack of barrier/sync/lock between post and read watch_queue: Make comment about setting ->defunct more accurate x86/boot: Fix memremap of setup_indirect structures x86/boot: Add setup_indirect support in early_memremap_is_setup_data() x86/sgx: Free backing memory after faulting the enclave page x86/traps: Mark do_int3() NOKPROBE_SYMBOL drm/panel: Select DRM_DP_HELPER for DRM_PANEL_EDP btrfs: make send work with concurrent block group relocation drm/i915: Workaround broken 
BIOS DBUF configuration on TGL/RKL riscv: dts: k210: fix broken IRQs on hart1 block: drop unused includes in <linux/genhd.h> Revert "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN" vhost: allow batching hint without size Linux 5.15.29 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I5c9c6006b90a8283a81fd5f7c79776e1a0cfb6b1	2022-03-18 07:55:37 +01:00
Chao Yu	8b5e471072	f2fs: initialize sbi->gc_mode explicitly sbi->gc_mode needs to be initialized to GC_NORMAL explicitly. Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-03-17 23:51:11 -07:00
Daeho Jeong	c3fa5f22ba	f2fs: introduce gc_urgent_mid mode We need a mid level of gc urgent mode that forcibly runs GC at intervals of the given gc_urgent_sleep_time, but without using the greedy GC approach or switching to SSR mode as gc urgent high mode does. This can be used for more aggressive periodic storage cleanup. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-03-17 09:33:11 -07:00
Chao Yu	6736da13f0	f2fs: compress: fix to print raw data size in error path of lz4 decompression In lz4_decompress_pages(), if the size of the decompressed data is not equal to the expected one, we should print that size rather than the size of the target buffer for decompressed data. Fix it. Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-03-17 09:33:11 -07:00
Wang Xiaojun	4bd475852c	f2fs: remove redundant parameter judgment iput() has already judged the incoming parameter, so there is no need to repeat the judgment here. Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-03-17 09:33:11 -07:00
Greg Kroah-Hartman	16f06ae351	Merge 5.15.27 into android-5.15 Changes in 5.15.27 mac80211_hwsim: report NOACK frames in tx_status mac80211_hwsim: initialize ieee80211_tx_info at hw_scan_work i2c: bcm2835: Avoid clock stretching timeouts ASoC: rt5668: do not block workqueue if card is unbound ASoC: rt5682: do not block workqueue if card is unbound regulator: core: fix false positive in regulator_late_cleanup() Input: clear BTN_RIGHT/MIDDLE on buttonpads btrfs: get rid of warning on transaction commit when using flushoncommit KVM: arm64: vgic: Read HW interrupt pending state from the HW block: loop:use kstatfs.f_bsize of backing file to set discard granularity tipc: fix a bit overflow in tipc_crypto_key_rcv() cifs: do not use uninitialized data in the owner/group sid cifs: fix double free race when mount fails in cifs_get_root() HID: amd_sfh: Handle amd_sfh work buffer in PM ops HID: amd_sfh: Add functionality to clear interrupts HID: amd_sfh: Add interrupt handler to process interrupts cifs: modefromsids must add an ACE for authenticated users selftests/seccomp: Fix seccomp failure by adding missing headers drm/amd/pm: correct UMD pstate clocks for Dimgrey Cavefish and Beige Goby selftests/ftrace: Do not trace do_softirq because of PREEMPT_RT dmaengine: shdma: Fix runtime PM imbalance on error i2c: cadence: allow COMPILE_TEST i2c: imx: allow COMPILE_TEST i2c: qup: allow COMPILE_TEST net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990 block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern usb: gadget: don't release an existing dev->buf usb: gadget: clear related members when goto fail exfat: reuse exfat_inode_info variable instead of calling EXFAT_I() exfat: fix i_blocks for files truncated over 4 GiB tracing: Add test for user space strings when filtering on string pointers arm64: Mark start_backtrace() notrace and NOKPROBE_SYMBOL serial: stm32: prevent TDR register overwrite when sending x_char ext4: drop ineligible txn start stop APIs 
ext4: simplify updating of fast commit stats ext4: fast commit may not fallback for ineligible commit ext4: fast commit may miss file actions sched/fair: Fix fault in reweight_entity ata: pata_hpt37x: fix PCI clock detection drm/amdgpu: check vm ready by amdgpu_vm->evicting flag tracing: Add ustring operation to filtering string pointers ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report() NFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment() NFSD: Fix zero-length NFSv3 WRITEs io_uring: fix no lock protection for ctx->cq_extra tools/resolve_btf_ids: Close ELF file on error mtd: spi-nor: Fix mtd size for s3an flashes MIPS: fix local_{add,sub}_return on MIPS64 signal: In get_signal test for signal_group_exit every time through the loop PCI: mediatek-gen3: Disable DVFSRC voltage request PCI: rcar: Check if device is runtime suspended instead of __clk_is_enabled() PCI: dwc: Do not remap invalid res PCI: aardvark: Fix checking for MEM resource type KVM: VMX: Don't unblock vCPU w/ Posted IRQ if IRQs are disabled in guest KVM: s390: Ensure kvm_arch_no_poll() is read once when blocking vCPU KVM: VMX: Read Posted Interrupt "control" exactly once per loop iteration KVM: X86: Ensure that dirty PDPTRs are loaded KVM: x86: Handle 32-bit wrap of EIP for EMULTYPE_SKIP with flat code seg KVM: x86: Exit to userspace if emulation prepared a completion callback i3c: fix incorrect address slot lookup on 64-bit i3c/master/mipi-i3c-hci: Fix a potentially infinite loop in 'hci_dat_v1_get_index()' tracing: Do not let synth_events block other dyn_event systems during create Input: ti_am335x_tsc - set ADCREFM for X configuration Input: ti_am335x_tsc - fix STEPCONFIG setup for Z2 PCI: mvebu: Check for errors from pci_bridge_emul_init() call PCI: mvebu: Do not modify PCI IO type bits in conf_write PCI: mvebu: Fix support for bus mastering and PCI_COMMAND on emulated bridge PCI: mvebu: Fix configuring secondary bus of PCIe Root Port via emulated bridge PCI: mvebu: 
Setup PCIe controller to Root Complex mode PCI: mvebu: Fix support for PCI_BRIDGE_CTL_BUS_RESET on emulated bridge PCI: mvebu: Fix support for PCI_EXP_DEVCTL on emulated bridge PCI: mvebu: Fix support for PCI_EXP_RTSTA on emulated bridge PCI: mvebu: Fix support for DEVCAP2, DEVCTL2 and LNKCTL2 registers on emulated bridge NFSD: Fix verifier returned in stable WRITEs Revert "nfsd: skip some unnecessary stats in the v4 case" nfsd: fix crash on COPY_NOTIFY with special stateid x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi() drm/i915: don't call free_mmap_offset when purging SUNRPC: Fix sockaddr handling in the svc_xprt_create_error trace point SUNRPC: Fix sockaddr handling in svcsock_accept_class trace points drm/sun4i: dw-hdmi: Fix missing put_device() call in sun8i_hdmi_phy_get drm/atomic: Check new_crtc_state->active to determine if CRTC needs disable in self refresh mode ntb_hw_switchtec: Fix pff ioread to read into mmio_part_cfg_all ntb_hw_switchtec: Fix bug with more than 32 partitions drm/amdkfd: Check for null pointer after calling kmemdup drm/amdgpu: use spin_lock_irqsave to avoid deadlock by local interrupt i3c: master: dw: check return of dw_i3c_master_get_free_pos() dma-buf: cma_heap: Fix mutex locking section tracing/uprobes: Check the return value of kstrdup() for tu->filename tracing/probes: check the return value of kstrndup() for pbuf mm: defer kmemleak object creation of module_alloc() kasan: fix quarantine conflicting with init_on_free selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting hugetlbfs: fix off-by-one error in hugetlb_vmdelete_list() drm/amdgpu/display: Only set vblank_disable_immediate when PSR is not enabled drm/amdgpu: filter out radeon PCI device IDs drm/amdgpu: filter out radeon secondary ids as well drm/amd/display: Use adjusted DCN301 watermarks drm/amd/display: move FPU associated DSC code to DML folder ethtool: Fix link extended state for big endian octeontx2-af: Optimize KPU1 
processing for variable-length headers octeontx2-af: Reset PTP config in FLR handler octeontx2-af: cn10k: RPM hardware timestamp configuration octeontx2-af: cn10k: Use appropriate register for LMAC enable octeontx2-af: Adjust LA pointer for cpt parse header octeontx2-af: Add KPU changes to parse NGIO as separate layer net/mlx5e: IPsec: Refactor checksum code in tx data path net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic bpf: Use u64_stats_t in struct bpf_prog_stats bpf: Fix possible race in inc_misses_counter drm/amd/display: Update watermark values for DCN301 drm: mxsfb: Set fallback bus format when the bridge doesn't provide one drm: mxsfb: Fix NULL pointer dereference riscv/mm: Add XIP_FIXUP for phys_ram_base drm/i915/display: split out dpt out of intel_display.c drm/i915/display: Move DRRS code its own file drm/i915: Disable DRRS on IVB/HSW port != A gve: Recording rx queue before sending to napi net: dsa: ocelot: seville: utilize of_mdiobus_register net: dsa: seville: register the mdiobus under devres ibmvnic: don't release napi in __ibmvnic_open() of: net: move of_net under net/ net: ethernet: litex: Add the dependency on HAS_IOMEM drm/mediatek: mtk_dsi: Reset the dsi0 hardware cifs: protect session channel fields with chan_lock cifs: fix confusing unneeded warning message on smb2.1 and earlier drm/amd/display: Fix stream->link_enc unassigned during stream removal bnxt_en: Fix occasional ethtool -t loopback test failures drm/amd/display: For vblank_disable_immediate, check PSR is really used PCI: mvebu: Fix device enumeration regression net: of: fix stub of_net helpers for CONFIG_NET=n ALSA: intel_hdmi: Fix reference to PCM buffer address ucounts: Fix systemd LimitNPROC with private users regression riscv/efi_stub: Fix get_boot_hartid_from_fdt() return value riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP riscv: Fix config KASAN && DEBUG_VIRTUAL iwlwifi: mvm: check debugfs_dir ptr before use ASoC: ops: Shift tested values in 
snd_soc_put_volsw() by +min iommu/vt-d: Fix double list_add when enabling VMD in scalable mode iommu/amd: Recover from event log overflow drm/i915: s/JSP2/ICP2/ PCH drm/amd/display: Reduce dmesg error to a debug print xen/netfront: destroy queues before real_num_tx_queues is zeroed thermal: core: Fix TZ_GET_TRIP NULL pointer dereference mac80211: fix EAPoL rekey fail in 802.3 rx path blktrace: fix use after free for struct blk_trace ntb: intel: fix port config status offset for SPR mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls xfrm: fix MTU regression netfilter: fix use-after-free in __nf_register_net_hook() bpf, sockmap: Do not ignore orig_len parameter xfrm: fix the if_id check in changelink xfrm: enforce validity of offload input flags e1000e: Correct NVM checksum verification flow net: fix up skbs delta_truesize in UDP GRO frag_list netfilter: nf_queue: don't assume sk is full socket netfilter: nf_queue: fix possible use-after-free netfilter: nf_queue: handle socket prefetch batman-adv: Request iflink once in batadv-on-batadv check batman-adv: Request iflink once in batadv_get_real_netdevice batman-adv: Don't expect inter-netns unique iflink indices net: ipv6: ensure we call ipv6_mc_down() at most once net: dcb: flush lingering app table entries for unregistered devices net: ipa: add an interconnect dependency net/smc: fix connection leak net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range mac80211: fix forwarded mesh frames AC & queue selection net: stmmac: fix return value of __setup handler mac80211: treat some SAE auth steps as final iavf: Fix missing check for running netdev net: sxgbe: fix return value of __setup handler ibmvnic: register netdev after init of adapter net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe() ixgbe: xsk: change !netif_carrier_ok() 
handling in ixgbe_xmit_zc() iavf: Fix deadlock in iavf_reset_task efivars: Respect "block" flag in efivar_entry_set_safe() auxdisplay: lcd2s: Fix lcd2s_redefine_char() feature firmware: arm_scmi: Remove space in MODULE_ALIAS name ASoC: cs4265: Fix the duplicated control name auxdisplay: lcd2s: Fix memory leak in ->remove() auxdisplay: lcd2s: Use proper API to free the instance of charlcd object can: gs_usb: change active_channels's type from atomic_t to u8 iommu/tegra-smmu: Fix missing put_device() call in tegra_smmu_find arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output igc: igc_read_phy_reg_gpy: drop premature return ARM: Fix kgdb breakpoint for Thumb2 mips: setup: fix setnocoherentio() boolean setting ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions mptcp: Correctly set DATA_FIN timeout when number of retransmits is large selftests: mlxsw: tc_police_scale: Make test more robust pinctrl: sunxi: Use unique lockdep classes for IRQs igc: igc_write_phy_reg_gpy: drop premature return ibmvnic: free reset-work-item when flushing memfd: fix F_SEAL_WRITE after shmem huge page allocated s390/extable: fix exception table sorting sched: Fix yet more sched_fork() races arm64: dts: juno: Remove GICv2m dma-range iommu/amd: Fix I/O page table memory leak MIPS: ralink: mt7621: do memory detection on KSEG1 ARM: dts: switch timer config to common devkit8000 devicetree ARM: dts: Use 32KiHz oscillator on devkit8000 soc: fsl: guts: Revert commit `3c0d64e867` soc: fsl: guts: Add a missing memory allocation failure check soc: fsl: qe: Check of ioremap return value netfilter: nf_tables: prefer kfree_rcu(ptr, rcu) variant ARM: tegra: Move panels to AUX bus can: etas_es58x: change opened_channel_cnt's type from atomic_t to u8 net: stmmac: enhance XDP ZC driver level switching performance net: stmmac: only enable DMA interrupts when ready ibmvnic: initialize rc before completing wait ibmvnic: define flush_reset_queue helper ibmvnic: complete init_done on 
transport events net: chelsio: cxgb3: check the return value of pci_find_capability() net: sparx5: Fix add vlan when invalid operation iavf: Refactor iavf state machine tracking iavf: Add __IAVF_INIT_FAILED state iavf: Combine init and watchdog state machines iavf: Add trace while removing device iavf: Rework mutexes for better synchronisation iavf: Add helper function to go from pci_dev to adapter iavf: Fix kernel BUG in free_msi_irqs iavf: Add waiting so the port is initialized in remove iavf: Fix init state closure on remove iavf: Fix locking for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS iavf: Fix race in init state iavf: Fix __IAVF_RESETTING state usage drm/i915/guc/slpc: Correct the param count for unset param drm/bridge: ti-sn65dsi86: Properly undo autosuspend e1000e: Fix possible HW unit hang after an s0ix exit MIPS: ralink: mt7621: use bitwise NOT instead of logical nl80211: Handle nla_memdup failures in handle_nan_filter drm/amdgpu: fix suspend/resume hang regression net: dcb: disable softirqs in dcbnl_flush_dev() selftests: mlxsw: resource_scale: Fix return value net: stmmac: perserve TX and RX coalesce value during XDP setup iavf: do not override the adapter state in the watchdog task (again) iavf: missing unlocks in iavf_watchdog_task() MAINTAINERS: adjust file entry for of_net.c after movement Input: elan_i2c - move regulator_[en\|dis]able() out of elan_[en\|dis]able_power() Input: elan_i2c - fix regulator enable count imbalance after suspend/resume Input: samsung-keypad - properly state IOMEM dependency HID: add mapping for KEY_DICTATE HID: add mapping for KEY_ALL_APPLICATIONS tracing/histogram: Fix sorting on old "cpu" value tracing: Fix return value of __setup handlers btrfs: fix lost prealloc extents beyond eof after full fsync btrfs: fix relocation crash due to premature return from btrfs_commit_transaction() btrfs: do not WARN_ON() if we have PageError set btrfs: qgroup: fix deadlock between rescan worker and remove qgroup btrfs: add missing run of 
delayed items after unlink during log replay btrfs: do not start relocation until in progress drops are done Revert "xfrm: xfrm_state_mtu should return at least 1280 for ipv6" proc: fix documentation and description of pagemap KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots() hamradio: fix macro redefine warning Linux 5.15.27 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ie338dd23e0eb61feb540b4256b5d1840fee4db84	2022-03-17 14:02:09 +01:00
Filipe Manana	a1ce40f8ae	btrfs: make send work with concurrent block group relocation commit d96b34248c2f4ea8cd09286090f2f6f77102eaab upstream. We don't allow send and balance/relocation to run in parallel in order to prevent send failing or silently producing some bad stream. This is because while send is using an extent (specially metadata) or about to read a metadata extent and expecting it belongs to a specific parent node, relocation can run, the transaction used for the relocation is committed and the extent gets reallocated while send is still using the extent, so it ends up with a different content than expected. This can result in just failing to read a metadata extent due to failure of the validation checks (parent transid, level, etc), failure to find a backreference for a data extent, and other unexpected failures. Besides reallocation, there's also a similar problem of an extent getting discarded when it's unpinned after the transaction used for block group relocation is committed. The restriction between balance and send was added in commit `9e967495e0` ("Btrfs: prevent send failures and crashes due to concurrent relocation"), kernel 5.3, while the more general restriction between send and relocation was added in commit `1cea5cf0e6` ("btrfs: ensure relocation never runs while we have send operations running"), kernel 5.14. Both send and relocation can be very long running operations. Relocation because it has to do a lot of IO and expensive backreference lookups in case there are many snapshots, and send due to read IO when operating on very large trees. This makes it inconvenient for users and tools to deal with scheduling both operations. For zoned filesystem we also have automatic block group relocation, so send can fail with -EAGAIN when users least expect it or send can end up delaying the block group relocation for too long. In the future we might also get the automatic block group relocation for non zoned filesystems. 
This change makes it possible for send and relocation to run in parallel. This is achieved in the following way: 1) For all tree searches, send acquires a read lock on the commit root semaphore; 2) After each tree search, and before releasing the commit root semaphore, the leaf is cloned and placed in the search path (struct btrfs_path); 3) After releasing the commit root semaphore, the changed_cb() callback is invoked, which operates on the leaf and writes commands to the pipe (or file in case send/receive is not used with a pipe). It's important here to not hold a lock on the commit root semaphore, because if we did we could deadlock when sending and receiving to the same filesystem using a pipe - the send task blocks on the pipe because it's full, the receive task, which is the only consumer of the pipe, triggers a transaction commit when attempting to create a subvolume or reserve space for a write operation for example, but the transaction commit blocks trying to write lock the commit root semaphore, resulting in a deadlock; 4) Before moving to the next key, or advancing to the next change in case of an incremental send, check if a transaction used for relocation was committed (or is about to finish its commit). If so, release the search path(s) and restart the search, to where we were before, so that we don't operate on stale extent buffers. The search restarts are always possible because both the send and parent roots are RO, and no one can add, remove or update keys (change their offset) in RO trees - the only exception is deduplication, but that is still not allowed to run in parallel with send; 5) Periodically check if there is contention on the commit root semaphore, which means there is a transaction commit trying to write lock it, and release the semaphore and reschedule if there is contention, so as to avoid causing any significant delays to transaction commits. 
This leaves some room for optimizations for send to have less path releases and re searching the trees when there's relocation running, but for now it's kept simple as it performs quite well (on very large trees with resulting send streams in the order of a few hundred gigabytes). Test case btrfs/187, from fstests, stresses relocation, send and deduplication attempting to run in parallel, but without verifying if send succeeds and if it produces correct streams. A new test case will be added that exercises relocation happening in parallel with send and then checks that send succeeds and the resulting streams are correct. A final note is that for now this still leaves the mutual exclusion between send operations and deduplication on files belonging to a root used by send operations. A solution for that will be slightly more complex but it will eventually be built on top of this change. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-16 14:23:46 +01:00
David Howells	eb38c2e9fc	watch_queue: Fix lack of barrier/sync/lock between post and read commit 2ed147f015af2b48f41c6f0b6746aa9ea85c19f3 upstream. There's nothing to synchronise post_one_notification() versus pipe_read(). Whilst posting is done under pipe->rd_wait.lock, the reader only takes pipe->mutex which cannot bar notification posting as that may need to be made from contexts that cannot sleep. Fix this by setting pipe->head with a barrier in post_one_notification() and reading pipe->head with a barrier in pipe_read(). If that's not sufficient, the rd_wait.lock will need to be taken, possibly in a ->confirm() op so that it only applies to notifications. The lock would, however, have to be dropped before copy_page_to_iter() is invoked. Fixes: `c73be61ced` ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-16 14:23:44 +01:00
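The watch_queue commit above describes publishing pipe->head with a barrier on the posting side and reading it with a barrier on the reading side. The following is only a user-space sketch of that pairing, written with C11 atomics rather than the kernel's own barrier primitives. The names `struct ring`, `ring_init`, `ring_post` and `ring_read` are hypothetical and do not appear in the kernel; the sketch assumes a single poster and a single reader.

```c
#include <assert.h>
#include <stdatomic.h>

#define RING_SIZE 8u /* hypothetical power-of-2 ring size */

/* Hypothetical single-poster, single-reader ring (not the kernel's pipe). */
struct ring {
    int slots[RING_SIZE];
    atomic_uint head; /* advanced only by the poster */
    atomic_uint tail; /* advanced only by the reader */
};

static void ring_init(struct ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Poster: fill the slot first, then publish the new head with release
 * semantics so the slot contents are visible before the head update. */
static int ring_post(struct ring *r, int value)
{
    unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail >= RING_SIZE)
        return -1; /* ring is full */
    r->slots[head & (RING_SIZE - 1)] = value;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 0;
}

/* Reader: load head with acquire semantics, which pairs with the release
 * store above, before touching the slot it guards. */
static int ring_read(struct ring *r, int *out)
{
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return -1; /* ring is empty */
    *out = r->slots[tail & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 0;
}
```

Without the release/acquire pair, a reader could observe the advanced head before the slot contents are visible, which is the kind of ordering gap the commit message describes.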
David Howells	8275b6699c	watch_queue, pipe: Free watchqueue state after clearing pipe ring commit db8facfc9fafacefe8a835416a6b77c838088f8b upstream. In free_pipe_info(), free the watchqueue state after clearing the pipe ring as each pipe ring descriptor has a release function, and in the case of a notification message, this is watch_queue_pipe_buf_release() which tries to mark the allocation bitmap that was previously released. Fix this by moving the put of the pipe's ref on the watch queue to after the ring has been cleared. We still need to call watch_queue_clear() before doing that to make sure that the pipe is disconnected from any notification sources first. Fixes: `c73be61ced` ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-16 14:23:44 +01:00
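The ordering problem in the commit above (ring descriptors whose release function touches state that has already been freed) can be illustrated with a small stand-alone sketch. Everything here is hypothetical: `struct note_state`, `struct ring_buf`, `struct pipe_like` and their helpers are stand-ins for illustration, not kernel structures. The point is only the order of operations inside `pipe_like_free()`.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_SLOTS 4

/* Hypothetical allocation state that ring buffers refer back to. */
struct note_state {
    int allocated[NR_SLOTS];
};

/* Hypothetical ring descriptor; releasing it marks its slot as free. */
struct ring_buf {
    struct note_state *state;
    int slot;
};

struct pipe_like {
    struct ring_buf bufs[NR_SLOTS];
    int nr_bufs;
    struct note_state *state;
};

static int pipe_like_init(struct pipe_like *p, int nr_bufs)
{
    if (nr_bufs < 0 || nr_bufs > NR_SLOTS)
        return -1;
    p->state = calloc(1, sizeof(*p->state));
    if (!p->state)
        return -1;
    p->nr_bufs = nr_bufs;
    for (int i = 0; i < nr_bufs; i++) {
        p->bufs[i].state = p->state;
        p->bufs[i].slot = i;
        p->state->allocated[i] = 1;
    }
    return 0;
}

/* The release hook writes into the shared state, so the state must
 * still be alive whenever this runs. */
static void buf_release(struct ring_buf *b)
{
    assert(b->state != NULL);
    b->state->allocated[b->slot] = 0;
    b->state = NULL;
}

/* Teardown: clear the ring first, free the state it points at last. */
static int pipe_like_free(struct pipe_like *p)
{
    int released = 0;

    for (int i = 0; i < p->nr_bufs; i++) {
        buf_release(&p->bufs[i]);
        released++;
    }
    p->nr_bufs = 0;
    free(p->state); /* safe: no descriptor refers to it any more */
    p->state = NULL;
    return released;
}
```

Swapping the two steps in `pipe_like_free()` would make `buf_release()` write into freed memory, which mirrors the use-after-free pattern the commit message describes.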
Miklos Szeredi	ca62747b38	fuse: fix pipe buffer lifetime for direct_io commit 0c4bcfdecb1ac0967619ee7ff44871d93c08c909 upstream. In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then imports the write buffer with fuse_get_user_pages(), which uses iov_iter_get_pages() to grab references to userspace pages instead of actually copying memory. On the filesystem device side, these pages can then either be read to userspace (via fuse_dev_read()), or splice()d over into a pipe using fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops. This is wrong because after fuse_dev_do_read() unlocks the FUSE request, the userspace filesystem can mark the request as completed, causing write() to return. At that point, the userspace filesystem should no longer have access to the pipe buffer. Fix by copying pages coming from the user address space to new pipe buffers. Reported-by: Jann Horn <jannh@google.com> Fixes: `c3021629a0` ("fuse: support splice() reading from fuse device") Cc: <stable@vger.kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-16 14:23:42 +01:00
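The fix described in the fuse commit above is to copy user pages into new pipe buffers instead of handing out references to them. A minimal user-space sketch of the difference, assuming a hypothetical `struct pipe_buf` and helper `pipe_buf_fill_copy()` that are not part of the FUSE code, might look like this:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical pipe buffer that owns its data. */
struct pipe_buf {
    char *data;
    size_t len;
};

/* Fill the buffer with a private copy of the caller's memory, so the
 * consumer of the pipe never holds a pointer into the caller's pages. */
static int pipe_buf_fill_copy(struct pipe_buf *pb, const char *user, size_t len)
{
    pb->data = malloc(len ? len : 1);
    if (!pb->data)
        return -1;
    memcpy(pb->data, user, len);
    pb->len = len;
    return 0;
}

static void pipe_buf_release(struct pipe_buf *pb)
{
    free(pb->data);
    pb->data = NULL;
    pb->len = 0;
}
```

With a copy, whatever the writer does to its own memory after write() returns is invisible to the holder of the pipe buffer, and the holder cannot reach the writer's memory either. The cost is an extra memcpy per buffer.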
Miklos Szeredi	d60d34b4d6	fuse: fix fileattr op failure commit a679a61520d8a7b0211a1da990404daf5cc80b72 upstream. The fileattr API conversion broke lsattr on ntfs3g. Previously the ioctl(... FS_IOC_GETFLAGS) returned an EINVAL error, but after the conversion the error returned by the fuse filesystem was not propagated back to the ioctl() system call, resulting in success being returned with bogus values. Fix by checking for outarg.result in fuse_priv_ioctl(), just as generic ioctl code does. Reported-by: Jean-Pierre André <jean-pierre.andre@wanadoo.fr> Fixes: `72227eac17` ("fuse: convert to fileattr") Cc: <stable@vger.kernel.org> # v5.13 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-16 14:23:42 +01:00
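The fileattr bug above comes down to a reply that carries its own result code which the caller never inspected. The sketch below shows that check in isolation. `struct ioctl_reply` and `get_flags()` are hypothetical names for illustration; the real fix checks outarg.result in fuse_priv_ioctl(), whose surrounding code is not reproduced here.

```c
#include <errno.h>

/* Hypothetical reply from a userspace filesystem: a result code plus
 * a payload that is only meaningful when result is not an error. */
struct ioctl_reply {
    int result;         /* 0 or a negative errno from the filesystem */
    unsigned int flags; /* payload */
};

/* Propagate both the transport error and the filesystem's own result.
 * The payload is copied out only when both report success. */
static int get_flags(int transport_err, const struct ioctl_reply *reply,
                     unsigned int *flags)
{
    if (transport_err)
        return transport_err;
    if (reply->result < 0)
        return reply->result; /* the check that was missing */
    *flags = reply->flags;
    return 0;
}
```

If the `reply->result` check is dropped, a reply such as `{ -EINVAL, <garbage> }` is reported as success with bogus flag values, which matches the lsattr symptom described in the commit message.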
Greg Kroah-Hartman	26481b5161	Merge 5.15.26 into android13-5.15 Changes in 5.15.26 mm/filemap: Fix handling of THPs in generic_file_buffered_read() cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug cgroup-v1: Correct privileges check in release_agent writes x86/ptrace: Fix xfpregs_set()'s incorrect xmm clearing btrfs: tree-checker: check item_size for inode_item btrfs: tree-checker: check item_size for dev_item clk: jz4725b: fix mmc0 clock gating io_uring: don't convert to jiffies for waiting on timeouts io_uring: disallow modification of rsrc_data during quiesce selinux: fix misuse of mutex_is_locked() vhost/vsock: don't check owner in vhost_vsock_stop() while releasing parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel parisc/unaligned: Fix ldw() and stw() unalignment handlers KVM: x86/mmu: make apf token non-zero to fix bug drm/amd/display: Protect update_bw_bounding_box FPU code. drm/amd/pm: fix some OEM SKU specific stability issues drm/amd: Check if ASPM is enabled from PCIe subsystem drm/amdgpu: disable MMHUB PG for Picasso drm/amdgpu: do not enable asic reset for raven2 drm/i915: Widen the QGV point mask drm/i915: Correctly populate use_sagv_wm for all pipes drm/i915: Fix bw atomic check when switching between SAGV vs. 
no SAGV sr9700: sanity check for packet length USB: zaurus: support another broken Zaurus CDC-NCM: avoid overflow in sanity checking netfilter: xt_socket: fix a typo in socket_mt_destroy() netfilter: xt_socket: missing ifdef CONFIG_IP6_NF_IPTABLES dependency netfilter: nf_tables_offload: incorrect flow offload action array size tee: export teedev_open() and teedev_close_context() optee: use driver internal tee_context for some rpc ping: remove pr_err from ping_lookup Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC" gpu: host1x: Always return syncpoint value when waiting perf evlist: Fix failed to use cpu list for uncore events perf data: Fix double free in perf_session__delete() mptcp: fix race in incoming ADD_ADDR option processing mptcp: add mibs counter for ignored incoming options selftests: mptcp: fix diag instability selftests: mptcp: be more conservative with cookie MPJ limits bnx2x: fix driver load from initrd bnxt_en: Fix active FEC reporting to ethtool bnxt_en: Fix offline ethtool selftest with RDMA enabled bnxt_en: Fix incorrect multicast rx mask setting when not requested hwmon: Handle failure to register sensor with thermal zone correctly net/mlx5: Fix tc max supported prio for nic mode ice: check the return of ice_ptp_gettimex64 ice: initialize local variable 'tlv' net/mlx5: Update the list of the PCI supported devices bpf: Fix crash due to incorrect copy_map_value bpf: Do not try bpf_msg_push_data with len 0 selftests: bpf: Check bpf_msg_push_data return value bpf: Fix a bpf_timer initialization issue bpf: Add schedule points in batch ops io_uring: add a schedule point in io_add_buffers() net: __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor friends nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info tipc: Fix end of loop tests for list_for_each_entry() gso: do not skip outer ip header in case of ipip and net_failover net: mv643xx_eth: process retval from of_get_mac_address openvswitch: Fix setting ipv6 fields 
causing hw csum failure drm/edid: Always set RGB444 net/mlx5e: Fix wrong return value on ioctl EEPROM query failure drm/vc4: crtc: Fix runtime_pm reference counting drm/i915/dg2: Print PHY name properly on calibration error net/sched: act_ct: Fix flow table lookup after ct clear or switching zones net: ll_temac: check the return value of devm_kmalloc() net: Force inlining of checksum functions in net/checksum.h netfilter: nf_tables: unregister flowtable hooks on netns exit nfp: flower: Fix a potential leak in nfp_tunnel_add_shared_mac() net: mdio-ipq4019: add delay after clock enable netfilter: nf_tables: fix memory leak during stateful obj update net/smc: Use a mutex for locking "struct smc_pnettable" surface: surface3_power: Fix battery readings on batteries without a serial number udp_tunnel: Fix end of loop test in udp_tunnel_nic_unregister() net/mlx5: DR, Cache STE shadow memory ibmvnic: schedule failover only if vioctl fails net/mlx5: DR, Don't allow match on IP w/o matching on full ethertype/ip_version net/mlx5: Fix possible deadlock on rule deletion net/mlx5: Fix wrong limitation of metadata match on ecpf net/mlx5: DR, Fix the threshold that defines when pool sync is initiated net/mlx5e: MPLSoUDP decap, fix check for unsupported matches net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets net/mlx5: Update log_max_qp value to be 17 at most spi: spi-zynq-qspi: Fix a NULL pointer dereference in zynq_qspi_exec_mem_op() gpio: rockchip: Reset int_bothedge when changing trigger regmap-irq: Update interrupt clear register for proper reset net-timestamp: convert sk->sk_tskey to atomic_t RDMA/rtrs-clt: Fix possible double free in error case RDMA/rtrs-clt: Move free_permit from free_clt to rtrs_clt_close bnxt_en: Increase firmware message response DMA wait time configfs: fix a race in configfs_{,un}register_subsystem() RDMA/ib_srp: Fix a deadlock tracing: Dump stacktrace trigger to the corresponding instance tracing: Have traceon and traceoff trigger 
honor the instance iio:imu:adis16480: fix buffering for devices with no burst mode iio: adc: men_z188_adc: Fix a resource leak in an error handling path iio: adc: tsc2046: fix memory corruption by preventing array overflow iio: adc: ad7124: fix mask used for setting AIN_BUFP & AIN_BUFM bits iio: accel: fxls8962af: add padding to regmap for SPI iio: imu: st_lsm6dsx: wait for settling time in st_lsm6dsx_read_oneshot iio: Fix error handling for PM sc16is7xx: Fix for incorrect data being transmitted ata: pata_hpt37x: disable primary channel on HPT371 Revert "USB: serial: ch341: add new Product ID for CH341A" usb: gadget: rndis: add spinlock for rndis response list USB: gadget: validate endpoint index for xilinx udc tracefs: Set the group ownership in apply_options() not parse_options() USB: serial: option: add support for DW5829e USB: serial: option: add Telit LE910R1 compositions usb: dwc2: drd: fix soft connect when gadget is unconfigured usb: dwc3: pci: Add "snps,dis_u2_susphy_quirk" for Intel Bay Trail usb: dwc3: pci: Fix Bay Trail phy GPIO mappings usb: dwc3: gadget: Let the interrupt handler disable bottom halves. xhci: re-initialize the HC during resume if HCE was set xhci: Prevent futile URB re-submissions due to incorrect return value. 
nvmem: core: Fix a conflict between MTD and NVMEM on wp-gpios property mtd: core: Fix a conflict between MTD and NVMEM on wp-gpios property driver core: Free DMA range map when device is released btrfs: prevent copying too big compressed lzo segment RDMA/cma: Do not change route.addr.src_addr outside state checks thermal: int340x: fix memory leak in int3400_notify() staging: fbtft: fb_st7789v: reset display before initialization tps6598x: clear int mask on probe failure IB/qib: Fix duplicate sysfs directory name riscv: fix nommu_k210_sdcard_defconfig riscv: fix oops caused by irqsoff latency tracer tty: n_gsm: fix encoding of control signal octet bit DV tty: n_gsm: fix proper link termination after failed open tty: n_gsm: fix NULL pointer access due to DLCI release tty: n_gsm: fix wrong tty control line for flow control tty: n_gsm: fix wrong modem processing in convergence layer type 2 tty: n_gsm: fix deadlock in gsmtty_open() pinctrl: fix loop in k210_pinconf_get_drive() pinctrl: k210: Fix bias-pull-up gpio: tegra186: Fix chip_data type confusion memblock: use kfree() to release kmalloced memblock regions ice: Fix race conditions between virtchnl handling and VF ndo ops ice: fix concurrent reset and removal of VFs Linux 5.15.26 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ied0cc9bd48b7af71a064107676f37b0dd39ce3cf	2022-03-16 12:53:52 +01:00
Jaegeuk Kim	66947ef483	f2fs: use spin_lock to avoid hang [14696.634553] task:cat state:D stack: 0 pid:1613738 ppid:1613735 flags:0x00000004 [14696.638285] Call Trace: [14696.639038] <TASK> [14696.640032] __schedule+0x302/0x930 [14696.640969] schedule+0x58/0xd0 [14696.641799] schedule_preempt_disabled+0x18/0x30 [14696.642890] __mutex_lock.constprop.0+0x2fb/0x4f0 [14696.644035] ? mod_objcg_state+0x10c/0x310 [14696.645040] ? obj_cgroup_charge+0xe1/0x170 [14696.646067] __mutex_lock_slowpath+0x13/0x20 [14696.647126] mutex_lock+0x34/0x40 [14696.648070] stat_show+0x25/0x17c0 [f2fs] [14696.649218] seq_read_iter+0x120/0x4b0 [14696.650289] ? aa_file_perm+0x12a/0x500 [14696.651357] ? lru_cache_add+0x1c/0x20 [14696.652470] seq_read+0xfd/0x140 [14696.653445] full_proxy_read+0x5c/0x80 [14696.654535] vfs_read+0xa0/0x1a0 [14696.655497] ksys_read+0x67/0xe0 [14696.656502] __x64_sys_read+0x1a/0x20 [14696.657580] do_syscall_64+0x3b/0xc0 [14696.658671] entry_SYSCALL_64_after_hwframe+0x44/0xae [14696.660068] RIP: 0033:0x7efe39df1cb2 [14696.661133] RSP: 002b:00007ffc8badd948 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [14696.662958] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007efe39df1cb2 [14696.664757] RDX: 0000000000020000 RSI: 00007efe399df000 RDI: 0000000000000003 [14696.666542] RBP: 00007efe399df000 R08: 00007efe399de010 R09: 00007efe399de010 [14696.668363] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000000 [14696.670155] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 [14696.671965] </TASK> [14696.672826] task:umount state:D stack: 0 pid:1614985 ppid:1614984 flags:0x00004000 [14696.674930] Call Trace: [14696.675903] <TASK> [14696.676780] __schedule+0x302/0x930 [14696.677927] schedule+0x58/0xd0 [14696.679019] schedule_preempt_disabled+0x18/0x30 [14696.680412] __mutex_lock.constprop.0+0x2fb/0x4f0 [14696.681783] ? 
destroy_inode+0x65/0x80 [14696.683006] __mutex_lock_slowpath+0x13/0x20 [14696.684305] mutex_lock+0x34/0x40 [14696.685442] f2fs_destroy_stats+0x1e/0x60 [f2fs] [14696.686803] f2fs_put_super+0x158/0x390 [f2fs] [14696.688238] generic_shutdown_super+0x7a/0x120 [14696.689621] kill_block_super+0x27/0x50 [14696.690894] kill_f2fs_super+0x7f/0x100 [f2fs] [14696.692311] deactivate_locked_super+0x35/0xa0 [14696.693698] deactivate_super+0x40/0x50 [14696.694985] cleanup_mnt+0x139/0x190 [14696.696209] __cleanup_mnt+0x12/0x20 [14696.697390] task_work_run+0x64/0xa0 [14696.698587] exit_to_user_mode_prepare+0x1b7/0x1c0 [14696.700053] syscall_exit_to_user_mode+0x27/0x50 [14696.701418] do_syscall_64+0x48/0xc0 [14696.702630] entry_SYSCALL_64_after_hwframe+0x44/0xae Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-03-09 19:03:32 -08:00
Jaegeuk Kim	211cd7f9cd	f2fs: don't get FREEZE lock in f2fs_evict_inode in frozen fs Let's purge the inode cache in order to avoid the deadlock below. [freeze test] shrinker freeze_super - percpu_down_write(SB_FREEZE_FS) - super_cache_scan - down_read(&sb->s_umount) - prune_icache_sb - dispose_list - evict - f2fs_evict_inode thaw_super - down_write(&sb->s_umount); - __percpu_down_read(SB_FREEZE_FS) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-03-09 19:03:32 -08:00
Jia Yang	0c2f7a24da	f2fs: remove unnecessary read for F2FS_FITS_IN_INODE F2FS_FITS_IN_INODE only cares about the type of the f2fs inode, so there is no need to read the node page of the f2fs inode. Signed-off-by: Jia Yang <jiayang5@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-03-09 19:03:32 -08:00
Tadeusz Struk	097c689d48	Revert "ANDROID: incremental-fs: fix mount_fs issue" This reverts three increment-fs commits: `d5faa13b59` `10412e10c6` `7ad88c9349` This is to fix the incrementalinstall test. Can now install the same apk twice, and repeated installs are stable. Bug: 217661925 Bug: 219731048 Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Change-Id: Ia8488d728218881ed17e4d68cab21b0b152e3ca4	2022-03-08 17:30:03 -08:00
Yun Zhou	416e3a0e42	proc: fix documentation and description of pagemap commit dd21bfa425c098b95ca86845f8e7d1ec1ddf6e4a upstream. Since bit 57 was exported for uffd-wp write-protected (commit fb8e37f35a2f: "mm/pagemap: export uffd-wp protection information"), fixing it can reduce some unnecessary confusion. Link: https://lkml.kernel.org/r/20220301044538.3042713-1-yun.zhou@windriver.com Fixes: `fb8e37f35a` ("mm/pagemap: export uffd-wp protection information") Signed-off-by: Yun Zhou <yun.zhou@windriver.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Tiberiu A Georgescu <tiberiu.georgescu@nutanix.com> Cc: Florian Schmidt <florian.schmidt@nutanix.com> Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Cc: SeongJae Park <sj@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Colin Cross <ccross@google.com> Cc: Alistair Popple <apopple@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-08 19:12:54 +01:00
Josef Bacik	6599d5e8bd	btrfs: do not start relocation until in progress drops are done commit b4be6aefa73c9a6899ef3ba9c5faaa8a66e333ef upstream. We hit a bug with a recovering relocation on mount for one of our file systems in production. I reproduced this locally by injecting errors into snapshot delete with balance running at the same time. This presented as an error while looking up an extent item WARNING: CPU: 5 PID: 1501 at fs/btrfs/extent-tree.c:866 lookup_inline_extent_backref+0x647/0x680 CPU: 5 PID: 1501 Comm: btrfs-balance Not tainted 5.16.0-rc8+ #8 RIP: 0010:lookup_inline_extent_backref+0x647/0x680 RSP: 0018:ffffae0a023ab960 EFLAGS: 00010202 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000 RBP: ffff943fd2a39b60 R08: 0000000000000000 R09: 0000000000000001 R10: 0001434088152de0 R11: 0000000000000000 R12: 0000000001d05000 R13: ffff943fd2a39b60 R14: ffff943fdb96f2a0 R15: ffff9442fc923000 FS: 0000000000000000(0000) GS:ffff944e9eb40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1157b1fca8 CR3: 000000010f092000 CR4: 0000000000350ee0 Call Trace: <TASK> insert_inline_extent_backref+0x46/0xd0 __btrfs_inc_extent_ref.isra.0+0x5f/0x200 ? btrfs_merge_delayed_refs+0x164/0x190 __btrfs_run_delayed_refs+0x561/0xfa0 ? btrfs_search_slot+0x7b4/0xb30 ? btrfs_update_root+0x1a9/0x2c0 btrfs_run_delayed_refs+0x73/0x1f0 ? btrfs_update_root+0x1a9/0x2c0 btrfs_commit_transaction+0x50/0xa50 ? btrfs_update_reloc_root+0x122/0x220 prepare_to_merge+0x29f/0x320 relocate_block_group+0x2b8/0x550 btrfs_relocate_block_group+0x1a6/0x350 btrfs_relocate_chunk+0x27/0xe0 btrfs_balance+0x777/0xe60 balance_kthread+0x35/0x50 ? btrfs_balance+0xe60/0xe60 kthread+0x16b/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 </TASK> Normally snapshot deletion and relocation are excluded from running at the same time by the fs_info->cleaner_mutex. 
However if we had a pending balance waiting to get the ->cleaner_mutex, and a snapshot deletion was running, and then the box crashed, we would come up in a state where we have a half deleted snapshot. Again, in the normal case the snapshot deletion needs to complete before relocation can start, but in this case relocation could very well start before the snapshot deletion completes, as we simply add the root to the dead roots list and wait for the next time the cleaner runs to clean up the snapshot. Fix this by setting a bit on the fs_info if we have any DEAD_ROOT's that had a pending drop_progress key. If they do then we know we were in the middle of the drop operation and set a flag on the fs_info. Then balance can wait until this flag is cleared to start up again. If there are DEAD_ROOT's that don't have a drop_progress set then we're safe to start balance right away as we'll be properly protected by the cleaner_mutex. CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-08 19:12:54 +01:00
Filipe Manana	4aef4c9005	btrfs: add missing run of delayed items after unlink during log replay commit 4751dc99627e4d1465c5bfa8cb7ab31ed418eff5 upstream. During log replay, whenever we need to check if a name (dentry) exists in a directory we do searches on the subvolume tree for inode references or directory entries (BTRFS_DIR_INDEX_KEY keys, and BTRFS_DIR_ITEM_KEY keys as well, before kernel 5.17). However, when we unlink a name during log replay, through btrfs_unlink_inode(), we may not delete inode references and dir index keys from a subvolume tree and instead just add the deletions to the delayed inode's delayed items, which will only be run when we commit the transaction used for log replay. This means that after an unlink operation during log replay, if we attempt to search for the same name during log replay, we will not see that the name was already deleted, since the deletion is recorded only on the delayed items. We run delayed items after every unlink operation during log replay, except at unlink_old_inode_refs() and at add_inode_ref(). This was due to an oversight, as delayed items should be run after every unlink, for the reasons stated above. So fix those two cases. Fixes: `0d836392ca` ("Btrfs: fix mount failure after fsync due to hard link recreation") Fixes: `1f250e929a` ("Btrfs: fix log replay failure after unlink and link combination") CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-08 19:12:54 +01:00
Sidong Yang	34146bbadc	btrfs: qgroup: fix deadlock between rescan worker and remove qgroup commit d4aef1e122d8bbdc15ce3bd0bc813d6b44a7d63a upstream. The commit e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker") by Kawasaki resolves deadlock between quota disable and qgroup rescan worker. But also there is a deadlock case like it. It's about enabling or disabling quota and creating or removing qgroup. It can be reproduced in simple script below. for i in {1..100} do btrfs quota enable /mnt & btrfs qgroup create 1/0 /mnt & btrfs qgroup destroy 1/0 /mnt & btrfs quota disable /mnt & done Here's why the deadlock happens: 1) The quota rescan task is running. 2) Task A calls btrfs_quota_disable(), locks the qgroup_ioctl_lock mutex, and then calls btrfs_qgroup_wait_for_completion(), to wait for the quota rescan task to complete. 3) Task B calls btrfs_remove_qgroup() and it blocks when trying to lock the qgroup_ioctl_lock mutex, because it's being held by task A. At that point task B is holding a transaction handle for the current transaction. 4) The quota rescan task calls btrfs_commit_transaction(). This results in it waiting for all other tasks to release their handles on the transaction, but task B is blocked on the qgroup_ioctl_lock mutex while holding a handle on the transaction, and that mutex is being held by task A, which is waiting for the quota rescan task to complete, resulting in a deadlock between these 3 tasks. To resolve this issue, the thread disabling quota should unlock qgroup_ioctl_lock before waiting rescan completion. Move btrfs_qgroup_wait_for_completion() after unlock of qgroup_ioctl_lock. 
Fixes: e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Sidong Yang <realwakka@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-08 19:12:54 +01:00
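The lock-ordering change described in the qgroup commit above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the names `quota_disable_old`, `quota_disable_new`, `struct trace` and `waits_while_locked` are hypothetical and do not exist in the kernel, and the real `btrfs_quota_disable()` does considerably more work. The model only records the order in which the `qgroup_ioctl_lock` mutex is taken and released relative to the wait for the rescan worker.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event names; they only model the ordering of three steps. */
enum step { LOCK_IOCTL, UNLOCK_IOCTL, WAIT_RESCAN };

struct trace {
    enum step steps[4];
    int n;
};

static void record(struct trace *t, enum step s)
{
    assert(t->n < 4);
    t->steps[t->n++] = s;
}

/* Ordering described as problematic: wait for the rescan worker while
 * still holding the ioctl mutex. */
static void quota_disable_old(struct trace *t)
{
    record(t, LOCK_IOCTL);
    record(t, WAIT_RESCAN);
    record(t, UNLOCK_IOCTL);
}

/* Ordering described by the fix: drop the mutex first, then wait. */
static void quota_disable_new(struct trace *t)
{
    record(t, LOCK_IOCTL);
    record(t, UNLOCK_IOCTL);
    record(t, WAIT_RESCAN);
}

/* Returns true if the trace ever waits for rescan with the mutex held,
 * which is the precondition for the three-task deadlock. */
static bool waits_while_locked(const struct trace *t)
{
    bool locked = false;

    for (int i = 0; i < t->n; i++) {
        if (t->steps[i] == LOCK_IOCTL)
            locked = true;
        else if (t->steps[i] == UNLOCK_IOCTL)
            locked = false;
        else if (t->steps[i] == WAIT_RESCAN && locked)
            return true;
    }
    return false;
}
```

With the new ordering, a task blocked on the mutex while holding a transaction handle can make progress, so the rescan worker's transaction commit is no longer stuck behind it.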
Josef Bacik	e00077aa43	btrfs: do not WARN_ON() if we have PageError set commit a50e1fcbc9b85fd4e95b89a75c0884cb032a3e06 upstream. Whenever we do any extent buffer operations we call assert_eb_page_uptodate() to complain loudly if we're operating on an non-uptodate page. Our overnight tests caught this warning earlier this week WARNING: CPU: 1 PID: 553508 at fs/btrfs/extent_io.c:6849 assert_eb_page_uptodate+0x3f/0x50 CPU: 1 PID: 553508 Comm: kworker/u4:13 Tainted: G W 5.17.0-rc3+ #564 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Workqueue: btrfs-cache btrfs_work_helper RIP: 0010:assert_eb_page_uptodate+0x3f/0x50 RSP: 0018:ffffa961440a7c68 EFLAGS: 00010246 RAX: 0017ffffc0002112 RBX: ffffe6e74453f9c0 RCX: 0000000000001000 RDX: ffffe6e74467c887 RSI: ffffe6e74453f9c0 RDI: ffff8d4c5efc2fc0 RBP: 0000000000000d56 R08: ffff8d4d4a224000 R09: 0000000000000000 R10: 00015817fa9d1ef0 R11: 000000000000000c R12: 00000000000007b1 R13: ffff8d4c5efc2fc0 R14: 0000000001500000 R15: 0000000001cb1000 FS: 0000000000000000(0000) GS:ffff8d4dbbd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff31d3448d8 CR3: 0000000118be8004 CR4: 0000000000370ee0 Call Trace: extent_buffer_test_bit+0x3f/0x70 free_space_test_bit+0xa6/0xc0 load_free_space_tree+0x1f6/0x470 caching_thread+0x454/0x630 ? rcu_read_lock_sched_held+0x12/0x60 ? rcu_read_lock_sched_held+0x12/0x60 ? rcu_read_lock_sched_held+0x12/0x60 ? lock_release+0x1f0/0x2d0 btrfs_work_helper+0xf2/0x3e0 ? lock_release+0x1f0/0x2d0 ? finish_task_switch.isra.0+0xf9/0x3a0 process_one_work+0x26d/0x580 ? process_one_work+0x580/0x580 worker_thread+0x55/0x3b0 ? process_one_work+0x580/0x580 kthread+0xf0/0x120 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 This was partially fixed by c2e39305299f01 ("btrfs: clear extent buffer uptodate when we fail to write it"), however all that fix did was keep us from finding extent buffers after a failed writeout. 
It didn't keep us from continuing to use a buffer that we already had found. In this case we're searching the commit root to cache the block group, so we can start committing the transaction and switch the commit root and then start writing. After the switch we can look up an extent buffer that hasn't been written yet and start processing that block group. Then we fail to write that block out and clear Uptodate on the page, and then we start spewing these errors. Normally we're protected by the tree lock to a certain degree here. If we read a block we have that block read locked, and we block the writer from locking the block before we submit it for the write. However this isn't necessarily fool proof because the read could happen before we do the submit_bio and after we locked and unlocked the extent buffer. Also in this particular case we have path->skip_locking set, so that won't save us here. We'll simply get a block that was valid when we read it, but became invalid while we were using it. What we really want is to catch the case where we've "read" a block but it's not marked Uptodate. On read we ClearPageError(), so if we're !Uptodate and !Error we know we didn't do the right thing for reading the page. Fix this by checking !Uptodate && !Error, this way we will not complain if our buffer gets invalidated while we're using it, and we'll maintain the spirit of the check which is to make sure we have a fully in-cache block while we're messing with it. CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-08 19:12:54 +01:00
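The condition change described in the PageError commit above can be sketched as a small userspace model. It assumes a hypothetical `struct page_state` with two booleans standing in for the kernel's `PageUptodate()` and `PageError()` tests; the function names `should_warn_old` and `should_warn_new` are illustrative and are not the kernel's. The old predicate reflects the commit message's description of complaining about any non-uptodate page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two page flags the commit message discusses. */
struct page_state {
    bool uptodate; /* models PageUptodate() */
    bool error;    /* models PageError() */
};

/* Behaviour as described before the fix: complain about any page
 * that is not uptodate. */
static bool should_warn_old(struct page_state p)
{
    return !p.uptodate;
}

/* Behaviour as described after the fix: complain only when the page is
 * neither uptodate nor marked with an error, i.e. it was never read
 * properly, rather than invalidated by a failed write. */
static bool should_warn_new(struct page_state p)
{
    return !p.uptodate && !p.error;
}
```

A page whose writeout failed (not uptodate, error set) triggers the old check but not the new one, while a page that was never read correctly (not uptodate, no error) still triggers both.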
Omar Sandoval	725a6ac389	btrfs: fix relocation crash due to premature return from btrfs_commit_transaction() commit 5fd76bf31ccfecc06e2e6b29f8c809e934085b99 upstream. We are seeing crashes similar to the following trace: [38.969182] WARNING: CPU: 20 PID: 2105 at fs/btrfs/relocation.c:4070 btrfs_relocate_block_group+0x2dc/0x340 [btrfs] [38.973556] CPU: 20 PID: 2105 Comm: btrfs Not tainted 5.17.0-rc4 #54 [38.974580] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [38.976539] RIP: 0010:btrfs_relocate_block_group+0x2dc/0x340 [btrfs] [38.980336] RSP: 0000:ffffb0dd42e03c20 EFLAGS: 00010206 [38.981218] RAX: ffff96cfc4ede800 RBX: ffff96cfc3ce0000 RCX: 000000000002ca14 [38.982560] RDX: 0000000000000000 RSI: 4cfd109a0bcb5d7f RDI: ffff96cfc3ce0360 [38.983619] RBP: ffff96cfc309c000 R08: 0000000000000000 R09: 0000000000000000 [38.984678] R10: ffff96cec0000001 R11: ffffe84c80000000 R12: ffff96cfc4ede800 [38.985735] R13: 0000000000000000 R14: 0000000000000000 R15: ffff96cfc3ce0360 [38.987146] FS: 00007f11c15218c0(0000) GS:ffff96d6dfb00000(0000) knlGS:0000000000000000 [38.988662] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [38.989398] CR2: 00007ffc922c8e60 CR3: 00000001147a6001 CR4: 0000000000370ee0 [38.990279] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [38.991219] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [38.992528] Call Trace: [38.992854] <TASK> [38.993148] btrfs_relocate_chunk+0x27/0xe0 [btrfs] [38.993941] btrfs_balance+0x78e/0xea0 [btrfs] [38.994801] ? vsnprintf+0x33c/0x520 [38.995368] ? __kmalloc_track_caller+0x351/0x440 [38.996198] btrfs_ioctl_balance+0x2b9/0x3a0 [btrfs] [38.997084] btrfs_ioctl+0x11b0/0x2da0 [btrfs] [38.997867] ? mod_objcg_state+0xee/0x340 [38.998552] ? seq_release+0x24/0x30 [38.999184] ? proc_nr_files+0x30/0x30 [38.999654] ? call_rcu+0xc8/0x2f0 [39.000228] ? __x64_sys_ioctl+0x84/0xc0 [39.000872] ? 
btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs] [39.001973] __x64_sys_ioctl+0x84/0xc0 [39.002566] do_syscall_64+0x3a/0x80 [39.003011] entry_SYSCALL_64_after_hwframe+0x44/0xae [39.003735] RIP: 0033:0x7f11c166959b [39.007324] RSP: 002b:00007fff2543e998 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [39.008521] RAX: ffffffffffffffda RBX: 00007f11c1521698 RCX: 00007f11c166959b [39.009833] RDX: 00007fff2543ea40 RSI: 00000000c4009420 RDI: 0000000000000003 [39.011270] RBP: 0000000000000003 R08: 0000000000000013 R09: 00007f11c16f94e0 [39.012581] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff25440df3 [39.014046] R13: 0000000000000000 R14: 00007fff2543ea40 R15: 0000000000000001 [39.015040] </TASK> [39.015418] ---[ end trace 0000000000000000 ]--- [43.131559] ------------[ cut here ]------------ [43.132234] kernel BUG at fs/btrfs/extent-tree.c:2717! [43.133031] invalid opcode: 0000 [#1] PREEMPT SMP PTI [43.133702] CPU: 1 PID: 1839 Comm: btrfs Tainted: G W 5.17.0-rc4 #54 [43.134863] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [43.136426] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs] [43.139913] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246 [43.140629] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001 [43.141604] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff [43.142645] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50 [43.143669] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000 [43.144657] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000 [43.145686] FS: 00007f7657dd68c0(0000) GS:ffff96d6df640000(0000) knlGS:0000000000000000 [43.146808] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [43.147584] CR2: 00007f7fe81bf5b0 CR3: 00000001093ee004 CR4: 0000000000370ee0 [43.148589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [43.149581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400 [43.150559] Call Trace: [43.150904] <TASK> [43.151253] btrfs_finish_extent_commit+0x88/0x290 [btrfs] [43.152127] btrfs_commit_transaction+0x74f/0xaa0 [btrfs] [43.152932] ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs] [43.153786] btrfs_ioctl+0x1edc/0x2da0 [btrfs] [43.154475] ? __check_object_size+0x150/0x170 [43.155170] ? preempt_count_add+0x49/0xa0 [43.155753] ? __x64_sys_ioctl+0x84/0xc0 [43.156437] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs] [43.157456] __x64_sys_ioctl+0x84/0xc0 [43.157980] do_syscall_64+0x3a/0x80 [43.158543] entry_SYSCALL_64_after_hwframe+0x44/0xae [43.159231] RIP: 0033:0x7f7657f1e59b [43.161819] RSP: 002b:00007ffda5cd1658 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [43.162702] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7657f1e59b [43.163526] RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000003 [43.164358] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 [43.165208] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [43.166029] R13: 00005621b91c3232 R14: 00005621b91ba580 R15: 00007ffda5cd1800 [43.166863] </TASK> [43.167125] Modules linked in: btrfs blake2b_generic xor pata_acpi ata_piix libata raid6_pq scsi_mod libcrc32c virtio_net virtio_rng net_failover rng_core failover scsi_common [43.169552] ---[ end trace 0000000000000000 ]--- [43.171226] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs] [43.174767] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246 [43.175600] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001 [43.176468] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff [43.177357] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50 [43.178271] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000 [43.179178] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000 [43.180071] FS: 00007f7657dd68c0(0000) GS:ffff96d6df800000(0000) knlGS:0000000000000000 [43.181073] CS: 0010 DS: 0000 ES: 0000 
CR0: 0000000080050033 [43.181808] CR2: 00007fe09905f010 CR3: 00000001093ee004 CR4: 0000000000370ee0 [43.182706] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [43.183591] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 We first hit the WARN_ON(rc->block_group->pinned > 0) in btrfs_relocate_block_group() and then the BUG_ON(!cache) in unpin_extent_range(). This tells us that we are exiting relocation and removing the block group with bytes still pinned for that block group. This is supposed to be impossible: the last thing relocate_block_group() does is commit the transaction to get rid of pinned extents. Commit `d0c2f4fa55` ("btrfs: make concurrent fsyncs wait less when waiting for a transaction commit") introduced an optimization so that commits from fsync don't have to wait for the previous commit to unpin extents. This was only intended to affect fsync, but it inadvertently made it possible for any commit to skip waiting for the previous commit to unpin. This is because if a call to btrfs_commit_transaction() finds that another thread is already committing the transaction, it waits for the other thread to complete the commit and then returns. If that other thread was in fsync, then it completes the commit without completing the previous commit. This makes the following sequence of events possible: Thread 1____________________\|Thread 2 (fsync)_____________________\|Thread 3 (balance)___________________ btrfs_commit_transaction(N) \| \| btrfs_run_delayed_refs \| \| pin extents \| \| ... \| \| state = UNBLOCKED \|btrfs_sync_file \| \| btrfs_start_transaction(N + 1) \|relocate_block_group \| \| btrfs_join_transaction(N + 1) \| btrfs_commit_transaction(N + 1) \| ... \| trans->state = COMMIT_START \| \| \| btrfs_commit_transaction(N + 1) \| \| wait_for_commit(N + 1, COMPLETED) \| wait_for_commit(N, SUPER_COMMITTED)\| state = SUPER_COMMITTED \| ... 
\| btrfs_finish_extent_commit\| \| unpin_extent_range() \| trans->state = COMPLETED \| \| \| return \| \| ... \| \|Thread 1 isn't done, so pinned > 0 \| \|and we WARN \| \| \| \|btrfs_remove_block_group unpin_extent_range() \| \| Thread 3 removed the \| \| block group, so we BUG\| \| There are other sequences involving SUPER_COMMITTED transactions that can cause a similar outcome. We could fix this by making relocation explicitly wait for unpinning, but there may be other cases that need it. Josef mentioned ENOSPC flushing and the free space cache inode as other potential victims. Rather than playing whack-a-mole, this fix is conservative and makes all commits not in fsync wait for all previous transactions, which is what the optimization intended. Fixes: `d0c2f4fa55` ("btrfs: make concurrent fsyncs wait less when waiting for a transaction commit") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-08 19:12:54 +01:00
Filipe Manana	5342e9f3da	btrfs: fix lost prealloc extents beyond eof after full fsync commit d99478874355d3a7b9d86dfb5d7590d5b1754b1f upstream. When doing a full fsync, if we have prealloc extents beyond (or at) eof, and the leaves that contain them were not modified in the current transaction, we end up not logging them. This results in losing those extents when we replay the log after a power failure, since the inode is truncated to the current value of the logged i_size. Just like for the fast fsync path, we need to always log all prealloc extents starting at or beyond i_size. The fast fsync case was fixed in commit `471d557afe` ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay") but it missed the full fsync path. The problem exists since the very early days, when the log tree was added by commit `e02119d5a7` ("Btrfs: Add a write ahead tree log to optimize synchronous operations"). Example reproducer: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt # Create our test file with many file extent items, so that they span # several leaves of metadata, even if the node/page size is 64K. Use # direct IO and not fsync/O_SYNC because it's both faster and it avoids # clearing the full sync flag from the inode - we want the fsync below # to trigger the slow full sync code path. $ xfs_io -f -d -c "pwrite -b 4K 0 16M" /mnt/foo # Now add two preallocated extents to our file without extending the # file's size. One right at i_size, and another further beyond, leaving # a gap between the two prealloc extents. $ xfs_io -c "falloc -k 16M 1M" /mnt/foo $ xfs_io -c "falloc -k 20M 1M" /mnt/foo # Make sure everything is durably persisted and the transaction is # committed. This makes all created extents to have a generation lower # than the generation of the transaction used by the next write and # fsync. sync # Now overwrite only the first extent, which will result in modifying # only the first leaf of metadata for our inode. Then fsync it. 
This # fsync will use the slow code path (inode full sync bit is set) because # it's the first fsync since the inode was created/loaded. $ xfs_io -c "pwrite 0 4K" -c "fsync" /mnt/foo # Extent list before power failure. $ xfs_io -c "fiemap -v" /mnt/foo /mnt/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..7]: 2178048..2178055 8 0x0 1: [8..16383]: 26632..43007 16376 0x0 2: [16384..32767]: 2156544..2172927 16384 0x0 3: [32768..34815]: 2172928..2174975 2048 0x800 4: [34816..40959]: hole 6144 5: [40960..43007]: 2174976..2177023 2048 0x801 <power fail> # Mount fs again, trigger log replay. $ mount /dev/sdc /mnt # Extent list after power failure and log replay. $ xfs_io -c "fiemap -v" /mnt/foo /mnt/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..7]: 2178048..2178055 8 0x0 1: [8..16383]: 26632..43007 16376 0x0 2: [16384..32767]: 2156544..2172927 16384 0x1 # The prealloc extents at file offsets 16M and 20M are missing. So fix this by calling btrfs_log_prealloc_extents() when we are doing a full fsync, so that we always log all prealloc extents beyond eof. A test case for fstests will follow soon. CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-08 19:12:54 +01:00
Filipe Manana	5afd80c393	btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range commit f0bfa76a11e93d0fe2c896fcb566568c5e8b5d3f upstream. When doing a direct IO write against a file range that either has preallocated extents in that range or has regular extents and the file has the NOCOW attribute set, the write fails with -ENOSPC when all of the following conditions are met: 1) There are no data block groups with enough free space matching the size of the write; 2) There's not enough unallocated space for allocating a new data block group; 3) The extents in the target file range are not shared, neither through snapshots nor through reflinks. This is wrong because a NOCOW write can be done in such a case, and in fact it's possible to do it using a buffered IO write, since when failing to allocate data space, the buffered IO path checks if a NOCOW write is possible. The failure in the direct IO write path comes from the fact that early on, at btrfs_dio_iomap_begin(), we try to allocate data space for the write and if that fails we return the error and stop - we never check if we can do NOCOW. But later, at btrfs_get_blocks_direct_write(), we check if we can do a NOCOW write into the range, or a subset of the range, and then release the previously reserved data space. Fix this by doing the data reservation only if needed, when we must COW, at btrfs_get_blocks_direct_write() instead of doing it at btrfs_dio_iomap_begin(). This also simplifies the logic a bit and removes the inefficiency of doing unnecessary data reservations. The following example test script reproduces the problem: $ cat dio-nocow-enospc.sh #!/bin/bash DEV=/dev/sdj MNT=/mnt/sdj # Use a small fixed size (1G) filesystem so that it's quick to fill # it up. # Make sure the mixed block groups feature is not enabled because we # later want to not have more space available for allocating data # extents but still have enough metadata space free for the file writes. 
mkfs.btrfs -f -b $((1024 * 1024 * 1024)) -O ^mixed-bg $DEV mount $DEV $MNT # Create our test file with the NOCOW attribute set. touch $MNT/foobar chattr +C $MNT/foobar # Now fill in all unallocated space with data for our test file. # This will allocate a data block group that will be full and leave # no (or a very small amount of) unallocated space in the device, so # that it will not be possible to allocate a new block group later. echo echo "Creating test file with initial data..." xfs_io -c "pwrite -S 0xab -b 1M 0 900M" $MNT/foobar # Now try a direct IO write against file range [0, 10M[. # This should succeed since this is a NOCOW file and an extent for the # range was previously allocated. echo echo "Trying direct IO write over allocated space..." xfs_io -d -c "pwrite -S 0xcd -b 10M 0 10M" $MNT/foobar umount $MNT When running the test: $ ./dio-nocow-enospc.sh (...) Creating test file with initial data... wrote 943718400/943718400 bytes at offset 0 900 MiB, 900 ops; 0:00:01.43 (625.526 MiB/sec and 625.5265 ops/sec) Trying direct IO write over allocated space... pwrite: No space left on device A test case for fstests will follow, testing both this direct IO write scenario as well as the buffered IO write scenario to make it less likely to get future regressions on the buffered IO case. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-08 19:12:46 +01:00
Steve French	aa280c04da	cifs: fix confusing unneeded warning message on smb2.1 and earlier [ Upstream commit 53923e0fe2098f90f339510aeaa0e1413ae99a16 ] When mounting with SMB2.1 or earlier, even with nomultichannel, we log the confusing warning message: "CIFS: VFS: multichannel is not supported on this protocol version, use 3.0 or above" Fix this so that we don't log this unless they really are trying to mount with multichannel. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215608 Reported-by: Kim Scarborough <kim@scarborough.kim> Cc: stable@vger.kernel.org # 5.11+ Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-03-08 19:12:41 +01:00
Shyam Prasad N	3d74c2c917	cifs: protect session channel fields with chan_lock [ Upstream commit 724244cdb3828522109c88e56a0242537aefabe9 ] Introducing a new spin lock to protect all the channel related fields in a cifs_ses struct. This lock should be taken whenever dealing with the channel fields, and should be held only for very short intervals which will not sleep. Currently, all channel related fields in cifs_ses structure are protected by session_mutex. However, this mutex is held for long periods (sometimes while waiting for a reply from server). This makes the codepath quite tricky to change. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-03-08 19:12:41 +01:00
Sean Christopherson	3f20cf3cd4	hugetlbfs: fix off-by-one error in hugetlb_vmdelete_list() [ Upstream commit d6aba4c8e20d4d2bf65d589953f6d891c178f3a3 ] Pass "end - 1" instead of "end" when walking the interval tree in hugetlb_vmdelete_list() to fix an inclusive vs. exclusive bug. The two callers that pass a non-zero "end" treat it as exclusive, whereas the interval tree iterator expects an inclusive "last". E.g. punching a hole in a file that precisely matches the size of a single hugepage, with a vma starting right on the boundary, will result in unmap_hugepage_range() being called twice, with the second call having start==end. The off-by-one error doesn't cause functional problems as __unmap_hugepage_range() turns into a massive nop due to short-circuiting its for-loop on "address < end". But, the mmu_notifier invocations to invalid_range_{start,end}() are passed a bogus zero-sized range, which may be unexpected behavior for secondary MMUs. The bug was exposed by commit ed922739c919 ("KVM: Use interval tree to do fast hva lookup in memslots"), currently queued in the KVM tree for 5.17, which added a WARN to detect ranges with start==end. Link: https://lkml.kernel.org/r/20211228234257.1926057-1-seanjc@google.com Fixes: `1bfad99ab4` ("hugetlbfs: hugetlb_vmtruncate_list() needs to take a range to delete") Signed-off-by: Sean Christopherson <seanjc@google.com> Reported-by: syzbot+4e697fe80a31aa7efe21@syzkaller.appspotmail.com Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-03-08 19:12:38 +01:00
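The inclusive vs. exclusive mismatch described in the hugetlbfs commit above can be illustrated with a minimal sketch. This is not the kernel code: `overlaps_inclusive()` and `hole_touches_vma()` are hypothetical helpers, and treating `end == 0` as "to the end of the file" is an assumption based on the message's mention of callers that pass a non-zero "end".

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for an interval tree lookup: true if the inclusive
 * ranges [a_start, a_last] and [b_start, b_last] share at least one index. */
static bool overlaps_inclusive(unsigned long a_start, unsigned long a_last,
                               unsigned long b_start, unsigned long b_last)
{
    return a_start <= b_last && b_start <= a_last;
}

/* Hypothetical caller working with an exclusive range [start, end).
 * The exclusive "end" is converted to an inclusive "last" before the lookup.
 * Assumption: end == 0 means "no upper bound". */
static bool hole_touches_vma(unsigned long start, unsigned long end,
                             unsigned long vma_start, unsigned long vma_last)
{
    unsigned long last = end ? end - 1 : ULONG_MAX;

    return overlaps_inclusive(start, last, vma_start, vma_last);
}

/* Small self-check, in units of one hugepage index. */
static void self_check(void)
{
    /* Hole covers exactly page 0, i.e. [0, 1). A vma sits on page 1. */
    /* Passing the exclusive end unchanged wrongly matches the vma. */
    assert(overlaps_inclusive(0, 1, 1, 1));
    /* Converting to "end - 1" first does not match it. */
    assert(!hole_touches_vma(0, 1, 1, 1));
    /* A vma that really is inside the hole is still found. */
    assert(hole_touches_vma(0, 1, 0, 0));
}
```

Calling `self_check()` exercises the case from the commit message: a hole that matches a single hugepage exactly, with a vma starting right on the boundary, is only matched when the exclusive end is passed where an inclusive last is expected.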
J. Bruce Fields	4425ca3677	nfsd: fix crash on COPY_NOTIFY with special stateid [ Upstream commit 074b07d94e0bb6ddce5690a9b7e2373088e8b33a ] RTM says "If the special ONE stateid is passed to nfs4_preprocess_stateid_op(), it returns status=0 but does not set *cstid. nfsd4_copy_notify() depends on stid being set if status=0, and thus can crash if the client sends the right COPY_NOTIFY RPC." RFC 7862 says "The cna_src_stateid MUST refer to either open or locking states provided earlier by the server. If it is invalid, then the operation MUST fail." The RFC doesn't specify an error, and the choice doesn't matter much as this is clearly illegal client behavior, but bad_stateid seems reasonable. Simplest is just to guarantee that nfs4_preprocess_stateid_op, called with non-NULL cstid, errors out if it can't return a stateid. Reported-by: rtm@csail.mit.edu Fixes: `624322f1ad` ("NFSD add COPY_NOTIFY operation") Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Olga Kornievskaia <kolga@netapp.com> Tested-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-03-08 19:12:36 +01:00
Chuck Lever	0f84cfb465	Revert "nfsd: skip some unnecessary stats in the v4 case" [ Upstream commit 58f258f65267542959487dbe8b5641754411843d ] On the wire, I observed NFSv4 OPEN(CREATE) operations sometimes returning a reasonable-looking value in the cinfo.before field and zero in the cinfo.after field. RFC 8881 Section 10.8.1 says: > When a client is making changes to a given directory, it needs to > determine whether there have been changes made to the directory by > other clients. It does this by using the change attribute as > reported before and after the directory operation in the associated > change_info4 value returned for the operation. and > ... The post-operation change > value needs to be saved as the basis for future change_info4 > comparisons. A good quality client implementation therefore saves the zero cinfo.after value. During a subsequent OPEN operation, it will receive a different non-zero value in the cinfo.before field for that directory, and it will incorrectly believe the directory has changed, triggering an undesirable directory cache invalidation. There are filesystem types where fs_supports_change_attribute() returns false, tmpfs being one. On NFSv4 mounts, this means the fh_getattr() call site in fill_pre_wcc() and fill_post_wcc() is never invoked. Subsequently, nfsd4_change_attribute() is invoked with an uninitialized @stat argument. In fill_pre_wcc(), @stat contains stale stack garbage, which is then placed on the wire. In fill_post_wcc(), ->fh_post_wc is all zeroes, so zero is placed on the wire. Both of these values are meaningless. This fix can be applied immediately to stable kernels. Once there are more regression tests in this area, this optimization can be attempted again. Fixes: `428a23d2bf` ("nfsd: skip some unnecessary stats in the v4 case") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-03-08 19:12:36 +01:00
Chuck Lever	3abe2a70f5	NFSD: Fix verifier returned in stable WRITEs [ Upstream commit f11ad7aa653130b71e2e89bed207f387718216d5 ] RFC 8881 explains the purpose of the write verifier this way: > The final portion of the result is the field writeverf. This field > is the write verifier and is a cookie that the client can use to > determine whether a server has changed instance state (e.g., server > restart) between a call to WRITE and a subsequent call to either > WRITE or COMMIT. But then it says: > This cookie MUST be unchanged during a single instance of the > NFSv4.1 server and MUST be unique between instances of the NFSv4.1 > server. If the cookie changes, then the client MUST assume that > any data written with an UNSTABLE4 value for committed and an old > writeverf in the reply has been lost and will need to be > recovered. RFC 1813 has similar language for NFSv3. NFSv2 does not have a write verifier since it doesn't implement the COMMIT procedure. Since commit `19e0663ff9` ("nfsd: Ensure sampling of the write verifier is atomic with the write"), the Linux NFS server has returned a boot-time-based verifier for UNSTABLE WRITEs, but a zero verifier for FILE_SYNC and DATA_SYNC WRITEs. FILE_SYNC and DATA_SYNC WRITEs are not followed up with a COMMIT, so there's no need for clients to compare verifiers for stable writes. However, by returning a different verifier for stable and unstable writes, the above commit puts the Linux NFS server a step farther out of compliance with the first MUST above. At least one NFS client (FreeBSD) noticed the difference, making this a potential regression. 
Reported-by: Rick Macklem <rmacklem@uoguelph.ca> Link: https://lore.kernel.org/linux-nfs/YQXPR0101MB096857EEACF04A6DF1FC6D9BDD749@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM/T/ Fixes: `19e0663ff9` ("nfsd: Ensure sampling of the write verifier is atomic with the write") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-03-08 19:12:36 +01:00
Hao Xu	1bd12b7aae	io_uring: fix no lock protection for ctx->cq_extra [ Upstream commit e302f1046f4c209291b07ff7bc4d15ca26891f16 ] ctx->cq_extra should be protected by completion lock so that the req_need_defer() does the right check. Cc: stable@vger.kernel.org Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20211125092103.224502-2-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-03-08 19:12:33 +01:00
Chuck Lever	384d1b1138	NFSD: Fix zero-length NFSv3 WRITEs [ Upstream commit 6a2f774424bfdcc2df3e17de0cefe74a4269cad5 ] The Linux NFS server currently responds to a zero-length NFSv3 WRITE request with NFS3ERR_IO. It responds to a zero-length NFSv4 WRITE with NFS4_OK and count of zero. RFC 1813 says of the WRITE procedure's @count argument: count The number of bytes of data to be written. If count is 0, the WRITE will succeed and return a count of 0, barring errors due to permissions checking. RFC 8881 has similar language for NFSv4, though NFSv4 removed the explicit @count argument because that value is already contained in the opaque payload array. The synthetic client pynfs's WRT4 and WRT15 tests do emit zero-length WRITEs to exercise this spec requirement. Commit `fdec6114ee` ("nfsd4: zero-length WRITE should succeed") addressed the same problem there with the same fix. But interestingly the Linux NFS client does not appear to emit zero-length WRITEs, instead squelching them. I'm not aware of a test that can generate such WRITEs for NFSv3, so I wrote a naive C program to generate a zero-length WRITE and test this fix. Fixes: `8154ef2776` ("NFSD: Clean up legacy NFS WRITE argument XDR decoders") Reported-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-03-08 19:12:33 +01:00


74153 Commits