platform_kernel-5.15

Commit Graph

Author	SHA1	Message	Date
Andreas Gruenbacher	18e16d6365	UPSTREAM: iomap: Support partial direct I/O on user copy failures commit 97308f8b0d867e9ef59528cd97f0db55ffdf5651 upstream In iomap_dio_rw, when iomap_apply returns an -EFAULT error and the IOMAP_DIO_PARTIAL flag is set, complete the request synchronously and return a partial result. This allows the caller to deal with the page fault and retry the remainder of the request. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `ea7a578588`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I5de5ec3e55f18305f76e78ab4c72ce3ee1272522	2022-06-10 08:12:03 +02:00
Andreas Gruenbacher	89f91db7ae	UPSTREAM: iomap: Fix iomap_dio_rw return value for user copies commit 42c498c18a94eed79896c50871889af52fa0822e upstream When a user copy fails in one of the helpers of iomap_dio_rw, fail with -EFAULT instead of returning 0. This matches what iomap_dio_bio_actor returns when it gets an -EFAULT from bio_iov_iter_get_pages. With these changes, iomap_dio_actor now consistently fails with -EFAULT when a user page cannot be faulted in. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `a00cc46f97`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ib80793f127fe9e0a72195802fcb5a865834c05fc	2022-06-10 08:12:03 +02:00
Andreas Gruenbacher	9d16bdb659	UPSTREAM: gfs2: Fix mmap + page fault deadlocks for buffered I/O commit 00bfe02f479688a67a29019d1228f1470e26f014 upstream In the .read_iter and .write_iter file operations, we're accessing user-space memory while holding the inode glock. There is a possibility that the memory is mapped to the same file, in which case we'd recurse on the same glock. We could detect and work around this simple case of recursive locking, but more complex scenarios exist that involve multiple glocks, processes, and cluster nodes, and working around all of those cases isn't practical or even possible. Avoid these kinds of problems by disabling page faults while holding the inode glock. If a page fault would occur, we either end up with a partial read or write or with -EFAULT if nothing could be read or written. In either case, we know that we're not done with the operation, so we indicate that we're willing to give up the inode glock and then we fault in the missing pages. If that made us lose the inode glock, we return a partial read or write. Otherwise, we resume the operation. This locking problem was originally reported by Jan Kara. Linus came up with the idea of disabling page faults. Many thanks to Al Viro and Matthew Wilcox for their feedback. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `81a7fc397a`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ia00865e0cd79d269d8b6be204a529eb97d0a632e	2022-06-10 08:12:03 +02:00
Andreas Gruenbacher	64ef40ba87	UPSTREAM: gfs2: Eliminate ip->i_gh commit 1b223f7065bc7d89c4677c27381817cc95b117a8 upstream Now that gfs2_file_buffered_write is the only remaining user of ip->i_gh, we can move the glock holder to the stack (or rather, use the one we already have on the stack); there is no need for keeping the holder in the inode anymore. This is slightly complicated by the fact that we're using ip->i_gh for the statfs inode in gfs2_file_buffered_write as well. Writing to the statfs inode isn't very common, so allocate the statfs holder dynamically when needed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `38b5849881`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Icf91ad1960e8dec76e009059fd8ead472920971e	2022-06-10 08:12:03 +02:00
Andreas Gruenbacher	f0a127f84b	UPSTREAM: gfs2: Move the inode glock locking to gfs2_file_buffered_write commit b924bdab7445946e2ed364a0e6e249d36f1f1158 upstream So far, for buffered writes, we were taking the inode glock in gfs2_iomap_begin and dropping it in gfs2_iomap_end with the intention of not holding the inode glock while iomap_write_actor faults in user pages. It turns out that iomap_write_actor is called inside iomap_begin ... iomap_end, so the user pages were still faulted in while holding the inode glock and the locking code in iomap_begin / iomap_end was completely pointless. Move the locking into gfs2_file_buffered_write instead. We'll take care of the potential deadlocks due to faulting in user pages while holding a glock in a subsequent patch. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `8d363d8173`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I4ab7278f9255974aa5b4ae0d9dc3157909145fff	2022-06-10 08:12:03 +02:00
Bob Peterson	4e70d7c309	UPSTREAM: gfs2: Introduce flag for glock holder auto-demotion commit dc732906c2450939c319fec6e258aa89ecb5a632 upstream This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure that will allow glocks to be demoted automatically on locking conflicts. When a locking request comes in that isn't compatible with the locking state of an active holder and that holder has the HIF_MAY_DEMOTE flag set, the holder will be demoted before the incoming locking request is granted. Note that this mechanism demotes active holders (with the HIF_HOLDER flag set), while before we were only demoting glocks without any active holders. This allows processes to keep hold of locks that may form a cyclic locking dependency; the core glock logic will then break those dependencies in case a conflicting locking request occurs. We'll use this to avoid giving up the inode glock proactively before faulting in pages. Processes that allow a glock holder to be taken away indicate this by calling gfs2_holder_allow_demote(), which sets the HIF_MAY_DEMOTE flag. Later, they call gfs2_holder_disallow_demote() to clear the flag again, and then they check if their holder is still queued: if it is, they are still holding the glock; if it isn't, they can re-acquire the glock (or abort). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `416a705304`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ief7088f69c59f2bbe73867846ee53c0735e514bc	2022-06-10 08:12:03 +02:00
Andreas Gruenbacher	d0e98c116d	UPSTREAM: gfs2: Clean up function may_grant commit 6144464937fe1e6135b13a30502a339d549bf093 upstream Pass the first current glock holder into function may_grant and deobfuscate the logic there. While at it, switch from BUG_ON to GLOCK_BUG_ON in may_grant. To make that build cleanly, de-constify the may_grant arguments. We're now using function find_first_holder in do_promote, so move the function's definition above do_promote. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `b25cfbc0e7`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ib02a752cdf8da470494fa06782bc92ca3f047a6f	2022-06-10 08:12:03 +02:00
Andreas Gruenbacher	3b46c843f2	UPSTREAM: gfs2: Add wrapper for iomap_file_buffered_write commit 2eb7509a05443048fb4df60b782de3f03c6c298b upstream Add a wrapper around iomap_file_buffered_write. We'll add code for when the operation needs to be retried here later. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `b88b998579`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I144fa4b02e35bb8a1c9ac1f1f0ed149c5cdf91b6	2022-06-10 08:12:03 +02:00
Andreas Gruenbacher	e1c331f4ec	UPSTREAM: iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable commit a6294593e8a1290091d0b078d5d33da5e0cd3dfe upstream Turn iov_iter_fault_in_readable into a function that returns the number of bytes not faulted in, similar to copy_to_user, instead of returning a non-zero value when any of the requested pages couldn't be faulted in. This supports the existing users that require all pages to be faulted in as well as new users that are happy if any pages can be faulted in. Rename iov_iter_fault_in_readable to fault_in_iov_iter_readable to make sure this change doesn't silently break things. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `30e66b1dfc`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Idc687713cf0585664c6eae5a075b3b067ba76b5e	2022-06-10 08:12:03 +02:00
Andreas Gruenbacher	d9fb814064	UPSTREAM: gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} commit bb523b406c849eef8f265a07cd7f320f1f177743 upstream Turn fault_in_pages_{readable,writeable} into versions that return the number of bytes not faulted in, similar to copy_to_user, instead of returning a non-zero value when any of the requested pages couldn't be faulted in. This supports the existing users that require all pages to be faulted in as well as new users that are happy if any pages can be faulted in. Rename the functions to fault_in_{readable,writeable} to make sure this change doesn't silently break things. Neither of these functions is entirely trivial and it doesn't seem useful to inline them, so move them to mm/gup.c. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `923f05a660`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: If7387298a868e4e72244bd9c123e5ef9cf833629	2022-06-10 08:12:03 +02:00
Liujie Xie	50e4cd9df7	ANDROID: vendor_hooks: Add hooks for memory when debug Add vendors hooks for recording memory used Bug: 182443489 Bug: 234407991 Signed-off-by: Liujie Xie <xieliujie@oppo.com> Change-Id: I62d8bb2b6650d8b187b433f97eb833ef0b784df1	2022-06-02 14:37:19 -07:00
Greg Kroah-Hartman	910b540ffa	This is the 5.15.41 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmKErdYACgkQONu9yGCS aT4LIg//bqU/qlVBJq3v3QCfN93SOXwECuPXi0e8p3aG/ptn1pzV0A5+rDD4TXeU mpw8oWSNQC0Wn8mag9poCvLFaSHuMIo2wPoPOwpRyb0BfhXCaWE4s1HbX49dNi2l 9V76KC4JH69zaHtEmnBy83zzr4nyRG6q3bTd1hD3uJaSg6+2E+DfIK7MUSAyhyPU DSi2CSHgxZkBEJIVncCPPhz4HX7RJIav51EwAZlnqW6eX3osEe9Qh1b82qcatXIF rC/w74mQoRnghL0FRwQ5Z4mTEcgBI3lYoKJMkt8iZgOm9nBq1F62gXysKojmjDYc AvP97Jt+hF9hDFUfR2wW/sA/rsRhw1mx0LoFu96fc15Nxiyh8b/MFD+b3a2CqtN9 szggwoezx90RbGOsc1zNhoX3VmZ2xMeVyAVg4RCscxXiO5FR1DqV4mUlbXoQx9ah I9+GYtc0vs9BP9pucK2kx7fzJ+eW93jIZVXHuh6mcspu7sdTBQam+ERH1gMIyCki OXcg0HCTLQi26MNMkrdX+RebwHxB7tbLXUXqYzQKEWjsLtCyv7LYmsRV+RZSyZ1p vqKs4gTjLPmcINebsT80mYZSH560lXkjl6Z0H5U23NQO8RG3QewrKZkHqwOn5zyQ aF5fnmVTVcMbT8BK3qCadr00Giz5usnw8R9am5EY1ELzo48207A= =35Fl -----END PGP SIGNATURE----- Merge 5.15.41 into android13-5.15 Changes in 5.15.41 batman-adv: Don't skb_split skbuffs with frag_list iwlwifi: iwl-dbg: Use del_timer_sync() before freeing hwmon: (tmp401) Add OF device ID table mac80211: Reset MBSSID parameters upon connection net: Fix features skip in for_each_netdev_feature() net: mscc: ocelot: fix last VCAP IS1/IS2 filter persisting in hardware when deleted net: mscc: ocelot: fix VCAP IS2 filters matching on both lookups net: mscc: ocelot: restrict tc-trap actions to VCAP IS2 lookup 0 net: mscc: ocelot: avoid corrupting hardware counters when moving VCAP filters fbdev: simplefb: Cleanup fb_info in .fb_destroy rather than .remove fbdev: efifb: Cleanup fb_info in .fb_destroy rather than .remove fbdev: vesafb: Cleanup fb_info in .fb_destroy rather than .remove platform/surface: aggregator: Fix initialization order when compiling as builtin module ice: Fix race during aux device (un)plugging ice: fix PTP stale Tx timestamps cleanup ipv4: drop dst in multicast routing path drm/nouveau: Fix a potential theorical leak in nouveau_get_backlight_name() netlink: do not reset transport header in netlink_recvmsg() net: chelsio: cxgb4: Avoid potential negative array offset fbdev: efifb: Fix a use-after-free due early fb_info cleanup sfc: Use swap() instead of open coding it net: sfc: fix memory leak due to ptp channel mac80211_hwsim: call ieee80211_tx_prepare_skb under RCU protection nfs: fix broken handling of the softreval mount option ionic: fix missing pci_release_regions() on error in ionic_probe() dim: initialize all struct fields hwmon: (ltq-cputemp) restrict it to SOC_XWAY procfs: prevent unprivileged processes accessing fdinfo dir selftests: vm: Makefile: rename TARGETS to VMTARGETS arm64: vdso: fix makefile dependency on vdso.so virtio: fix virtio transitional ids s390/ctcm: fix variable dereferenced before check s390/ctcm: fix potential memory leak s390/lcs: fix variable dereferenced before check net/sched: act_pedit: really ensure the skb is writable net: ethernet: mediatek: ppe: fix wrong size passed to memset() net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral drm/vc4: hdmi: Fix build error for implicit function declaration net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down() net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pending net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe() tls: Fix context leak on tls_device_down drm/vmwgfx: Fix fencing on SVGAv3 gfs2: Fix filesystem block deallocation for short writes hwmon: (f71882fg) Fix negative temperature RDMA/irdma: Fix deadlock in irdma_cleanup_cm_core() iommu: arm-smmu: disable large page mappings for Nvidia arm-smmu ASoC: max98090: Reject invalid values in custom control put() ASoC: max98090: Generate notifications on changes for custom control ASoC: ops: Validate input values in snd_soc_put_volsw_range() s390: disable -Warray-bounds ASoC: SOF: Fix NULL pointer exception in sof_pci_probe callback net: emaclite: Don't advertise 1000BASE-T and do auto negotiation net: sfp: Add tx-fault workaround for Huawei MA5671A SFP ONT secure_seq: use the 64 bits of the siphash for port offset calculation tcp: use different parts of the port_offset for index and offset tcp: resalt the secret every 10 seconds tcp: add small random increments to the source port tcp: dynamically allocate the perturb table used by source ports tcp: increase source port perturb table to 2^16 tcp: drop the hash_32() part from the index calculation interconnect: Restore sync state by ignoring ipa-virt in provider count firmware_loader: use kernel credentials when reading firmware KVM: PPC: Book3S PR: Enable MSR_DR for switch_mmu_context() usb: xhci-mtk: fix fs isoc's transfer error x86/mm: Fix marking of unused sub-pmd ranges tty/serial: digicolor: fix possible null-ptr-deref in digicolor_uart_probe() tty: n_gsm: fix buffer over-read in gsm_dlci_data() tty: n_gsm: fix mux activation issues in gsm_config() usb: cdc-wdm: fix reading stuck on device close usb: typec: tcpci: Don't skip cleanup in .remove() on error usb: typec: tcpci_mt6360: Update for BMC PHY setting USB: serial: pl2303: add device id for HP LM930 Display USB: serial: qcserial: add support for Sierra Wireless EM7590 USB: serial: option: add Fibocom L610 modem USB: serial: option: add Fibocom MA510 modem slimbus: qcom: Fix IRQ check in qcom_slim_probe fsl_lpuart: Don't enable interrupts too early serial: 8250_mtk: Fix UART_EFR register address serial: 8250_mtk: Fix register address for XON/XOFF character ceph: fix setting of xattrs on async created inodes Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()" mm/huge_memory: do not overkill when splitting huge_zero_page drm/vmwgfx: Disable command buffers on svga3 without gbobjects drm/nouveau/tegra: Stop using iommu_present() i40e: i40e_main: fix a missing check on list iterator net: atlantic: always deep reset on pm op, fixing up my null deref regression net: phy: Fix race condition on link status change writeback: Avoid skipping inode writeback cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp() arm[64]/memremap: don't abuse pfn_valid() to ensure presence of linear map net: phy: micrel: Do not use kszphy_suspend/resume for KSZ8061 net: phy: micrel: Pass .probe for KS8737 SUNRPC: Ensure that the gssproxy client can start in a connected state drm/vmwgfx: Initialize drm_mode_fb_cmd2 Revert "drm/amd/pm: keep the BACO feature enabled for suspend" dma-buf: call dma_buf_stats_setup after dmabuf is in valid list mm/hwpoison: use pr_err() instead of dump_page() in get_any_page() SUNRPC: Ensure we flush any closed sockets before xs_xprt_free() ping: fix address binding wrt vrf usb: gadget: uvc: rename function to be more consistent usb: gadget: uvc: allow for application to cleanly shutdown Linux 5.15.41 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ia65cdbbddf553237d6a3a38efb9bcb2fcc3990ec	2022-05-18 11:31:34 +02:00
Trond Myklebust	54f6834b28	SUNRPC: Ensure we flush any closed sockets before xs_xprt_free() commit f00432063db1a0db484e85193eccc6845435b80e upstream. We must ensure that all sockets are closed before we call xprt_free() and release the reference to the net namespace. The problem is that calling fput() will defer closing the socket until delayed_fput() gets called. Let's fix the situation by allowing rpciod and the transport teardown code (which runs on the system wq) to call __fput_sync(), and directly close the socket. Reported-by: Felix Fu <foyjog@gmail.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: `a73881c96d` ("SUNRPC: Fix an Oops in udp_poll()") Cc: stable@vger.kernel.org # 5.1.x: 3be232f11a3c: SUNRPC: Prevent immediate close+reconnect Cc: stable@vger.kernel.org # 5.1.x: 89f42494f92f: SUNRPC: Don't call connect() more than once on a TCP socket Cc: stable@vger.kernel.org # 5.1.x Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Meena Shanmugam <meenashanmugam@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-18 10:26:57 +02:00
Jing Xia	80b6fb3d18	writeback: Avoid skipping inode writeback commit 846a3351ddfe4a86eede4bb26a205c3f38ef84d3 upstream. We have run into an issue that a task gets stuck in balance_dirty_pages_ratelimited() when perform I/O stress testing. The reason we observed is that an I_DIRTY_PAGES inode with lots of dirty pages is in b_dirty_time list and standard background writeback cannot writeback the inode. After studing the relevant code, the following scenario may lead to the issue: task1 task2 ----- ----- fuse_flush write_inode_now //in b_dirty_time writeback_single_inode __writeback_single_inode fuse_write_end filemap_dirty_folio __xa_set_mark:PAGECACHE_TAG_DIRTY lock inode->i_lock if mapping tagged PAGECACHE_TAG_DIRTY inode->i_state \|= I_DIRTY_PAGES unlock inode->i_lock __mark_inode_dirty:I_DIRTY_PAGES lock inode->i_lock -was dirty,inode stays in -b_dirty_time unlock inode->i_lock if(!(inode->i_state & I_DIRTY_All)) -not true,so nothing done This patch moves the dirty inode to b_dirty list when the inode currently is not queued in b_io or b_more_io list at the end of writeback_single_inode. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> CC: stable@vger.kernel.org Fixes: `0ae45f63d4` ("vfs: add support for a lazytime mount option") Signed-off-by: Jing Xia <jing.xia@unisoc.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220510023514.27399-1-jing.xia@unisoc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-18 10:26:56 +02:00
Jeff Layton	8c09cb115e	ceph: fix setting of xattrs on async created inodes commit 620239d9a32e9fe27c9204ec11e40058671aeeb6 upstream. Currently when we create a file, we spin up an xattr buffer to send along with the create request. If we end up doing an async create however, then we currently pass down a zero-length xattr buffer. Fix the code to send down the xattr buffer in req->r_pagelist. If the xattrs span more than a page, however give up and don't try to do an async create. Cc: stable@vger.kernel.org URL: https://bugzilla.redhat.com/show_bug.cgi?id=2063929 Fixes: `9a8d03ca2e` ("ceph: attempt to do async create when possible") Reported-by: John Fortin <fortinj66@gmail.com> Reported-by: Sri Ramanujam <sri@ramanujam.io> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-18 10:26:55 +02:00
Andreas Gruenbacher	41d5ad9596	gfs2: Fix filesystem block deallocation for short writes [ Upstream commit d031a8866e709c9d1ee5537a321b6192b4d2dc5b ] When a write cannot be carried out in full, gfs2_iomap_end() releases blocks that have been allocated for this write but haven't been used. To compute the end of the allocation, gfs2_iomap_end() incorrectly rounded the end of the attempted write down to the next block boundary to arrive at the end of the allocation. It would have to round up, but the end of the allocation is also available as iomap->offset + iomap->length, so just use that instead. In addition, use round_up() for computing the start of the unused range. Fixes: `64bc06bb32` ("gfs2: iomap buffered write support") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-05-18 10:26:51 +02:00
Kalesh Singh	62cbb09899	procfs: prevent unprivileged processes accessing fdinfo dir [ Upstream commit 1927e498aee1757b3df755a194cbfc5cc0f2b663 ] The file permissions on the fdinfo dir from were changed from S_IRUSR\|S_IXUSR to S_IRUGO\|S_IXUGO, and a PTRACE_MODE_READ check was added for opening the fdinfo files [1]. However, the ptrace permission check was not added to the directory, allowing anyone to get the open FD numbers by reading the fdinfo directory. Add the missing ptrace permission check for opening the fdinfo directory. [1] https://lkml.kernel.org/r/20210308170651.919148-1-kaleshsingh@google.com Link: https://lkml.kernel.org/r/20210713162008.1056986-1-kaleshsingh@google.com Fixes: `7bc3fa0172` ("procfs: allow reading fdinfo with PTRACE_MODE_READ") Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Hridya Valsaraju <hridya@google.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-05-18 10:26:50 +02:00
Dan Aloni	1a2e139e68	nfs: fix broken handling of the softreval mount option [ Upstream commit 085d16d5f949b64713d5e960d6c9bbf51bc1d511 ] Turns out that ever since this mount option was added, passing `softreval` in NFS mount options cancelled all other flags while not affecting the underlying flag `NFS_MOUNT_SOFTREVAL`. Fixes: `c74dfe97c1` ("NFS: Add mount option 'softreval'") Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-05-18 10:26:49 +02:00
Greg Kroah-Hartman	0bfa00b6ba	Merge 5.15.40 into android13-5.15 Changes in 5.15.40 x86/lib/atomic64_386_32: Rename things x86: Prepare asm files for straight-line-speculation x86: Prepare inline-asm for straight-line-speculation objtool: Add straight-line-speculation validation x86/alternative: Relax text_poke_bp() constraint kbuild: move objtool_args back to scripts/Makefile.build x86: Add straight-line-speculation mitigation tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy' kvm/emulate: Fix SETcc emulation function offsets with SLS crypto: x86/poly1305 - Fixup SLS objtool: Fix SLS validation for kcov tail-call replacement Bluetooth: Fix the creation of hdev->name rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition udf: Avoid using stale lengthOfImpUse mm: fix missing cache flush for all tail pages of compound page mm: hugetlb: fix missing cache flush in copy_huge_page_from_user() mm: shmem: fix missing cache flush in shmem_mfill_atomic_pte() mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic() mm/hwpoison: fix error page recovered but reported "not recovered" mm/mlock: fix potential imbalanced rlimit ucounts adjustment mm: fix invalid page pointer returned with FOLL_PIN gups Linux 5.15.40 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ib068d0412565187435c8aeeeb22b683b6aa3a9b1	2022-05-18 09:40:16 +02:00
Greg Kroah-Hartman	2bde857bee	Merge 5.15.39 into android13-5.15 Changes in 5.15.39 MIPS: Fix CP0 counter erratum detection for R4k CPUs parisc: Merge model and model name into one line in /proc/cpuinfo ALSA: hda/realtek: Add quirk for Yoga Duet 7 13ITL6 speakers ALSA: fireworks: fix wrong return count shorter than expected by 4 bytes mmc: sdhci-msm: Reset GCC_SDCC_BCR register for SDHC mmc: sunxi-mmc: Fix DMA descriptors allocated above 32 bits mmc: core: Set HS clock speed before sending HS CMD13 gpiolib: of: fix bounds check for 'gpio-reserved-ranges' x86/fpu: Prevent FPU state corruption KVM: x86/svm: Account for family 17h event renumberings in amd_pmc_perf_hw_id iommu/vt-d: Calculate mask for non-aligned flushes iommu/arm-smmu-v3: Fix size calculation in arm_smmu_mm_invalidate_range() drm/amd/display: Avoid reading audio pattern past AUDIO_CHANNELS_COUNT drm/amdgpu: do not use passthrough mode in Xen dom0 RISC-V: relocate DTB if it's outside memory region Revert "SUNRPC: attempt AF_LOCAL connect on setup" timekeeping: Mark NMI safe time accessors as notrace firewire: fix potential uaf in outbound_phy_packet_callback() firewire: remove check of list iterator against head past the loop body firewire: core: extend card->lock in fw_core_handle_bus_reset net: stmmac: disable Split Header (SPH) for Intel platforms genirq: Synchronize interrupt thread startup ASoC: da7219: Fix change notifications for tone generator frequency ASoC: wm8958: Fix change notifications for DSP controls ASoC: meson: Fix event generation for AUI ACODEC mux ASoC: meson: Fix event generation for G12A tohdmi mux ASoC: meson: Fix event generation for AUI CODEC mux s390/dasd: fix data corruption for ESE devices s390/dasd: prevent double format of tracks for ESE devices s390/dasd: Fix read for ESE with blksize < 4k s390/dasd: Fix read inconsistency for ESE DASD devices can: grcan: grcan_close(): fix deadlock can: isotp: remove re-binding of bound socket can: grcan: use ofdev->dev when allocating DMA memory can: grcan: grcan_probe(): fix broken system id check for errata workaround needs can: grcan: only use the NAPI poll budget for RX nfc: replace improper check device_is_registered() in netlink related functions nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs NFC: netlink: fix sleep in atomic bug when firmware download timeout gpio: visconti: Fix fwnode of GPIO IRQ gpio: pca953x: fix irq_stat not updated when irq is disabled (irq_mask not set) hwmon: (adt7470) Fix warning on module removal hwmon: (pmbus) disable PEC if not enabled ASoC: dmaengine: Restore NULL prepare_slave_config() callback ASoC: soc-ops: fix error handling iommu/vt-d: Drop stop marker messages iommu/dart: check return value after calling platform_get_resource() net/mlx5e: Fix trust state reset in reload net/mlx5e: Don't match double-vlan packets if cvlan is not set net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release net/mlx5e: Fix the calling of update_buffer_lossy() API net/mlx5: Avoid double clear or set of sync reset requested net/mlx5: Fix deadlock in sync reset flow selftests/seccomp: Don't call read() on TTY from background pgrp SUNRPC release the transport of a relocated task with an assigned transport RDMA/siw: Fix a condition race issue in MPA request processing RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state RDMA/irdma: Reduce iWARP QP destroy time RDMA/irdma: Fix possible crash due to NULL netdev in notifier NFSv4: Don't invalidate inode attributes on delegation return net: ethernet: mediatek: add missing of_node_put() in mtk_sgmii_init() net: dsa: mt7530: add missing of_node_put() in mt7530_setup() net: stmmac: dwmac-sun8i: add missing of_node_put() in sun8i_dwmac_register_mdio_mux() net: mdio: Fix ENOMEM return value in BCM6368 mux bus controller net: cpsw: add missing of_node_put() in cpsw_probe_dt() net: igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter() net: emaclite: Add error handling for of_address_to_resource() selftests/net: so_txtime: fix parsing of start time stamp on 32 bit systems selftests/net: so_txtime: usage(): fix documentation of default clock drm/msm/dp: remove fail safe mode related code btrfs: do not BUG_ON() on failure to update inode when setting xattr hinic: fix bug of wq out of bound access mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter() rxrpc: Enable IPv6 checksums on transport socket selftests: mirror_gre_bridge_1q: Avoid changing PVID while interface is operational bnxt_en: Fix possible bnxt_open() failure caused by wrong RFS flag bnxt_en: Fix unnecessary dropping of RX packets selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer smsc911x: allow using IRQ0 btrfs: force v2 space cache usage for subpage mount btrfs: always log symlinks in full mode drm/amdgpu: unify BO evicting method in amdgpu_ttm drm/amdgpu: explicitly check for s0ix when evicting resources drm/amdgpu: don't set s3 and s0ix at the same time drm/amdgpu: Ensure HDA function is suspended before ASIC reset gpio: mvebu: drop pwm base assignment kvm: x86/cpuid: Only provide CPUID leaf 0xA if host has architectural PMU fbdev: Make fb_release() return -ENODEV if fbdev was unregistered net/mlx5: Fix slab-out-of-bounds while reading resource dump menu net/mlx5e: Lag, Fix use-after-free in fib event handler net/mlx5e: Lag, Fix fib_info pointer assignment net/mlx5e: Lag, Don't skip fib events on current dst iommu/dart: Add missing module owner to ops structure kvm: selftests: do not use bitfields larger than 32-bits for PTEs KVM: selftests: Silence compiler warning in the kvm_page_table_test x86/kvm: Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume KVM: x86: Do not change ICR on write to APIC_SELF_IPI KVM: x86/mmu: avoid NULL-pointer dereference on page freeing bugs KVM: LAPIC: Enable timer posted-interrupt only when mwait/hlt is advertised selftest/vm: verify mmap addr in mremap_test selftest/vm: verify remap destination address in mremap_test mmc: rtsx: add 74 Clocks in power on flow Revert "parisc: Mark sched_clock unstable only if clocks are not syncronized" rcu: Fix callbacks processing time limit retaining cond_resched() rcu: Apply callbacks processing time limit only on softirq PCI: pci-bridge-emul: Add description for class_revision field PCI: pci-bridge-emul: Add definitions for missing capabilities registers PCI: aardvark: Add support for DEVCAP2, DEVCTL2, LNKCAP2 and LNKCTL2 registers on emulated bridge PCI: aardvark: Clear all MSIs at setup PCI: aardvark: Comment actions in driver remove method PCI: aardvark: Disable bus mastering when unbinding driver PCI: aardvark: Mask all interrupts when unbinding driver PCI: aardvark: Fix memory leak in driver unbind PCI: aardvark: Assert PERST# when unbinding driver PCI: aardvark: Disable link training when unbinding driver PCI: aardvark: Disable common PHY when unbinding driver PCI: aardvark: Replace custom PCIE_CORE_INT_* macros with PCI_INTERRUPT_* PCI: aardvark: Rewrite IRQ code to chained IRQ handler PCI: aardvark: Check return value of generic_handle_domain_irq() when processing INTx IRQ PCI: aardvark: Make MSI irq_chip structures static driver structures PCI: aardvark: Make msi_domain_info structure a static driver structure PCI: aardvark: Use dev_fwnode() instead of of_node_to_fwnode(dev->of_node) PCI: aardvark: Refactor unmasking summary MSI interrupt PCI: aardvark: Add support for masking MSI interrupts PCI: aardvark: Fix setting MSI address PCI: aardvark: Enable MSI-X support PCI: aardvark: Add support for ERR interrupt on emulated bridge PCI: aardvark: Optimize writing PCI_EXP_RTCTL_PMEIE and PCI_EXP_RTSTA_PME on emulated bridge PCI: aardvark: Add support for PME interrupts PCI: aardvark: Fix support for PME requester on emulated bridge PCI: aardvark: Use separate INTA interrupt for emulated root bridge PCI: aardvark: Remove irq_mask_ack() callback for INTx interrupts PCI: aardvark: Don't mask irq when mapping PCI: aardvark: Drop __maybe_unused from advk_pcie_disable_phy() PCI: aardvark: Update comment about link going down after link-up Linux 5.15.39 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ic51d6d05a0d99156c6fd844786e984aff8e7386a	2022-05-18 09:38:58 +02:00
Greg Kroah-Hartman	4154968fe8	Merge 5.15.38 into android13-5.15 Changes in 5.15.38 usb: mtu3: fix USB 3.0 dual-role-switch from device to host USB: quirks: add a Realtek card reader USB: quirks: add STRING quirk for VCOM device USB: serial: whiteheat: fix heap overflow in WHITEHEAT_GET_DTR_RTS USB: serial: cp210x: add PIDs for Kamstrup USB Meter Reader USB: serial: option: add support for Cinterion MV32-WA/MV32-WB USB: serial: option: add Telit 0x1057, 0x1058, 0x1075 compositions usb: xhci: tegra:Fix PM usage reference leak of tegra_xusb_unpowergate_partitions xhci: Enable runtime PM on second Alderlake controller xhci: stop polling roothubs after shutdown xhci: increase usb U3 -> U0 link resume timeout from 100ms to 500ms iio: dac: ad5592r: Fix the missing return value. iio: dac: ad5446: Fix read_raw not returning set value iio: magnetometer: ak8975: Fix the error handling in ak8975_power_on() iio: imu: inv_icm42600: Fix I2C init possible nack usb: misc: fix improper handling of refcount in uss720_probe() usb: core: Don't hold the device lock while sleeping in do_proc_control() usb: typec: ucsi: Fix reuse of completion structure usb: typec: ucsi: Fix role swapping usb: gadget: uvc: Fix crash when encoding data for usb request usb: gadget: configfs: clear deactivation flag in configfs_composite_unbind() usb: dwc3: Try usb-role-switch first in dwc3_drd_init usb: dwc3: core: Fix tx/rx threshold settings usb: dwc3: core: Only handle soft-reset in DCTL usb: dwc3: gadget: Return proper request status usb: dwc3: pci: add support for the Intel Meteor Lake-P usb: cdns3: Fix issue for clear halt endpoint usb: phy: generic: Get the vbus supply serial: imx: fix overrun interrupts in DMA mode serial: amba-pl011: do not time out prematurely when draining tx fifo serial: 8250: Also set sticky MCR bits in console restoration serial: 8250: Correct the clock for EndRun PTP/1588 PCIe device arch_topology: Do not set llc_sibling if llc_id is invalid ceph: fix possible NULL pointer dereference for req->r_session bus: mhi: host: pci_generic: Add missing poweroff() PM callback bus: mhi: host: pci_generic: Flush recovery worker during freeze arm64: dts: imx8mm-venice: fix spi2 pin configuration pinctrl: samsung: fix missing GPIOLIB on ARM64 Exynos config hex2bin: make the function hex_to_bin constant-time hex2bin: fix access beyond string end riscv: patch_text: Fixup last cpu should be master x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests iocost: don't reset the inuse weight of under-weighted debtors virtio_net: fix wrong buf address calculation when using xdp cpufreq: qcom-hw: fix the race between LMH worker and cpuhp cpufreq: qcom-cpufreq-hw: Fix throttle frequency value on EPSS platforms video: fbdev: udlfb: properly check endpoint type arm64: dts: meson: remove CPU opps below 1GHz for G12B boards arm64: dts: meson: remove CPU opps below 1GHz for SM1 boards iio:imu:bmi160: disable regulator in error path mtd: rawnand: fix ecc parameters for mt7622 xsk: Fix l2fwd for copy mode + busy poll combo arm64: dts: imx8qm: Correct SCU clock controller's compatible property USB: Fix xhci event ring dequeue pointer ERDP update issue ARM: dts: imx6qdl-apalis: Fix sgtl5000 detection issue arm64: dts: imx8mn: Fix SAI nodes arm64: dts: meson-sm1-bananapi-m5: fix wrong GPIO pin labeling for CON1 phy: samsung: Fix missing of_node_put() in exynos_sata_phy_probe phy: samsung: exynos5250-sata: fix missing device put in probe error paths ARM: OMAP2+: Fix refcount leak in omap_gic_of_init bus: ti-sysc: Make omap3 gpt12 quirk handling SoC specific ARM: dts: dra7: Fix suspend warning for vpe powerdomain phy: ti: omap-usb2: Fix error handling in omap_usb2_enable_clocks ARM: dts: at91: Map MCLK for wm8731 on at91sam9g20ek ARM: dts: at91: sama5d4_xplained: fix pinctrl phandle name ARM: dts: at91: fix pinctrl phandles phy: mapphone-mdm6600: Fix PM error handling in phy_mdm6600_probe phy: ti: Add missing pm_runtime_disable() in serdes_am654_probe interconnect: qcom: sdx55: Drop IP0 interconnects ARM: dts: Fix mmc order for omap3-gta04 ARM: dts: am3517-evm: Fix misc pinmuxing ARM: dts: logicpd-som-lv: Fix wrong pinmuxing on OMAP35 ipvs: correctly print the memory size of ip_vs_conn_tab phy: amlogic: fix error path in phy_g12a_usb3_pcie_probe() pinctrl: mediatek: moore: Fix build error mtd: rawnand: Fix return value check of wait_for_completion_timeout mtd: fix 'part' field data corruption in mtd_info pinctrl: stm32: Do not call stm32_gpio_get() for edge triggered IRQs in EOI memory: renesas-rpc-if: Fix HF/OSPI data transfer in Manual Mode net: dsa: Add missing of_node_put() in dsa_port_link_register_of netfilter: nft_set_rbtree: overlap detection with element re-addition after deletion bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook pinctrl: rockchip: fix RK3308 pinmux bits tcp: md5: incorrect tcp_header_len for incoming connections pinctrl: stm32: Keep pinctrl block clock enabled when LEVEL IRQ requested tcp: ensure to use the most recently sent skb when filling the rate sample wireguard: device: check for metadata_dst with skb_valid_dst() sctp: check asoc strreset_chunk in sctp_generate_reconf_event ARM: dts: imx6ull-colibri: fix vqmmc regulator arm64: dts: imx8mn-ddr4-evk: Describe the 32.768 kHz PMIC clock pinctrl: pistachio: fix use of irq_of_parse_and_map() cpufreq: fix memory leak in sun50i_cpufreq_nvmem_probe net: hns3: clear inited state and stop client after failed to register netdev net: hns3: modify the return code of hclge_get_ring_chain_from_mbx net: hns3: add validity check for message data length net: hns3: add return value for mailbox handling in PF net/smc: sync err code when tcp connection was refused ip_gre: Make o_seqno start from 0 in native mode ip6_gre: Make o_seqno start from 0 in native mode ip_gre, ip6_gre: Fix race condition on o_seqno in collect_md mode tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT tcp: make sure treq->af_specific is initialized bus: sunxi-rsb: Fix the return value of sunxi_rsb_device_create() clk: sunxi: sun9i-mmc: check return value after calling platform_get_resource() cpufreq: qcom-cpufreq-hw: Clear dcvs interrupts net: bcmgenet: hide status block before TX timestamping net: phy: marvell10g: fix return value on error net: dsa: mv88e6xxx: Fix port_hidden_wait to account for port_base_addr drm/sun4i: Remove obsolete references to PHYS_OFFSET net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK io_uring: check reserved fields for send/sendmsg io_uring: check reserved fields for recv/recvmsg netfilter: conntrack: fix udp offload timeout sysctl drm/amdkfd: Fix GWS queue count drm/amd/display: Fix memory leak in dcn21_clock_source_create tls: Skip tls_append_frag on zero copy size bnx2x: fix napi API usage sequence net: fec: add missing of_node_put() in fec_enet_init_stop_mode() gfs2: Prevent endless loops in gfs2_file_buffered_write gfs2: Minor retry logic cleanup gfs2: Make sure not to return short direct writes gfs2: No short reads or writes upon glock contention perf arm-spe: Fix addresses of synthesized SPE events ixgbe: ensure IPsec VF<->PF compatibility Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits" tcp: fix F-RTO may not work correctly when receiving DSACK ASoC: Intel: soc-acpi: correct device endpoints for max98373 ASoC: wm8731: Disable the regulator when probing fails ext4: fix bug_on in start_this_handle during umount filesystem arch: xtensa: platforms: Fix deadlock in rs_close() ksmbd: increment reference count of parent fp ksmbd: set fixed sector size to FS_SECTOR_SIZE_INFORMATION bonding: do not discard lowest hash bit for non layer3+4 hashing x86: __memcpy_flushcache: fix wrong alignment if size > 2^32 cifs: destage any unwritten data to the server before calling copychunk_write drivers: net: hippi: Fix deadlock in rr_close() powerpc/perf: Fix 32bit compile selftest/vm: verify mmap addr in mremap_test selftest/vm: verify remap destination address in mremap_test Revert "ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40" zonefs: Fix management of open zones zonefs: Clear inode information flags on inode creation kasan: prevent cpu_quarantine corruption when CPU offline and cache shrink occur at same time mtd: rawnand: qcom: fix memory corruption that causes panic netfilter: Update ip6_route_me_harder to consider L3 domain drm/i915: Check EDID for HDR static metadata when choosing blc drm/i915: Fix SEL_FETCH_PLANE_*(PIPE_B+) register addresses net: ethernet: stmmac: fix write to sgmii_adapter_base ACPI: processor: idle: Avoid falling back to C3 type C-states thermal: int340x: Fix attr.show callback prototype btrfs: fix leaked plug after failure syncing log on zoned filesystems ARM: dts: at91: sama7g5ek: enable pull-up on flexcom3 console lines ARM: dts: imx8mm-venice-gw{71xx,72xx,73xx}: fix OTG controller OC mode x86/cpu: Load microcode during restore_processor_state() perf symbol: Pass is_kallsyms to symbols__fixup_end() perf symbol: Update symbols__fixup_end() tty: n_gsm: fix restart handling via CLD command tty: n_gsm: fix decoupled mux resource tty: n_gsm: fix mux cleanup after unregister tty device tty: n_gsm: fix wrong signal octet encoding in convergence layer type 2 tty: n_gsm: fix malformed counter for out of frame data netfilter: nft_socket: only do sk lookups when indev is available tty: n_gsm: fix insufficient txframe size tty: n_gsm: fix wrong DLCI release order tty: n_gsm: fix missing explicit ldisc flush tty: n_gsm: fix wrong command retry handling tty: n_gsm: fix wrong command frame length field encoding tty: n_gsm: fix wrong signal octets encoding in MSC tty: n_gsm: fix missing tty wakeup in convergence layer type 2 tty: n_gsm: fix reset fifo race condition tty: n_gsm: fix incorrect UA handling tty: n_gsm: fix software flow control handling perf symbol: Remove arch__symbols__fixup_end() eeprom: at25: Use DMA safe buffers objtool: Fix code relocs vs weak symbols objtool: Fix type of reloc::addend powerpc/64: Add UADDR64 relocation support Linux 5.15.38 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ic5e398d47dd6240ecde38f635b3085fae7c3c0ab	2022-05-18 09:00:50 +02:00
Greg Kroah-Hartman	ecda2085fd	Revert 5.15.37 merge into android13-5.15 This reverts the merge of 5.15.37 into the android13-5.15 There are lots of ABI issues, and many of the commits are not needed in the Android tree at this time. Revert the merge (except for the Makefile change), so that future merges will continue to work, and the needed individual changes from this release will be manually added to the tree at a later point in time. Fixes: f7dace75d276 ("Merge 5.15.37 into android13-5.15") Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I0632858e5c0fb94fc14c0f4216997330eca260a7	2022-05-18 08:59:14 +02:00
Greg Kroah-Hartman	ef5fed3c1e	Merge 5.15.37 into android13-5.15 Changes in 5.15.37 floppy: disable FDRAWCMD by default bpf: Introduce composable reg, ret and arg types. bpf: Replace ARG_XXX_OR_NULL with ARG_XXX \| PTR_MAYBE_NULL bpf: Replace RET_XXX_OR_NULL with RET_XXX \| PTR_MAYBE_NULL bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX \| PTR_MAYBE_NULL bpf: Introduce MEM_RDONLY flag bpf: Convert PTR_TO_MEM_OR_NULL to composable types. bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM. bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem. bpf/selftests: Test PTR_TO_RDONLY_MEM bpf: Fix crash due to out of bounds access into reg2btf_ids. spi: cadence-quadspi: fix write completion support ARM: dts: socfpga: change qspi to "intel,socfpga-qspi" mm: kfence: fix objcgs vector allocation gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable iov_iter: Introduce fault_in_iov_iter_writeable gfs2: Add wrapper for iomap_file_buffered_write gfs2: Clean up function may_grant gfs2: Introduce flag for glock holder auto-demotion gfs2: Move the inode glock locking to gfs2_file_buffered_write gfs2: Eliminate ip->i_gh gfs2: Fix mmap + page fault deadlocks for buffered I/O iomap: Fix iomap_dio_rw return value for user copies iomap: Support partial direct I/O on user copy failures iomap: Add done_before argument to iomap_dio_rw gup: Introduce FOLL_NOFAULT flag to disable page faults iov_iter: Introduce nofault flag to disable page faults gfs2: Fix mmap + page fault deadlocks for direct I/O btrfs: fix deadlock due to page faults during direct IO reads and writes btrfs: fallback to blocking mode when doing async dio over multiple extents mm: gup: make fault_in_safe_writeable() use fixup_user_fault() selftests/bpf: Add test for reg2btf_ids out of bounds access Linux 5.15.37 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ica39b8856d6e3928a82f4e34f8b401f1a5cba5ee	2022-05-18 08:59:02 +02:00
Greg Kroah-Hartman	e95cdba8e2	Merge 5.15.36 into android13-5.15 Changes in 5.15.36 fs: remove __sync_filesystem block: remove __sync_blockdev block: simplify the block device syncing code vfs: make sync_filesystem return errors from ->sync_fs xfs: return errors in xfs_fs_sync_fs dma-mapping: remove bogus test for pfn_valid from dma_map_resource arm64/mm: drop HAVE_ARCH_PFN_VALID etherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overead mm: page_alloc: fix building error on -Werror=array-compare perf tools: Fix segfault accessing sample_id xyarray mm, kfence: support kmem_dump_obj() for KFENCE objects gfs2: assign rgrp glock before compute_bitstructs scsi: ufs: core: scsi_get_lba() error fix net/sched: cls_u32: fix netns refcount changes in u32_change() ALSA: usb-audio: Clear MIDI port active flag after draining ALSA: hda/realtek: Add quirk for Clevo NP70PNP ASoC: atmel: Remove system clock tree configuration for at91sam9g20ek ASoC: topology: Correct error handling in soc_tplg_dapm_widget_create() ASoC: rk817: Use devm_clk_get() in rk817_platform_probe ASoC: msm8916-wcd-digital: Check failure for devm_snd_soc_register_component ASoC: codecs: wcd934x: do not switch off SIDO Buck when codec is in use dmaengine: idxd: fix device cleanup on disable dmaengine: imx-sdma: Fix error checking in sdma_event_remap dmaengine: mediatek:Fix PM usage reference leak of mtk_uart_apdma_alloc_chan_resources dmaengine: dw-edma: Fix unaligned 64bit access spi: spi-mtk-nor: initialize spi controller after resume esp: limit skb_page_frag_refill use to a single page spi: cadence-quadspi: fix incorrect supports_op() return value igc: Fix infinite loop in release_swfw_sync igc: Fix BUG: scheduling while atomic igc: Fix suspending when PTM is active ALSA: hda/hdmi: fix warning about PCM count when used with SOF rxrpc: Restore removed timer deletion net/smc: Fix sock leak when release after smc_shutdown() net/packet: fix packet_sock xmit return value checking ip6_gre: Avoid updating tunnel->tun_hlen in __gre6_xmit() ip6_gre: Fix skb_under_panic in __gre6_xmit() net: restore alpha order to Ethernet devices in config net/sched: cls_u32: fix possible leak in u32_init_knode() l3mdev: l3mdev_master_upper_ifindex_by_index_rcu should be using netdev_master_upper_dev_get_rcu ipv6: make ip6_rt_gc_expire an atomic_t can: isotp: stop timeout monitoring when no first frame was sent net: dsa: hellcreek: Calculate checksums in tagger net: mscc: ocelot: fix broken IP multicast flooding netlink: reset network and mac headers in netlink_dump() drm/i915/display/psr: Unset enable_psr2_sel_fetch if other checks in intel_psr2_config_valid() fails net: stmmac: Use readl_poll_timeout_atomic() in atomic state dmaengine: idxd: add RO check for wq max_batch_size write dmaengine: idxd: add RO check for wq max_transfer_size write dmaengine: idxd: skip clearing device context when device is read-only selftests: mlxsw: vxlan_flooding: Prevent flooding of unwanted packets arm64: mm: fix p?d_leaf() ARM: vexpress/spc: Avoid negative array index when !SMP reset: renesas: Check return value of reset_control_deassert() reset: tegra-bpmp: Restore Handle errors in BPMP response platform/x86: samsung-laptop: Fix an unsigned comparison which can never be negative ALSA: usb-audio: Fix undefined behavior due to shift overflowing the constant drm/msm/disp: check the return value of kzalloc() arm64: dts: imx: Fix imx8*-var-som touchscreen property sizes vxlan: fix error return code in vxlan_fdb_append cifs: Check the IOCB_DIRECT flag, not O_DIRECT net: atlantic: Avoid out-of-bounds indexing mt76: Fix undefined behavior due to shift overflowing the constant brcmfmac: sdio: Fix undefined behavior due to shift overflowing the constant dpaa_eth: Fix missing of_node_put in dpaa_get_ts_info() drm/msm/mdp5: check the return of kzalloc() net: macb: Restart tx only if queue pointer is lagging scsi: iscsi: Release endpoint ID when its freed scsi: iscsi: Merge suspend fields scsi: iscsi: Fix NOP handling during conn recovery scsi: qedi: Fix failed disconnect handling stat: fix inconsistency between struct stat and struct compat_stat VFS: filename_create(): fix incorrect intent. nvme: add a quirk to disable namespace identifiers nvme-pci: disable namespace identifiers for the MAXIO MAP1002/1202 nvme-pci: disable namespace identifiers for Qemu controllers EDAC/synopsys: Read the error count from the correct register mm/memory-failure.c: skip huge_zero_page in memory_failure() memcg: sync flush only if periodic flush is delayed mm, hugetlb: allow for "high" userspace addresses oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() ata: pata_marvell: Check the 'bmdma_addr' beforing reading dma: at_xdmac: fix a missing check on list iterator dmaengine: imx-sdma: fix init of uart scripts net: atlantic: invert deep par in pm functions, preventing null derefs Input: omap4-keypad - fix pm_runtime_get_sync() error checking scsi: sr: Do not leak information in ioctl sched/pelt: Fix attach_entity_load_avg() corner case perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabled drm/panel/raspberrypi-touchscreen: Avoid NULL deref if not initialised drm/panel/raspberrypi-touchscreen: Initialise the bridge in prepare KVM: PPC: Fix TCE handling for VFIO drm/vc4: Use pm_runtime_resume_and_get to fix pm_runtime_get_sync() usage powerpc/perf: Fix power9 event alternatives powerpc/perf: Fix power10 event alternatives perf script: Always allow field 'data_src' for auxtrace perf report: Set PERF_SAMPLE_DATA_SRC bit for Arm SPE event xtensa: patch_text: Fixup last cpu should be master xtensa: fix a7 clobbering in coprocessor context load/store openvswitch: fix OOB access in reserve_sfa_size() gpio: Request interrupts after IRQ is initialized ASoC: soc-dapm: fix two incorrect uses of list iterator e1000e: Fix possible overflow in LTR decoding ARC: entry: fix syscall_trace_exit argument arm_pmu: Validate single/group leader events KVM: x86/pmu: Update AMD PMC sample period to fix guest NMI-watchdog KVM: x86: Pend KVM_REQ_APICV_UPDATE during vCPU creation to fix a race KVM: nVMX: Defer APICv updates while L2 is active until L1 is active KVM: SVM: Flush when freeing encrypted pages even on SME_COHERENT CPUs netfilter: conntrack: convert to refcount_t api netfilter: conntrack: avoid useless indirection during conntrack destruction ext4: fix fallocate to use file_modified to update permissions consistently ext4: fix symlink file size not match to file content ext4: fix use-after-free in ext4_search_dir ext4: limit length to bitmap_maxbytes - blocksize in punch_hole ext4, doc: fix incorrect h_reserved size ext4: fix overhead calculation to account for the reserved gdt blocks ext4: force overhead calculation if the s_overhead_cluster makes no sense netfilter: nft_ct: fix use after free when attaching zone template jbd2: fix a potential race while discarding reserved buffers after an abort spi: atmel-quadspi: Fix the buswidth adjustment between spi-mem and controller block/compat_ioctl: fix range check in BLKGETSIZE arm64: dts: qcom: add IPA qcom,qmp property Linux 5.15.36 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I44d3a4de9b6fa1d2016b4e063eb211e8373a1216	2022-05-18 08:55:59 +02:00
Prasad Sodagudi	525d77310a	ANDROID: pstore/ram: Add backward compatibility for ramoops reserved region Some of the platforms might be still expecting dedicated memory region for ramoops node. So add logic to detect the start and size of the ramoops memory region by looking up reserved memory region with of_reserved_mem_lookup() when platform_get_resource() failed. Bug: 191636717 Change-Id: Idc479b45fb3f637f7235efd6eabac62059d5e92b Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> (cherry picked from commit 9b136eab76eee8988a8474008aa7a8cfc83be421)	2022-05-16 22:30:25 +00:00
Isaac J. Manjarres	317045867f	FROMLIST: pstore/ram: Rework logic for detecting ramoops reserved memory region The reserved memory region for ramoops is assumed to be at a fixed and known location when read from the devicetree. This is not desirable in environments where it is preferred for the region to be dynamically allocated at runtime, as opposed to it being fixed at compile time. Change the logic for detecting the start and size of the ramoops memory region by looking up the reserved memory region instead of using platform_get_resource(), which assumes that the location of the memory is known ahead of time. Bug: 191636717 Link: https://lore.kernel.org/patchwork/patch/1451704/ Change-Id: I24066de9f4fe1f1575cb1bbb1687c37a2b1938a4 Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org> Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> (cherry picked from commit bd2ca0ba5b9ccf3aa846d0f08a06c193e5a388f4)	2022-05-16 22:30:15 +00:00
Jan Kara	9e951f2d85	udf: Avoid using stale lengthOfImpUse commit c1ad35dd0548ce947d97aaf92f7f2f9a202951cf upstream. udf_write_fi() uses lengthOfImpUse of the entry it is writing to. However this field has not yet been initialized so it either contains completely bogus value or value from last directory entry at that place. In either case this is wrong and can lead to filesystem corruption or kernel crashes. Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> CC: stable@vger.kernel.org Fixes: `979a6e28dd` ("udf: Get rid of 0-length arrays in struct fileIdentDesc") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-15 20:18:52 +02:00
Filipe Manana	3d0e7373b2	btrfs: always log symlinks in full mode commit d0e64a981fd841cb0f28fcd6afcac55e6f1e6994 upstream. On Linux, empty symlinks are invalid, and attempting to create one with the system call symlink(2) results in an -ENOENT error and this is explicitly documented in the man page. If we rename a symlink that was created in the current transaction and its parent directory was logged before, we actually end up logging the symlink without logging its content, which is stored in an inline extent. That means that after a power failure we can end up with an empty symlink, having no content and an i_size of 0 bytes. It can be easily reproduced like this: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt $ mkdir /mnt/testdir $ sync # Create a file inside the directory and fsync the directory. $ touch /mnt/testdir/foo $ xfs_io -c "fsync" /mnt/testdir # Create a symlink inside the directory and then rename the symlink. $ ln -s /mnt/testdir/foo /mnt/testdir/bar $ mv /mnt/testdir/bar /mnt/testdir/baz # Now fsync again the directory, this persist the log tree. $ xfs_io -c "fsync" /mnt/testdir <power failure> $ mount /dev/sdc /mnt $ stat -c %s /mnt/testdir/baz 0 $ readlink /mnt/testdir/baz $ Fix this by always logging symlinks in full mode (LOG_INODE_ALL), so that their content is also logged. A test case for fstests will follow. CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-12 12:30:20 +02:00
Qu Wenruo	e42a854548	btrfs: force v2 space cache usage for subpage mount commit 9f73f1aef98b2fa7252c0a89be64840271ce8ea0 upstream. [BUG] For a 4K sector sized btrfs with v1 cache enabled and only mounted on systems with 4K page size, if it's mounted on subpage (64K page size) systems, it can cause the following warning on v1 space cache: BTRFS error (device dm-1): csum mismatch on free space cache BTRFS warning (device dm-1): failed to load free space cache for block group 84082688, rebuilding it now Although not a big deal, as kernel can rebuild it without problem, such warning will bother end users, especially if they want to switch the same btrfs seamlessly between different page sized systems. [CAUSE] V1 free space cache is still using fixed PAGE_SIZE for various bitmap, like BITS_PER_BITMAP. Such hard-coded PAGE_SIZE usage will cause various mismatch, from v1 cache size to checksum. Thus kernel will always reject v1 cache with a different PAGE_SIZE with csum mismatch. [FIX] Although we should fix v1 cache, it's already going to be marked deprecated soon. And we have v2 cache based on metadata (which is already fully subpage compatible), and it has almost everything superior than v1 cache. So just force subpage mount to use v2 cache on mount. Reported-by: Matt Corallo <blnxfsl@bluematt.me> CC: stable@vger.kernel.org # 5.15+ Link: https://lore.kernel.org/linux-btrfs/61aa27d1-30fc-c1a9-f0f4-9df544395ec3@bluematt.me/ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-12 12:30:20 +02:00
Filipe Manana	74b9abc468	btrfs: do not BUG_ON() on failure to update inode when setting xattr commit 193b4e83986d7ee6caa8ceefb5ee9f58240fbee0 upstream. We are doing a BUG_ON() if we fail to update an inode after setting (or clearing) a xattr, but there's really no reason to not instead simply abort the transaction and return the error to the caller. This should be a rare error because we have previously reserved enough metadata space to update the inode and the delayed inode should have already been setup, so an -ENOSPC or -ENOMEM, which are the possible errors, are very unlikely to happen. So replace the BUG_ON()s with a transaction abort. CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-12 12:30:18 +02:00
Trond Myklebust	d34f9bbc1d	NFSv4: Don't invalidate inode attributes on delegation return commit 00c94ebec5925593c0377b941289224469e72ac7 upstream. There is no need to declare attributes such as the ctime, mtime and block size invalid when we're just returning a delegation, so it is inappropriate to call nfs_post_op_update_inode_force_wcc(). Instead, just call nfs_refresh_inode() after faking up the change attribute. We know that the GETATTR op occurs before the DELEGRETURN, so we are safe when doing this. Fixes: `0bc2c9b4dc` ("NFSv4: Don't discard the attributes returned by asynchronous DELEGRETURN") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-12 12:30:15 +02:00
Filipe Manana	cf12ce1bd7	btrfs: fix leaked plug after failure syncing log on zoned filesystems commit 50ff57888d0b13440e7f4cde05dc339ee8d0f1f8 upstream. On a zoned filesystem, if we fail to allocate the root node for the log root tree while syncing the log, we end up returning without finishing the IO plug we started before, resulting in leaking resources as we have started writeback for extent buffers of a log tree before. That allocation failure, which typically is either -ENOMEM or -ENOSPC, is not fatal and the fsync can safely fallback to a full transaction commit. So release the IO plug if we fail to allocate the extent buffer for the root of the log root tree when syncing the log on a zoned filesystem. Fixes: `3ddebf27fc` ("btrfs: zoned: reorder log node allocation on zoned filesystem") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:42 +02:00
Damien Le Moal	051e78dc1f	zonefs: Clear inode information flags on inode creation commit 694852ead287a3433126e7ebda397b242dc99624 upstream. Ensure that the i_flags field of struct zonefs_inode_info is cleared to 0 when initializing a zone file inode, avoiding seeing the flag ZONEFS_ZONE_OPEN being incorrectly set. Fixes: `b5c00e9757` ("zonefs: open/close zone on file open/close") Cc: <stable@vger.kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:41 +02:00
Damien Le Moal	534c3f29ac	zonefs: Fix management of open zones commit 1da18a296f5ba4f99429e62a7cf4fdbefa598902 upstream. The mount option "explicit_open" manages the device open zone resources to ensure that if an application opens a sequential file for writing, the file zone can always be written by explicitly opening the zone and accounting for that state with the s_open_zones counter. However, if some zones are already open when mounting, the device open zone resource usage status will be larger than the initial s_open_zones value of 0. Ensure that this inconsistency does not happen by closing any sequential zone that is open when mounting. Furthermore, with ZNS drives, closing an explicitly open zone that has not been written will change the zone state to "closed", that is, the zone will remain in an active state. Since this can then cause failures of explicit open operations on other zones if the drive active zone resources are exceeded, we need to make sure that the zone is not active anymore by resetting it instead of closing it. To address this, zonefs_zone_mgmt() is modified to change a REQ_OP_ZONE_CLOSE request into a REQ_OP_ZONE_RESET for sequential zones that have not been written. Fixes: `b5c00e9757` ("zonefs: open/close zone on file open/close") Cc: <stable@vger.kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:41 +02:00
Ronnie Sahlberg	3bb73c4cc2	cifs: destage any unwritten data to the server before calling copychunk_write [ Upstream commit f5d0f921ea362636e4a2efb7c38d1ead373a8700 ] because the copychunk_write might cover a region of the file that has not yet been sent to the server and thus fail. A simple way to reproduce this is: truncate -s 0 /mnt/testfile; strace -f -o x -ttT xfs_io -i -f -c 'pwrite 0k 128k' -c 'fcollapse 16k 24k' /mnt/testfile the issue is that the 'pwrite 0k 128k' becomes rearranged on the wire with the 'fcollapse 16k 24k' due to write-back caching. fcollapse is implemented in cifs.ko as a SMB2 IOCTL(COPYCHUNK_WRITE) call and it will fail serverside since the file is still 0b in size serverside until the writes have been destaged. To avoid this we must ensure that we destage any unwritten data to the server before calling COPYCHUNK_WRITE. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1997373 Reported-by: Xiaoli Feng <xifeng@redhat.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-05-09 09:14:40 +02:00
Namjae Jeon	d276bcc5f7	ksmbd: set fixed sector size to FS_SECTOR_SIZE_INFORMATION [ Upstream commit 02655a70b7cc0f534531ee65fa72692f4d31a944 ] Currently ksmbd is using ->f_bsize from vfs_statfs() as sector size. If fat/exfat is a local share, ->f_bsize is a cluster size that is too large to be used as a sector size. Sector sizes larger than 4K cause problem occurs when mounting an iso file through windows client. The error message can be obtained using Mount-DiskImage command, the error is: "Mount-DiskImage : The sector size of the physical disk on which the virtual disk resides is not supported." This patch reports fixed 4KB sector size if ->s_blocksize is bigger than 4KB. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-05-09 09:14:40 +02:00
Namjae Jeon	df30cbfd3d	ksmbd: increment reference count of parent fp [ Upstream commit 8510a043d334ecdf83d4604782f288db6bf21d60 ] Add missing increment reference count of parent fp in ksmbd_lookup_fd_inode(). Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-05-09 09:14:40 +02:00
Ye Bin	52c3a04f9e	ext4: fix bug_on in start_this_handle during umount filesystem [ Upstream commit b98535d091795a79336f520b0708457aacf55c67 ] We got issue as follows: ------------[ cut here ]------------ kernel BUG at fs/jbd2/transaction.c:389! invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 9 PID: 131 Comm: kworker/9:1 Not tainted 5.17.0-862.14.0.6.x86_64-00001-g23f87daf7d74-dirty #197 Workqueue: events flush_stashed_error_work RIP: 0010:start_this_handle+0x41c/0x1160 RSP: 0018:ffff888106b47c20 EFLAGS: 00010202 RAX: ffffed10251b8400 RBX: ffff888128dc204c RCX: ffffffffb52972ac RDX: 0000000000000200 RSI: 0000000000000004 RDI: ffff888128dc2050 RBP: 0000000000000039 R08: 0000000000000001 R09: ffffed10251b840a R10: ffff888128dc204f R11: ffffed10251b8409 R12: ffff888116d78000 R13: 0000000000000000 R14: dffffc0000000000 R15: ffff888128dc2000 FS: 0000000000000000(0000) GS:ffff88839d680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000001620068 CR3: 0000000376c0e000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> jbd2__journal_start+0x38a/0x790 jbd2_journal_start+0x19/0x20 flush_stashed_error_work+0x110/0x2b3 process_one_work+0x688/0x1080 worker_thread+0x8b/0xc50 kthread+0x26f/0x310 ret_from_fork+0x22/0x30 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- Above issue may happen as follows: umount read procfs error_work ext4_put_super flush_work(&sbi->s_error_work); ext4_mb_seq_groups_show ext4_mb_load_buddy_gfp ext4_mb_init_group ext4_mb_init_cache ext4_read_block_bitmap_nowait ext4_validate_block_bitmap ext4_error ext4_handle_error schedule_work(&EXT4_SB(sb)->s_error_work); ext4_unregister_sysfs(sb); jbd2_journal_destroy(sbi->s_journal); journal_kill_thread journal->j_flags \|= JBD2_UNMOUNT; flush_stashed_error_work jbd2_journal_start start_this_handle BUG_ON(journal->j_flags & JBD2_UNMOUNT); To solve this issue, we call 'ext4_unregister_sysfs() before flushing s_error_work in ext4_put_super(). Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20220322012419.725457-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-05-09 09:14:39 +02:00
Andreas Gruenbacher	3591293c19	gfs2: No short reads or writes upon glock contention [ Upstream commit 296abc0d91d8b65d42224dd33452ace14491ad08 ] Commit 00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered I/O") changed gfs2_file_read_iter() and gfs2_file_buffered_write() to allow dropping the inode glock while faulting in user buffers. When the lock was dropped, a short result was returned to indicate that the operation was interrupted. As pointed out by Linus (see the link below), this behavior is broken and the operations should always re-acquire the inode glock and resume the operation instead. Link: https://lore.kernel.org/lkml/CAHk-=whaz-g_nOOoo8RRiWNjnv2R+h6_xk2F1J4TuSRxk1MtLw@mail.gmail.com/ Fixes: 00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered I/O") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-05-09 09:14:39 +02:00
Andreas Gruenbacher	b5afb477d2	gfs2: Make sure not to return short direct writes [ Upstream commit 3bde4c48586074202044456285a97ccdf9048988 ] When direct writes fail with -ENOTBLK because we're writing into a hole (gfs2_iomap_begin()) or because of a page invalidation failure (iomap_dio_rw()), we're falling back to buffered writes. In that case, when we lose the inode glock in gfs2_file_buffered_write(), we want to re-acquire it instead of returning a short write. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-05-09 09:14:39 +02:00
Andreas Gruenbacher	fe24959a79	gfs2: Minor retry logic cleanup [ Upstream commit 124c458a401a2497f796e4f2d6cafac6edbea8e9 ] Clean up the retry logic in the read and write functions somewhat. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-05-09 09:14:39 +02:00
Andreas Gruenbacher	e4ea3286b1	gfs2: Prevent endless loops in gfs2_file_buffered_write [ Upstream commit 554c577cee95bdc1d03d9f457e57dc96eb791845 ] Currently, instead of performing a short write, iomap_file_buffered_write will fail when part of its iov iterator cannot be read. In contrast, gfs2_file_buffered_write will loop around if it can read part of the iov iterator, so we can end up in an endless loop. This should be fixed in iomap_file_buffered_write (and also generic_perform_write), but this comes a bit late in the 5.16 development cycle, so work around it in the filesystem by trimming the iov iterator to the known-good size for now. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-05-09 09:14:38 +02:00
Jens Axboe	37811e46a2	io_uring: check reserved fields for recv/recvmsg [ Upstream commit 5a1e99b61b0c81388cde0c808b3e4173907df19f ] We should check unused fields for non-zero and -EINVAL if they are set, making it consistent with other opcodes. Fixes: `aa1fa28fc7` ("io_uring: add support for recvmsg()") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-05-09 09:14:38 +02:00
Jens Axboe	79c10cb188	io_uring: check reserved fields for send/sendmsg [ Upstream commit 588faa1ea5eecb351100ee5d187b9be99210f70d ] We should check unused fields for non-zero and -EINVAL if they are set, making it consistent with other opcodes. Fixes: `0fa03c624d` ("io_uring: add support for sendmsg()") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-05-09 09:14:38 +02:00
Xiubo Li	732f861dd4	ceph: fix possible NULL pointer dereference for req->r_session commit 7acae6183cf37c48b8da48bbbdb78820fb3913f3 upstream. The request will be inserted into the ci->i_unsafe_dirops before assigning the req->r_session, so it's possible that we will hit NULL pointer dereference bug here. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/55327 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-09 09:14:30 +02:00
Greg Kroah-Hartman	2014a9eea4	Revert "Revert "coredump: Use the vma snapshot in fill_files_note"" This reverts commit `0c5b51622c`. It is no longer needed as we are able to update the abi at this point in time. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I5b2c6e89737a2ac647f19719a1ccf256c2794a02	2022-05-04 13:39:10 -07:00
Greg Kroah-Hartman	8b0bc3fc2a	Revert "Revert "coredump: Remove the WARN_ON in dump_vma_snapshot"" This reverts commit `c31598eb0b`. It is no longer needed as we are able to update the abi at this point in time. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ia44bf70cb550e4840e3be1be7c8b2b0bea0a330e	2022-05-04 13:39:10 -07:00
Greg Kroah-Hartman	23dd13c9f9	Revert "Revert "coredump: Snapshot the vmas in do_coredump"" This reverts commit `ff1561ac7f`. It is no longer needed as we are able to update the abi at this point in time. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I20acdca0aa5b59436e890887c03f125f027a9d45	2022-05-04 13:39:10 -07:00
Filipe Manana	4a0123bdb0	btrfs: fallback to blocking mode when doing async dio over multiple extents commit ca93e44bfb5fd7996b76f0f544999171f647f93b upstream Some users recently reported that MariaDB was getting a read corruption when using io_uring on top of btrfs. This started to happen in 5.16, after commit 51bd9563b6783d ("btrfs: fix deadlock due to page faults during direct IO reads and writes"). That changed btrfs to use the new iomap flag IOMAP_DIO_PARTIAL and to disable page faults before calling iomap_dio_rw(). This was necessary to fix deadlocks when the iovector corresponds to a memory mapped file region. That type of scenario is exercised by test case generic/647 from fstests. For this MariaDB scenario, we attempt to read 16K from file offset X using IOCB_NOWAIT and io_uring. In that range we have 4 extents, each with a size of 4K, and what happens is the following: 1) btrfs_direct_read() disables page faults and calls iomap_dio_rw(); 2) iomap creates a struct iomap_dio object, its reference count is initialized to 1 and its ->size field is initialized to 0; 3) iomap calls btrfs_dio_iomap_begin() with file offset X, which finds the first 4K extent, and setups an iomap for this extent consisting of a single page; 4) At iomap_dio_bio_iter(), we are able to access the first page of the buffer (struct iov_iter) with bio_iov_iter_get_pages() without triggering a page fault; 5) iomap submits a bio for this 4K extent (iomap_dio_submit_bio() -> btrfs_submit_direct()) and increments the refcount on the struct iomap_dio object to 2; The ->size field of the struct iomap_dio object is incremented to 4K; 6) iomap calls btrfs_iomap_begin() again, this time with a file offset of X + 4K. There we setup an iomap for the next extent that also has a size of 4K; 7) Then at iomap_dio_bio_iter() we call bio_iov_iter_get_pages(), which tries to access the next page (2nd page) of the buffer. This triggers a page fault and returns -EFAULT; 8) At __iomap_dio_rw() we see the -EFAULT, but we reset the error to 0 because we passed the flag IOMAP_DIO_PARTIAL to iomap and the struct iomap_dio object has a ->size value of 4K (we submitted a bio for an extent already). The 'wait_for_completion' variable is not set to true, because our iocb has IOCB_NOWAIT set; 9) At the bottom of __iomap_dio_rw(), we decrement the reference count of the struct iomap_dio object from 2 to 1. Because we were not the only ones holding a reference on it and 'wait_for_completion' is set to false, -EIOCBQUEUED is returned to btrfs_direct_read(), which just returns it up the callchain, up to io_uring; 10) The bio submitted for the first extent (step 5) completes and its bio endio function, iomap_dio_bio_end_io(), decrements the last reference on the struct iomap_dio object, resulting in calling iomap_dio_complete_work() -> iomap_dio_complete(). 11) At iomap_dio_complete() we adjust the iocb->ki_pos from X to X + 4K and return 4K (the amount of io done) to iomap_dio_complete_work(); 12) iomap_dio_complete_work() calls the iocb completion callback, iocb->ki_complete() with a second argument value of 4K (total io done) and the iocb with the adjust ki_pos of X + 4K. This results in completing the read request for io_uring, leaving it with a result of 4K bytes read, and only the first page of the buffer filled in, while the remaining 3 pages, corresponding to the other 3 extents, were not filled; 13) For the application, the result is unexpected because if we ask to read N bytes, it expects to get N bytes read as long as those N bytes don't cross the EOF (i_size). MariaDB reports this as an error, as it's not expecting a short read, since it knows it's asking for read operations fully within the i_size boundary. This is typical in many applications, but it may also be questionable if they should react to such short reads by issuing more read calls to get the remaining data. Nevertheless, the short read happened due to a change in btrfs regarding how it deals with page faults while in the middle of a read operation, and there's no reason why btrfs can't have the previous behaviour of returning the whole data that was requested by the application. The problem can also be triggered with the following simple program: /* Get O_DIRECT / #ifndef _GNU_SOURCE #define _GNU_SOURCE #endif #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> #include <errno.h> #include <string.h> #include <liburing.h> int main(int argc, char argv[]) { char foo_path; struct io_uring ring; struct io_uring_sqe sqe; struct io_uring_cqe cqe; struct iovec iovec; int fd; long pagesize; void write_buf; void read_buf; ssize_t ret; int i; if (argc != 2) { fprintf(stderr, "Use: %s <directory>\n", argv[0]); return 1; } foo_path = malloc(strlen(argv[1]) + 5); if (!foo_path) { fprintf(stderr, "Failed to allocate memory for file path\n"); return 1; } strcpy(foo_path, argv[1]); strcat(foo_path, "/foo"); / * Create file foo with 2 extents, each with a size matching * the page size. Then allocate a buffer to read both extents * with io_uring, using O_DIRECT and IOCB_NOWAIT. Before doing * the read with io_uring, access the first page of the buffer * to fault it in, so that during the read we only trigger a * page fault when accessing the second page of the buffer. / fd = open(foo_path, O_CREAT \| O_TRUNC \| O_WRONLY \| O_DIRECT, 0666); if (fd == -1) { fprintf(stderr, "Failed to create file 'foo': %s (errno %d)", strerror(errno), errno); return 1; } pagesize = sysconf(_SC_PAGE_SIZE); ret = posix_memalign(&write_buf, pagesize, 2 pagesize); if (ret) { fprintf(stderr, "Failed to allocate write buffer\n"); return 1; } memset(write_buf, 0xab, pagesize); memset(write_buf + pagesize, 0xcd, pagesize); /* Create 2 extents, each with a size matching page size. / for (i = 0; i < 2; i++) { ret = pwrite(fd, write_buf + i pagesize, pagesize, i * pagesize); if (ret != pagesize) { fprintf(stderr, "Failed to write to file, ret = %ld errno %d (%s)\n", ret, errno, strerror(errno)); return 1; } ret = fsync(fd); if (ret != 0) { fprintf(stderr, "Failed to fsync file\n"); return 1; } } close(fd); fd = open(foo_path, O_RDONLY \| O_DIRECT); if (fd == -1) { fprintf(stderr, "Failed to open file 'foo': %s (errno %d)", strerror(errno), errno); return 1; } ret = posix_memalign(&read_buf, pagesize, 2 * pagesize); if (ret) { fprintf(stderr, "Failed to allocate read buffer\n"); return 1; } /* * Fault in only the first page of the read buffer. * We want to trigger a page fault for the 2nd page of the * read buffer during the read operation with io_uring * (O_DIRECT and IOCB_NOWAIT). / memset(read_buf, 0, 1); ret = io_uring_queue_init(1, &ring, 0); if (ret != 0) { fprintf(stderr, "Failed to create io_uring queue\n"); return 1; } sqe = io_uring_get_sqe(&ring); if (!sqe) { fprintf(stderr, "Failed to get io_uring sqe\n"); return 1; } iovec.iov_base = read_buf; iovec.iov_len = 2 pagesize; io_uring_prep_readv(sqe, fd, &iovec, 1, 0); ret = io_uring_submit_and_wait(&ring, 1); if (ret != 1) { fprintf(stderr, "Failed at io_uring_submit_and_wait()\n"); return 1; } ret = io_uring_wait_cqe(&ring, &cqe); if (ret < 0) { fprintf(stderr, "Failed at io_uring_wait_cqe()\n"); return 1; } printf("io_uring read result for file foo:\n\n"); printf(" cqe->res == %d (expected %d)\n", cqe->res, 2 * pagesize); printf(" memcmp(read_buf, write_buf) == %d (expected 0)\n", memcmp(read_buf, write_buf, 2 * pagesize)); io_uring_cqe_seen(&ring, cqe); io_uring_queue_exit(&ring); return 0; } When running it on an unpatched kernel: $ gcc io_uring_test.c -luring $ mkfs.btrfs -f /dev/sda $ mount /dev/sda /mnt/sda $ ./a.out /mnt/sda io_uring read result for file foo: cqe->res == 4096 (expected 8192) memcmp(read_buf, write_buf) == -205 (expected 0) After this patch, the read always returns 8192 bytes, with the buffer filled with the correct data. Although that reproducer always triggers the bug in my test vms, it's possible that it will not be so reliable on other environments, as that can happen if the bio for the first extent completes and decrements the reference on the struct iomap_dio object before we do the atomic_dec_and_test() on the reference at __iomap_dio_rw(). Fix this in btrfs by having btrfs_dio_iomap_begin() return -EAGAIN whenever we try to satisfy a non blocking IO request (IOMAP_NOWAIT flag set) over a range that spans multiple extents (or a mix of extents and holes). This avoids returning success to the caller when we only did partial IO, which is not optimal for writes and for reads it's actually incorrect, as the caller doesn't expect to get less bytes read than it has requested (unless EOF is crossed), as previously mentioned. This is also the type of behaviour that xfs follows (xfs_direct_write_iomap_begin()), even though it doesn't use IOMAP_DIO_PARTIAL. A test case for fstests will follow soon. Link: https://lore.kernel.org/linux-btrfs/CABVffEM0eEWho+206m470rtM0d9J8ue85TtR-A_oVTuGLWFicA@mail.gmail.com/ Link: https://lore.kernel.org/linux-btrfs/CAHF2GV6U32gmqSjLe=XKgfcZAmLCiH26cJ2OnHGp5x=VAH4OHQ@mail.gmail.com/ CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-01 17:22:34 +02:00
Filipe Manana	c81c4f5666	btrfs: fix deadlock due to page faults during direct IO reads and writes commit 51bd9563b6783de8315f38f7baed949e77c42311 upstream If we do a direct IO read or write when the buffer given by the user is memory mapped to the file range we are going to do IO, we end up ending in a deadlock. This is triggered by the new test case generic/647 from fstests. For a direct IO read we get a trace like this: [967.872718] INFO: task mmap-rw-fault:12176 blocked for more than 120 seconds. [967.874161] Not tainted 5.14.0-rc7-btrfs-next-95 #1 [967.874909] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [967.875983] task:mmap-rw-fault state:D stack: 0 pid:12176 ppid: 11884 flags:0x00000000 [967.875992] Call Trace: [967.875999] __schedule+0x3ca/0xe10 [967.876015] schedule+0x43/0xe0 [967.876020] wait_extent_bit.constprop.0+0x1eb/0x260 [btrfs] [967.876109] ? do_wait_intr_irq+0xb0/0xb0 [967.876118] lock_extent_bits+0x37/0x90 [btrfs] [967.876150] btrfs_lock_and_flush_ordered_range+0xa9/0x120 [btrfs] [967.876184] ? extent_readahead+0xa7/0x530 [btrfs] [967.876214] extent_readahead+0x32d/0x530 [btrfs] [967.876253] ? lru_cache_add+0x104/0x220 [967.876255] ? kvm_sched_clock_read+0x14/0x40 [967.876258] ? sched_clock_cpu+0xd/0x110 [967.876263] ? lock_release+0x155/0x4a0 [967.876271] read_pages+0x86/0x270 [967.876274] ? lru_cache_add+0x125/0x220 [967.876281] page_cache_ra_unbounded+0x1a3/0x220 [967.876291] filemap_fault+0x626/0xa20 [967.876303] __do_fault+0x36/0xf0 [967.876308] __handle_mm_fault+0x83f/0x15f0 [967.876322] handle_mm_fault+0x9e/0x260 [967.876327] __get_user_pages+0x204/0x620 [967.876332] ? get_user_pages_unlocked+0x69/0x340 [967.876340] get_user_pages_unlocked+0xd3/0x340 [967.876349] internal_get_user_pages_fast+0xbca/0xdc0 [967.876366] iov_iter_get_pages+0x8d/0x3a0 [967.876374] bio_iov_iter_get_pages+0x82/0x4a0 [967.876379] ? lock_release+0x155/0x4a0 [967.876387] iomap_dio_bio_actor+0x232/0x410 [967.876396] iomap_apply+0x12a/0x4a0 [967.876398] ? iomap_dio_rw+0x30/0x30 [967.876414] __iomap_dio_rw+0x29f/0x5e0 [967.876415] ? iomap_dio_rw+0x30/0x30 [967.876420] ? lock_acquired+0xf3/0x420 [967.876429] iomap_dio_rw+0xa/0x30 [967.876431] btrfs_file_read_iter+0x10b/0x140 [btrfs] [967.876460] new_sync_read+0x118/0x1a0 [967.876472] vfs_read+0x128/0x1b0 [967.876477] __x64_sys_pread64+0x90/0xc0 [967.876483] do_syscall_64+0x3b/0xc0 [967.876487] entry_SYSCALL_64_after_hwframe+0x44/0xae [967.876490] RIP: 0033:0x7fb6f2c038d6 [967.876493] RSP: 002b:00007fffddf586b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000011 [967.876496] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007fb6f2c038d6 [967.876498] RDX: 0000000000001000 RSI: 00007fb6f2c17000 RDI: 0000000000000003 [967.876499] RBP: 0000000000001000 R08: 0000000000000003 R09: 0000000000000000 [967.876501] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000003 [967.876502] R13: 0000000000000000 R14: 00007fb6f2c17000 R15: 0000000000000000 This happens because at btrfs_dio_iomap_begin() we lock the extent range and return with it locked - we only unlock in the endio callback, at end_bio_extent_readpage() -> endio_readpage_release_extent(). Then after iomap called the btrfs_dio_iomap_begin() callback, it triggers the page faults that resulting in reading the pages, through the readahead callback btrfs_readahead(), and through there we end to attempt to lock again the same extent range (or a subrange of what we locked before), resulting in the deadlock. For a direct IO write, the scenario is a bit different, and it results in trace like this: [1132.442520] run fstests generic/647 at 2021-08-31 18:53:35 [1330.349355] INFO: task mmap-rw-fault:184017 blocked for more than 120 seconds. [1330.350540] Not tainted 5.14.0-rc7-btrfs-next-95 #1 [1330.351158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [1330.351900] task:mmap-rw-fault state:D stack: 0 pid:184017 ppid:183725 flags:0x00000000 [1330.351906] Call Trace: [1330.351913] __schedule+0x3ca/0xe10 [1330.351930] schedule+0x43/0xe0 [1330.351935] btrfs_start_ordered_extent+0x108/0x1c0 [btrfs] [1330.352020] ? do_wait_intr_irq+0xb0/0xb0 [1330.352028] btrfs_lock_and_flush_ordered_range+0x8c/0x120 [btrfs] [1330.352064] ? extent_readahead+0xa7/0x530 [btrfs] [1330.352094] extent_readahead+0x32d/0x530 [btrfs] [1330.352133] ? lru_cache_add+0x104/0x220 [1330.352135] ? kvm_sched_clock_read+0x14/0x40 [1330.352138] ? sched_clock_cpu+0xd/0x110 [1330.352143] ? lock_release+0x155/0x4a0 [1330.352151] read_pages+0x86/0x270 [1330.352155] ? lru_cache_add+0x125/0x220 [1330.352162] page_cache_ra_unbounded+0x1a3/0x220 [1330.352172] filemap_fault+0x626/0xa20 [1330.352176] ? filemap_map_pages+0x18b/0x660 [1330.352184] __do_fault+0x36/0xf0 [1330.352189] __handle_mm_fault+0x1253/0x15f0 [1330.352203] handle_mm_fault+0x9e/0x260 [1330.352208] __get_user_pages+0x204/0x620 [1330.352212] ? get_user_pages_unlocked+0x69/0x340 [1330.352220] get_user_pages_unlocked+0xd3/0x340 [1330.352229] internal_get_user_pages_fast+0xbca/0xdc0 [1330.352246] iov_iter_get_pages+0x8d/0x3a0 [1330.352254] bio_iov_iter_get_pages+0x82/0x4a0 [1330.352259] ? lock_release+0x155/0x4a0 [1330.352266] iomap_dio_bio_actor+0x232/0x410 [1330.352275] iomap_apply+0x12a/0x4a0 [1330.352278] ? iomap_dio_rw+0x30/0x30 [1330.352292] __iomap_dio_rw+0x29f/0x5e0 [1330.352294] ? iomap_dio_rw+0x30/0x30 [1330.352306] btrfs_file_write_iter+0x238/0x480 [btrfs] [1330.352339] new_sync_write+0x11f/0x1b0 [1330.352344] ? NF_HOOK_LIST.constprop.0.cold+0x31/0x3e [1330.352354] vfs_write+0x292/0x3c0 [1330.352359] __x64_sys_pwrite64+0x90/0xc0 [1330.352365] do_syscall_64+0x3b/0xc0 [1330.352369] entry_SYSCALL_64_after_hwframe+0x44/0xae [1330.352372] RIP: 0033:0x7f4b0a580986 [1330.352379] RSP: 002b:00007ffd34d75418 EFLAGS: 00000246 ORIG_RAX: 0000000000000012 [1330.352382] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f4b0a580986 [1330.352383] RDX: 0000000000001000 RSI: 00007f4b0a3a4000 RDI: 0000000000000003 [1330.352385] RBP: 00007f4b0a3a4000 R08: 0000000000000003 R09: 0000000000000000 [1330.352386] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 [1330.352387] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Unlike for reads, at btrfs_dio_iomap_begin() we return with the extent range unlocked, but later when the page faults are triggered and we try to read the extents, we end up btrfs_lock_and_flush_ordered_range() where we find the ordered extent for our write, created by the iomap callback btrfs_dio_iomap_begin(), and we wait for it to complete, which makes us deadlock since we can't complete the ordered extent without reading the pages (the iomap code only submits the bio after the pages are faulted in). Fix this by setting the nofault attribute of the given iov_iter and retry the direct IO read/write if we get an -EFAULT error returned from iomap. For reads, also disable page faults completely, this is because when we read from a hole or a prealloc extent, we can still trigger page faults due to the call to iov_iter_zero() done by iomap - at the moment, it is oblivious to the value of the ->nofault attribute of an iov_iter. We also need to keep track of the number of bytes written or read, and pass it to iomap_dio_rw(), as well as use the new flag IOMAP_DIO_PARTIAL. This depends on the iov_iter and iomap changes introduced in commit c03098d4b9ad ("Merge tag 'gfs2-v5.15-rc5-mmap-fault' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2"). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-01 17:22:33 +02:00

1 2 3 4 5 ...

74354 Commits