Commit Graph

74344 Commits

Author SHA1 Message Date
Liujie Xie 50e4cd9df7 ANDROID: vendor_hooks: Add hooks for memory when debug
Add vendors hooks for recording memory used

Bug: 182443489
Bug: 234407991
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: I62d8bb2b6650d8b187b433f97eb833ef0b784df1
2022-06-02 14:37:19 -07:00
Greg Kroah-Hartman 910b540ffa This is the 5.15.41 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmKErdYACgkQONu9yGCS
 aT4LIg//bqU/qlVBJq3v3QCfN93SOXwECuPXi0e8p3aG/ptn1pzV0A5+rDD4TXeU
 mpw8oWSNQC0Wn8mag9poCvLFaSHuMIo2wPoPOwpRyb0BfhXCaWE4s1HbX49dNi2l
 9V76KC4JH69zaHtEmnBy83zzr4nyRG6q3bTd1hD3uJaSg6+2E+DfIK7MUSAyhyPU
 DSi2CSHgxZkBEJIVncCPPhz4HX7RJIav51EwAZlnqW6eX3osEe9Qh1b82qcatXIF
 rC/w74mQoRnghL0FRwQ5Z4mTEcgBI3lYoKJMkt8iZgOm9nBq1F62gXysKojmjDYc
 AvP97Jt+hF9hDFUfR2wW/sA/rsRhw1mx0LoFu96fc15Nxiyh8b/MFD+b3a2CqtN9
 szggwoezx90RbGOsc1zNhoX3VmZ2xMeVyAVg4RCscxXiO5FR1DqV4mUlbXoQx9ah
 I9+GYtc0vs9BP9pucK2kx7fzJ+eW93jIZVXHuh6mcspu7sdTBQam+ERH1gMIyCki
 OXcg0HCTLQi26MNMkrdX+RebwHxB7tbLXUXqYzQKEWjsLtCyv7LYmsRV+RZSyZ1p
 vqKs4gTjLPmcINebsT80mYZSH560lXkjl6Z0H5U23NQO8RG3QewrKZkHqwOn5zyQ
 aF5fnmVTVcMbT8BK3qCadr00Giz5usnw8R9am5EY1ELzo48207A=
 =35Fl
 -----END PGP SIGNATURE-----

Merge 5.15.41 into android13-5.15

Changes in 5.15.41
	batman-adv: Don't skb_split skbuffs with frag_list
	iwlwifi: iwl-dbg: Use del_timer_sync() before freeing
	hwmon: (tmp401) Add OF device ID table
	mac80211: Reset MBSSID parameters upon connection
	net: Fix features skip in for_each_netdev_feature()
	net: mscc: ocelot: fix last VCAP IS1/IS2 filter persisting in hardware when deleted
	net: mscc: ocelot: fix VCAP IS2 filters matching on both lookups
	net: mscc: ocelot: restrict tc-trap actions to VCAP IS2 lookup 0
	net: mscc: ocelot: avoid corrupting hardware counters when moving VCAP filters
	fbdev: simplefb: Cleanup fb_info in .fb_destroy rather than .remove
	fbdev: efifb: Cleanup fb_info in .fb_destroy rather than .remove
	fbdev: vesafb: Cleanup fb_info in .fb_destroy rather than .remove
	platform/surface: aggregator: Fix initialization order when compiling as builtin module
	ice: Fix race during aux device (un)plugging
	ice: fix PTP stale Tx timestamps cleanup
	ipv4: drop dst in multicast routing path
	drm/nouveau: Fix a potential theorical leak in nouveau_get_backlight_name()
	netlink: do not reset transport header in netlink_recvmsg()
	net: chelsio: cxgb4: Avoid potential negative array offset
	fbdev: efifb: Fix a use-after-free due early fb_info cleanup
	sfc: Use swap() instead of open coding it
	net: sfc: fix memory leak due to ptp channel
	mac80211_hwsim: call ieee80211_tx_prepare_skb under RCU protection
	nfs: fix broken handling of the softreval mount option
	ionic: fix missing pci_release_regions() on error in ionic_probe()
	dim: initialize all struct fields
	hwmon: (ltq-cputemp) restrict it to SOC_XWAY
	procfs: prevent unprivileged processes accessing fdinfo dir
	selftests: vm: Makefile: rename TARGETS to VMTARGETS
	arm64: vdso: fix makefile dependency on vdso.so
	virtio: fix virtio transitional ids
	s390/ctcm: fix variable dereferenced before check
	s390/ctcm: fix potential memory leak
	s390/lcs: fix variable dereferenced before check
	net/sched: act_pedit: really ensure the skb is writable
	net: ethernet: mediatek: ppe: fix wrong size passed to memset()
	net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral
	drm/vc4: hdmi: Fix build error for implicit function declaration
	net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down()
	net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pending
	net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe()
	tls: Fix context leak on tls_device_down
	drm/vmwgfx: Fix fencing on SVGAv3
	gfs2: Fix filesystem block deallocation for short writes
	hwmon: (f71882fg) Fix negative temperature
	RDMA/irdma: Fix deadlock in irdma_cleanup_cm_core()
	iommu: arm-smmu: disable large page mappings for Nvidia arm-smmu
	ASoC: max98090: Reject invalid values in custom control put()
	ASoC: max98090: Generate notifications on changes for custom control
	ASoC: ops: Validate input values in snd_soc_put_volsw_range()
	s390: disable -Warray-bounds
	ASoC: SOF: Fix NULL pointer exception in sof_pci_probe callback
	net: emaclite: Don't advertise 1000BASE-T and do auto negotiation
	net: sfp: Add tx-fault workaround for Huawei MA5671A SFP ONT
	secure_seq: use the 64 bits of the siphash for port offset calculation
	tcp: use different parts of the port_offset for index and offset
	tcp: resalt the secret every 10 seconds
	tcp: add small random increments to the source port
	tcp: dynamically allocate the perturb table used by source ports
	tcp: increase source port perturb table to 2^16
	tcp: drop the hash_32() part from the index calculation
	interconnect: Restore sync state by ignoring ipa-virt in provider count
	firmware_loader: use kernel credentials when reading firmware
	KVM: PPC: Book3S PR: Enable MSR_DR for switch_mmu_context()
	usb: xhci-mtk: fix fs isoc's transfer error
	x86/mm: Fix marking of unused sub-pmd ranges
	tty/serial: digicolor: fix possible null-ptr-deref in digicolor_uart_probe()
	tty: n_gsm: fix buffer over-read in gsm_dlci_data()
	tty: n_gsm: fix mux activation issues in gsm_config()
	usb: cdc-wdm: fix reading stuck on device close
	usb: typec: tcpci: Don't skip cleanup in .remove() on error
	usb: typec: tcpci_mt6360: Update for BMC PHY setting
	USB: serial: pl2303: add device id for HP LM930 Display
	USB: serial: qcserial: add support for Sierra Wireless EM7590
	USB: serial: option: add Fibocom L610 modem
	USB: serial: option: add Fibocom MA510 modem
	slimbus: qcom: Fix IRQ check in qcom_slim_probe
	fsl_lpuart: Don't enable interrupts too early
	serial: 8250_mtk: Fix UART_EFR register address
	serial: 8250_mtk: Fix register address for XON/XOFF character
	ceph: fix setting of xattrs on async created inodes
	Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()"
	mm/huge_memory: do not overkill when splitting huge_zero_page
	drm/vmwgfx: Disable command buffers on svga3 without gbobjects
	drm/nouveau/tegra: Stop using iommu_present()
	i40e: i40e_main: fix a missing check on list iterator
	net: atlantic: always deep reset on pm op, fixing up my null deref regression
	net: phy: Fix race condition on link status change
	writeback: Avoid skipping inode writeback
	cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp()
	arm[64]/memremap: don't abuse pfn_valid() to ensure presence of linear map
	net: phy: micrel: Do not use kszphy_suspend/resume for KSZ8061
	net: phy: micrel: Pass .probe for KS8737
	SUNRPC: Ensure that the gssproxy client can start in a connected state
	drm/vmwgfx: Initialize drm_mode_fb_cmd2
	Revert "drm/amd/pm: keep the BACO feature enabled for suspend"
	dma-buf: call dma_buf_stats_setup after dmabuf is in valid list
	mm/hwpoison: use pr_err() instead of dump_page() in get_any_page()
	SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
	ping: fix address binding wrt vrf
	usb: gadget: uvc: rename function to be more consistent
	usb: gadget: uvc: allow for application to cleanly shutdown
	Linux 5.15.41

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia65cdbbddf553237d6a3a38efb9bcb2fcc3990ec
2022-05-18 11:31:34 +02:00
Trond Myklebust 54f6834b28 SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
commit f00432063db1a0db484e85193eccc6845435b80e upstream.

We must ensure that all sockets are closed before we call xprt_free()
and release the reference to the net namespace. The problem is that
calling fput() will defer closing the socket until delayed_fput() gets
called.
Let's fix the situation by allowing rpciod and the transport teardown
code (which runs on the system wq) to call __fput_sync(), and directly
close the socket.

Reported-by: Felix Fu <foyjog@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: a73881c96d ("SUNRPC: Fix an Oops in udp_poll()")
Cc: stable@vger.kernel.org # 5.1.x: 3be232f11a3c: SUNRPC: Prevent immediate close+reconnect
Cc: stable@vger.kernel.org # 5.1.x: 89f42494f92f: SUNRPC: Don't call connect() more than once on a TCP socket
Cc: stable@vger.kernel.org # 5.1.x
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Meena Shanmugam <meenashanmugam@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-18 10:26:57 +02:00
Jing Xia 80b6fb3d18 writeback: Avoid skipping inode writeback
commit 846a3351ddfe4a86eede4bb26a205c3f38ef84d3 upstream.

We have run into an issue that a task gets stuck in
balance_dirty_pages_ratelimited() when perform I/O stress testing.
The reason we observed is that an I_DIRTY_PAGES inode with lots
of dirty pages is in b_dirty_time list and standard background
writeback cannot writeback the inode.
After studing the relevant code, the following scenario may lead
to the issue:

task1                                   task2
-----                                   -----
fuse_flush
 write_inode_now //in b_dirty_time
  writeback_single_inode
   __writeback_single_inode
                                 fuse_write_end
                                  filemap_dirty_folio
                                   __xa_set_mark:PAGECACHE_TAG_DIRTY
    lock inode->i_lock
    if mapping tagged PAGECACHE_TAG_DIRTY
    inode->i_state |= I_DIRTY_PAGES
    unlock inode->i_lock
                                   __mark_inode_dirty:I_DIRTY_PAGES
                                      lock inode->i_lock
                                      -was dirty,inode stays in
                                      -b_dirty_time
                                      unlock inode->i_lock

   if(!(inode->i_state & I_DIRTY_All))
      -not true,so nothing done

This patch moves the dirty inode to b_dirty list when the inode
currently is not queued in b_io or b_more_io list at the end of
writeback_single_inode.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
CC: stable@vger.kernel.org
Fixes: 0ae45f63d4 ("vfs: add support for a lazytime mount option")
Signed-off-by: Jing Xia <jing.xia@unisoc.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220510023514.27399-1-jing.xia@unisoc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-18 10:26:56 +02:00
Jeff Layton 8c09cb115e ceph: fix setting of xattrs on async created inodes
commit 620239d9a32e9fe27c9204ec11e40058671aeeb6 upstream.

Currently when we create a file, we spin up an xattr buffer to send
along with the create request. If we end up doing an async create
however, then we currently pass down a zero-length xattr buffer.

Fix the code to send down the xattr buffer in req->r_pagelist. If the
xattrs span more than a page, however give up and don't try to do an
async create.

Cc: stable@vger.kernel.org
URL: https://bugzilla.redhat.com/show_bug.cgi?id=2063929
Fixes: 9a8d03ca2e ("ceph: attempt to do async create when possible")
Reported-by: John Fortin <fortinj66@gmail.com>
Reported-by: Sri Ramanujam <sri@ramanujam.io>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-18 10:26:55 +02:00
Andreas Gruenbacher 41d5ad9596 gfs2: Fix filesystem block deallocation for short writes
[ Upstream commit d031a8866e709c9d1ee5537a321b6192b4d2dc5b ]

When a write cannot be carried out in full, gfs2_iomap_end() releases
blocks that have been allocated for this write but haven't been used.

To compute the end of the allocation, gfs2_iomap_end() incorrectly
rounded the end of the attempted write down to the next block boundary
to arrive at the end of the allocation.  It would have to round up, but
the end of the allocation is also available as iomap->offset +
iomap->length, so just use that instead.

In addition, use round_up() for computing the start of the unused range.

Fixes: 64bc06bb32 ("gfs2: iomap buffered write support")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-18 10:26:51 +02:00
Kalesh Singh 62cbb09899 procfs: prevent unprivileged processes accessing fdinfo dir
[ Upstream commit 1927e498aee1757b3df755a194cbfc5cc0f2b663 ]

The file permissions on the fdinfo dir from were changed from
S_IRUSR|S_IXUSR to S_IRUGO|S_IXUGO, and a PTRACE_MODE_READ check was added
for opening the fdinfo files [1].  However, the ptrace permission check
was not added to the directory, allowing anyone to get the open FD numbers
by reading the fdinfo directory.

Add the missing ptrace permission check for opening the fdinfo directory.

[1] https://lkml.kernel.org/r/20210308170651.919148-1-kaleshsingh@google.com

Link: https://lkml.kernel.org/r/20210713162008.1056986-1-kaleshsingh@google.com
Fixes: 7bc3fa0172 ("procfs: allow reading fdinfo with PTRACE_MODE_READ")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-18 10:26:50 +02:00
Dan Aloni 1a2e139e68 nfs: fix broken handling of the softreval mount option
[ Upstream commit 085d16d5f949b64713d5e960d6c9bbf51bc1d511 ]

Turns out that ever since this mount option was added, passing
`softreval` in NFS mount options cancelled all other flags while not
affecting the underlying flag `NFS_MOUNT_SOFTREVAL`.

Fixes: c74dfe97c1 ("NFS: Add mount option 'softreval'")
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-18 10:26:49 +02:00
Greg Kroah-Hartman 0bfa00b6ba Merge 5.15.40 into android13-5.15
Changes in 5.15.40
	x86/lib/atomic64_386_32: Rename things
	x86: Prepare asm files for straight-line-speculation
	x86: Prepare inline-asm for straight-line-speculation
	objtool: Add straight-line-speculation validation
	x86/alternative: Relax text_poke_bp() constraint
	kbuild: move objtool_args back to scripts/Makefile.build
	x86: Add straight-line-speculation mitigation
	tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy'
	kvm/emulate: Fix SETcc emulation function offsets with SLS
	crypto: x86/poly1305 - Fixup SLS
	objtool: Fix SLS validation for kcov tail-call replacement
	Bluetooth: Fix the creation of hdev->name
	rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition
	udf: Avoid using stale lengthOfImpUse
	mm: fix missing cache flush for all tail pages of compound page
	mm: hugetlb: fix missing cache flush in copy_huge_page_from_user()
	mm: shmem: fix missing cache flush in shmem_mfill_atomic_pte()
	mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic()
	mm/hwpoison: fix error page recovered but reported "not recovered"
	mm/mlock: fix potential imbalanced rlimit ucounts adjustment
	mm: fix invalid page pointer returned with FOLL_PIN gups
	Linux 5.15.40

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib068d0412565187435c8aeeeb22b683b6aa3a9b1
2022-05-18 09:40:16 +02:00
Greg Kroah-Hartman 2bde857bee Merge 5.15.39 into android13-5.15
Changes in 5.15.39
	MIPS: Fix CP0 counter erratum detection for R4k CPUs
	parisc: Merge model and model name into one line in /proc/cpuinfo
	ALSA: hda/realtek: Add quirk for Yoga Duet 7 13ITL6 speakers
	ALSA: fireworks: fix wrong return count shorter than expected by 4 bytes
	mmc: sdhci-msm: Reset GCC_SDCC_BCR register for SDHC
	mmc: sunxi-mmc: Fix DMA descriptors allocated above 32 bits
	mmc: core: Set HS clock speed before sending HS CMD13
	gpiolib: of: fix bounds check for 'gpio-reserved-ranges'
	x86/fpu: Prevent FPU state corruption
	KVM: x86/svm: Account for family 17h event renumberings in amd_pmc_perf_hw_id
	iommu/vt-d: Calculate mask for non-aligned flushes
	iommu/arm-smmu-v3: Fix size calculation in arm_smmu_mm_invalidate_range()
	drm/amd/display: Avoid reading audio pattern past AUDIO_CHANNELS_COUNT
	drm/amdgpu: do not use passthrough mode in Xen dom0
	RISC-V: relocate DTB if it's outside memory region
	Revert "SUNRPC: attempt AF_LOCAL connect on setup"
	timekeeping: Mark NMI safe time accessors as notrace
	firewire: fix potential uaf in outbound_phy_packet_callback()
	firewire: remove check of list iterator against head past the loop body
	firewire: core: extend card->lock in fw_core_handle_bus_reset
	net: stmmac: disable Split Header (SPH) for Intel platforms
	genirq: Synchronize interrupt thread startup
	ASoC: da7219: Fix change notifications for tone generator frequency
	ASoC: wm8958: Fix change notifications for DSP controls
	ASoC: meson: Fix event generation for AUI ACODEC mux
	ASoC: meson: Fix event generation for G12A tohdmi mux
	ASoC: meson: Fix event generation for AUI CODEC mux
	s390/dasd: fix data corruption for ESE devices
	s390/dasd: prevent double format of tracks for ESE devices
	s390/dasd: Fix read for ESE with blksize < 4k
	s390/dasd: Fix read inconsistency for ESE DASD devices
	can: grcan: grcan_close(): fix deadlock
	can: isotp: remove re-binding of bound socket
	can: grcan: use ofdev->dev when allocating DMA memory
	can: grcan: grcan_probe(): fix broken system id check for errata workaround needs
	can: grcan: only use the NAPI poll budget for RX
	nfc: replace improper check device_is_registered() in netlink related functions
	nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
	NFC: netlink: fix sleep in atomic bug when firmware download timeout
	gpio: visconti: Fix fwnode of GPIO IRQ
	gpio: pca953x: fix irq_stat not updated when irq is disabled (irq_mask not set)
	hwmon: (adt7470) Fix warning on module removal
	hwmon: (pmbus) disable PEC if not enabled
	ASoC: dmaengine: Restore NULL prepare_slave_config() callback
	ASoC: soc-ops: fix error handling
	iommu/vt-d: Drop stop marker messages
	iommu/dart: check return value after calling platform_get_resource()
	net/mlx5e: Fix trust state reset in reload
	net/mlx5e: Don't match double-vlan packets if cvlan is not set
	net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release
	net/mlx5e: Fix the calling of update_buffer_lossy() API
	net/mlx5: Avoid double clear or set of sync reset requested
	net/mlx5: Fix deadlock in sync reset flow
	selftests/seccomp: Don't call read() on TTY from background pgrp
	SUNRPC release the transport of a relocated task with an assigned transport
	RDMA/siw: Fix a condition race issue in MPA request processing
	RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state
	RDMA/irdma: Reduce iWARP QP destroy time
	RDMA/irdma: Fix possible crash due to NULL netdev in notifier
	NFSv4: Don't invalidate inode attributes on delegation return
	net: ethernet: mediatek: add missing of_node_put() in mtk_sgmii_init()
	net: dsa: mt7530: add missing of_node_put() in mt7530_setup()
	net: stmmac: dwmac-sun8i: add missing of_node_put() in sun8i_dwmac_register_mdio_mux()
	net: mdio: Fix ENOMEM return value in BCM6368 mux bus controller
	net: cpsw: add missing of_node_put() in cpsw_probe_dt()
	net: igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter()
	net: emaclite: Add error handling for of_address_to_resource()
	selftests/net: so_txtime: fix parsing of start time stamp on 32 bit systems
	selftests/net: so_txtime: usage(): fix documentation of default clock
	drm/msm/dp: remove fail safe mode related code
	btrfs: do not BUG_ON() on failure to update inode when setting xattr
	hinic: fix bug of wq out of bound access
	mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter()
	rxrpc: Enable IPv6 checksums on transport socket
	selftests: mirror_gre_bridge_1q: Avoid changing PVID while interface is operational
	bnxt_en: Fix possible bnxt_open() failure caused by wrong RFS flag
	bnxt_en: Fix unnecessary dropping of RX packets
	selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer
	smsc911x: allow using IRQ0
	btrfs: force v2 space cache usage for subpage mount
	btrfs: always log symlinks in full mode
	drm/amdgpu: unify BO evicting method in amdgpu_ttm
	drm/amdgpu: explicitly check for s0ix when evicting resources
	drm/amdgpu: don't set s3 and s0ix at the same time
	drm/amdgpu: Ensure HDA function is suspended before ASIC reset
	gpio: mvebu: drop pwm base assignment
	kvm: x86/cpuid: Only provide CPUID leaf 0xA if host has architectural PMU
	fbdev: Make fb_release() return -ENODEV if fbdev was unregistered
	net/mlx5: Fix slab-out-of-bounds while reading resource dump menu
	net/mlx5e: Lag, Fix use-after-free in fib event handler
	net/mlx5e: Lag, Fix fib_info pointer assignment
	net/mlx5e: Lag, Don't skip fib events on current dst
	iommu/dart: Add missing module owner to ops structure
	kvm: selftests: do not use bitfields larger than 32-bits for PTEs
	KVM: selftests: Silence compiler warning in the kvm_page_table_test
	x86/kvm: Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume
	KVM: x86: Do not change ICR on write to APIC_SELF_IPI
	KVM: x86/mmu: avoid NULL-pointer dereference on page freeing bugs
	KVM: LAPIC: Enable timer posted-interrupt only when mwait/hlt is advertised
	selftest/vm: verify mmap addr in mremap_test
	selftest/vm: verify remap destination address in mremap_test
	mmc: rtsx: add 74 Clocks in power on flow
	Revert "parisc: Mark sched_clock unstable only if clocks are not syncronized"
	rcu: Fix callbacks processing time limit retaining cond_resched()
	rcu: Apply callbacks processing time limit only on softirq
	PCI: pci-bridge-emul: Add description for class_revision field
	PCI: pci-bridge-emul: Add definitions for missing capabilities registers
	PCI: aardvark: Add support for DEVCAP2, DEVCTL2, LNKCAP2 and LNKCTL2 registers on emulated bridge
	PCI: aardvark: Clear all MSIs at setup
	PCI: aardvark: Comment actions in driver remove method
	PCI: aardvark: Disable bus mastering when unbinding driver
	PCI: aardvark: Mask all interrupts when unbinding driver
	PCI: aardvark: Fix memory leak in driver unbind
	PCI: aardvark: Assert PERST# when unbinding driver
	PCI: aardvark: Disable link training when unbinding driver
	PCI: aardvark: Disable common PHY when unbinding driver
	PCI: aardvark: Replace custom PCIE_CORE_INT_* macros with PCI_INTERRUPT_*
	PCI: aardvark: Rewrite IRQ code to chained IRQ handler
	PCI: aardvark: Check return value of generic_handle_domain_irq() when processing INTx IRQ
	PCI: aardvark: Make MSI irq_chip structures static driver structures
	PCI: aardvark: Make msi_domain_info structure a static driver structure
	PCI: aardvark: Use dev_fwnode() instead of of_node_to_fwnode(dev->of_node)
	PCI: aardvark: Refactor unmasking summary MSI interrupt
	PCI: aardvark: Add support for masking MSI interrupts
	PCI: aardvark: Fix setting MSI address
	PCI: aardvark: Enable MSI-X support
	PCI: aardvark: Add support for ERR interrupt on emulated bridge
	PCI: aardvark: Optimize writing PCI_EXP_RTCTL_PMEIE and PCI_EXP_RTSTA_PME on emulated bridge
	PCI: aardvark: Add support for PME interrupts
	PCI: aardvark: Fix support for PME requester on emulated bridge
	PCI: aardvark: Use separate INTA interrupt for emulated root bridge
	PCI: aardvark: Remove irq_mask_ack() callback for INTx interrupts
	PCI: aardvark: Don't mask irq when mapping
	PCI: aardvark: Drop __maybe_unused from advk_pcie_disable_phy()
	PCI: aardvark: Update comment about link going down after link-up
	Linux 5.15.39

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic51d6d05a0d99156c6fd844786e984aff8e7386a
2022-05-18 09:38:58 +02:00
Greg Kroah-Hartman 4154968fe8 Merge 5.15.38 into android13-5.15
Changes in 5.15.38
	usb: mtu3: fix USB 3.0 dual-role-switch from device to host
	USB: quirks: add a Realtek card reader
	USB: quirks: add STRING quirk for VCOM device
	USB: serial: whiteheat: fix heap overflow in WHITEHEAT_GET_DTR_RTS
	USB: serial: cp210x: add PIDs for Kamstrup USB Meter Reader
	USB: serial: option: add support for Cinterion MV32-WA/MV32-WB
	USB: serial: option: add Telit 0x1057, 0x1058, 0x1075 compositions
	usb: xhci: tegra:Fix PM usage reference leak of tegra_xusb_unpowergate_partitions
	xhci: Enable runtime PM on second Alderlake controller
	xhci: stop polling roothubs after shutdown
	xhci: increase usb U3 -> U0 link resume timeout from 100ms to 500ms
	iio: dac: ad5592r: Fix the missing return value.
	iio: dac: ad5446: Fix read_raw not returning set value
	iio: magnetometer: ak8975: Fix the error handling in ak8975_power_on()
	iio: imu: inv_icm42600: Fix I2C init possible nack
	usb: misc: fix improper handling of refcount in uss720_probe()
	usb: core: Don't hold the device lock while sleeping in do_proc_control()
	usb: typec: ucsi: Fix reuse of completion structure
	usb: typec: ucsi: Fix role swapping
	usb: gadget: uvc: Fix crash when encoding data for usb request
	usb: gadget: configfs: clear deactivation flag in configfs_composite_unbind()
	usb: dwc3: Try usb-role-switch first in dwc3_drd_init
	usb: dwc3: core: Fix tx/rx threshold settings
	usb: dwc3: core: Only handle soft-reset in DCTL
	usb: dwc3: gadget: Return proper request status
	usb: dwc3: pci: add support for the Intel Meteor Lake-P
	usb: cdns3: Fix issue for clear halt endpoint
	usb: phy: generic: Get the vbus supply
	serial: imx: fix overrun interrupts in DMA mode
	serial: amba-pl011: do not time out prematurely when draining tx fifo
	serial: 8250: Also set sticky MCR bits in console restoration
	serial: 8250: Correct the clock for EndRun PTP/1588 PCIe device
	arch_topology: Do not set llc_sibling if llc_id is invalid
	ceph: fix possible NULL pointer dereference for req->r_session
	bus: mhi: host: pci_generic: Add missing poweroff() PM callback
	bus: mhi: host: pci_generic: Flush recovery worker during freeze
	arm64: dts: imx8mm-venice: fix spi2 pin configuration
	pinctrl: samsung: fix missing GPIOLIB on ARM64 Exynos config
	hex2bin: make the function hex_to_bin constant-time
	hex2bin: fix access beyond string end
	riscv: patch_text: Fixup last cpu should be master
	x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests
	iocost: don't reset the inuse weight of under-weighted debtors
	virtio_net: fix wrong buf address calculation when using xdp
	cpufreq: qcom-hw: fix the race between LMH worker and cpuhp
	cpufreq: qcom-cpufreq-hw: Fix throttle frequency value on EPSS platforms
	video: fbdev: udlfb: properly check endpoint type
	arm64: dts: meson: remove CPU opps below 1GHz for G12B boards
	arm64: dts: meson: remove CPU opps below 1GHz for SM1 boards
	iio:imu:bmi160: disable regulator in error path
	mtd: rawnand: fix ecc parameters for mt7622
	xsk: Fix l2fwd for copy mode + busy poll combo
	arm64: dts: imx8qm: Correct SCU clock controller's compatible property
	USB: Fix xhci event ring dequeue pointer ERDP update issue
	ARM: dts: imx6qdl-apalis: Fix sgtl5000 detection issue
	arm64: dts: imx8mn: Fix SAI nodes
	arm64: dts: meson-sm1-bananapi-m5: fix wrong GPIO pin labeling for CON1
	phy: samsung: Fix missing of_node_put() in exynos_sata_phy_probe
	phy: samsung: exynos5250-sata: fix missing device put in probe error paths
	ARM: OMAP2+: Fix refcount leak in omap_gic_of_init
	bus: ti-sysc: Make omap3 gpt12 quirk handling SoC specific
	ARM: dts: dra7: Fix suspend warning for vpe powerdomain
	phy: ti: omap-usb2: Fix error handling in omap_usb2_enable_clocks
	ARM: dts: at91: Map MCLK for wm8731 on at91sam9g20ek
	ARM: dts: at91: sama5d4_xplained: fix pinctrl phandle name
	ARM: dts: at91: fix pinctrl phandles
	phy: mapphone-mdm6600: Fix PM error handling in phy_mdm6600_probe
	phy: ti: Add missing pm_runtime_disable() in serdes_am654_probe
	interconnect: qcom: sdx55: Drop IP0 interconnects
	ARM: dts: Fix mmc order for omap3-gta04
	ARM: dts: am3517-evm: Fix misc pinmuxing
	ARM: dts: logicpd-som-lv: Fix wrong pinmuxing on OMAP35
	ipvs: correctly print the memory size of ip_vs_conn_tab
	phy: amlogic: fix error path in phy_g12a_usb3_pcie_probe()
	pinctrl: mediatek: moore: Fix build error
	mtd: rawnand: Fix return value check of wait_for_completion_timeout
	mtd: fix 'part' field data corruption in mtd_info
	pinctrl: stm32: Do not call stm32_gpio_get() for edge triggered IRQs in EOI
	memory: renesas-rpc-if: Fix HF/OSPI data transfer in Manual Mode
	net: dsa: Add missing of_node_put() in dsa_port_link_register_of
	netfilter: nft_set_rbtree: overlap detection with element re-addition after deletion
	bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook
	pinctrl: rockchip: fix RK3308 pinmux bits
	tcp: md5: incorrect tcp_header_len for incoming connections
	pinctrl: stm32: Keep pinctrl block clock enabled when LEVEL IRQ requested
	tcp: ensure to use the most recently sent skb when filling the rate sample
	wireguard: device: check for metadata_dst with skb_valid_dst()
	sctp: check asoc strreset_chunk in sctp_generate_reconf_event
	ARM: dts: imx6ull-colibri: fix vqmmc regulator
	arm64: dts: imx8mn-ddr4-evk: Describe the 32.768 kHz PMIC clock
	pinctrl: pistachio: fix use of irq_of_parse_and_map()
	cpufreq: fix memory leak in sun50i_cpufreq_nvmem_probe
	net: hns3: clear inited state and stop client after failed to register netdev
	net: hns3: modify the return code of hclge_get_ring_chain_from_mbx
	net: hns3: add validity check for message data length
	net: hns3: add return value for mailbox handling in PF
	net/smc: sync err code when tcp connection was refused
	ip_gre: Make o_seqno start from 0 in native mode
	ip6_gre: Make o_seqno start from 0 in native mode
	ip_gre, ip6_gre: Fix race condition on o_seqno in collect_md mode
	tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT
	tcp: make sure treq->af_specific is initialized
	bus: sunxi-rsb: Fix the return value of sunxi_rsb_device_create()
	clk: sunxi: sun9i-mmc: check return value after calling platform_get_resource()
	cpufreq: qcom-cpufreq-hw: Clear dcvs interrupts
	net: bcmgenet: hide status block before TX timestamping
	net: phy: marvell10g: fix return value on error
	net: dsa: mv88e6xxx: Fix port_hidden_wait to account for port_base_addr
	drm/sun4i: Remove obsolete references to PHYS_OFFSET
	net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK
	io_uring: check reserved fields for send/sendmsg
	io_uring: check reserved fields for recv/recvmsg
	netfilter: conntrack: fix udp offload timeout sysctl
	drm/amdkfd: Fix GWS queue count
	drm/amd/display: Fix memory leak in dcn21_clock_source_create
	tls: Skip tls_append_frag on zero copy size
	bnx2x: fix napi API usage sequence
	net: fec: add missing of_node_put() in fec_enet_init_stop_mode()
	gfs2: Prevent endless loops in gfs2_file_buffered_write
	gfs2: Minor retry logic cleanup
	gfs2: Make sure not to return short direct writes
	gfs2: No short reads or writes upon glock contention
	perf arm-spe: Fix addresses of synthesized SPE events
	ixgbe: ensure IPsec VF<->PF compatibility
	Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"
	tcp: fix F-RTO may not work correctly when receiving DSACK
	ASoC: Intel: soc-acpi: correct device endpoints for max98373
	ASoC: wm8731: Disable the regulator when probing fails
	ext4: fix bug_on in start_this_handle during umount filesystem
	arch: xtensa: platforms: Fix deadlock in rs_close()
	ksmbd: increment reference count of parent fp
	ksmbd: set fixed sector size to FS_SECTOR_SIZE_INFORMATION
	bonding: do not discard lowest hash bit for non layer3+4 hashing
	x86: __memcpy_flushcache: fix wrong alignment if size > 2^32
	cifs: destage any unwritten data to the server before calling copychunk_write
	drivers: net: hippi: Fix deadlock in rr_close()
	powerpc/perf: Fix 32bit compile
	selftest/vm: verify mmap addr in mremap_test
	selftest/vm: verify remap destination address in mremap_test
	Revert "ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40"
	zonefs: Fix management of open zones
	zonefs: Clear inode information flags on inode creation
	kasan: prevent cpu_quarantine corruption when CPU offline and cache shrink occur at same time
	mtd: rawnand: qcom: fix memory corruption that causes panic
	netfilter: Update ip6_route_me_harder to consider L3 domain
	drm/i915: Check EDID for HDR static metadata when choosing blc
	drm/i915: Fix SEL_FETCH_PLANE_*(PIPE_B+) register addresses
	net: ethernet: stmmac: fix write to sgmii_adapter_base
	ACPI: processor: idle: Avoid falling back to C3 type C-states
	thermal: int340x: Fix attr.show callback prototype
	btrfs: fix leaked plug after failure syncing log on zoned filesystems
	ARM: dts: at91: sama7g5ek: enable pull-up on flexcom3 console lines
	ARM: dts: imx8mm-venice-gw{71xx,72xx,73xx}: fix OTG controller OC mode
	x86/cpu: Load microcode during restore_processor_state()
	perf symbol: Pass is_kallsyms to symbols__fixup_end()
	perf symbol: Update symbols__fixup_end()
	tty: n_gsm: fix restart handling via CLD command
	tty: n_gsm: fix decoupled mux resource
	tty: n_gsm: fix mux cleanup after unregister tty device
	tty: n_gsm: fix wrong signal octet encoding in convergence layer type 2
	tty: n_gsm: fix malformed counter for out of frame data
	netfilter: nft_socket: only do sk lookups when indev is available
	tty: n_gsm: fix insufficient txframe size
	tty: n_gsm: fix wrong DLCI release order
	tty: n_gsm: fix missing explicit ldisc flush
	tty: n_gsm: fix wrong command retry handling
	tty: n_gsm: fix wrong command frame length field encoding
	tty: n_gsm: fix wrong signal octets encoding in MSC
	tty: n_gsm: fix missing tty wakeup in convergence layer type 2
	tty: n_gsm: fix reset fifo race condition
	tty: n_gsm: fix incorrect UA handling
	tty: n_gsm: fix software flow control handling
	perf symbol: Remove arch__symbols__fixup_end()
	eeprom: at25: Use DMA safe buffers
	objtool: Fix code relocs vs weak symbols
	objtool: Fix type of reloc::addend
	powerpc/64: Add UADDR64 relocation support
	Linux 5.15.38

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic5e398d47dd6240ecde38f635b3085fae7c3c0ab
2022-05-18 09:00:50 +02:00
Greg Kroah-Hartman ecda2085fd Revert 5.15.37 merge into android13-5.15
This reverts the merge of 5.15.37 into the android13-5.15
There are lots of ABI issues, and many of the commits are not needed in
the Android tree at this time.  Revert the merge (except for the
Makefile change), so that future merges will continue to work, and the
needed individual changes from this release will be manually added to
the tree at a later point in time.

Fixes: f7dace75d276 ("Merge 5.15.37 into android13-5.15")
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0632858e5c0fb94fc14c0f4216997330eca260a7
2022-05-18 08:59:14 +02:00
Greg Kroah-Hartman ef5fed3c1e Merge 5.15.37 into android13-5.15
Changes in 5.15.37
	floppy: disable FDRAWCMD by default
	bpf: Introduce composable reg, ret and arg types.
	bpf: Replace ARG_XXX_OR_NULL with ARG_XXX | PTR_MAYBE_NULL
	bpf: Replace RET_XXX_OR_NULL with RET_XXX | PTR_MAYBE_NULL
	bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL
	bpf: Introduce MEM_RDONLY flag
	bpf: Convert PTR_TO_MEM_OR_NULL to composable types.
	bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM.
	bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem.
	bpf/selftests: Test PTR_TO_RDONLY_MEM
	bpf: Fix crash due to out of bounds access into reg2btf_ids.
	spi: cadence-quadspi: fix write completion support
	ARM: dts: socfpga: change qspi to "intel,socfpga-qspi"
	mm: kfence: fix objcgs vector allocation
	gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable}
	iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable
	iov_iter: Introduce fault_in_iov_iter_writeable
	gfs2: Add wrapper for iomap_file_buffered_write
	gfs2: Clean up function may_grant
	gfs2: Introduce flag for glock holder auto-demotion
	gfs2: Move the inode glock locking to gfs2_file_buffered_write
	gfs2: Eliminate ip->i_gh
	gfs2: Fix mmap + page fault deadlocks for buffered I/O
	iomap: Fix iomap_dio_rw return value for user copies
	iomap: Support partial direct I/O on user copy failures
	iomap: Add done_before argument to iomap_dio_rw
	gup: Introduce FOLL_NOFAULT flag to disable page faults
	iov_iter: Introduce nofault flag to disable page faults
	gfs2: Fix mmap + page fault deadlocks for direct I/O
	btrfs: fix deadlock due to page faults during direct IO reads and writes
	btrfs: fallback to blocking mode when doing async dio over multiple extents
	mm: gup: make fault_in_safe_writeable() use fixup_user_fault()
	selftests/bpf: Add test for reg2btf_ids out of bounds access
	Linux 5.15.37

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ica39b8856d6e3928a82f4e34f8b401f1a5cba5ee
2022-05-18 08:59:02 +02:00
Greg Kroah-Hartman e95cdba8e2 Merge 5.15.36 into android13-5.15
Changes in 5.15.36
	fs: remove __sync_filesystem
	block: remove __sync_blockdev
	block: simplify the block device syncing code
	vfs: make sync_filesystem return errors from ->sync_fs
	xfs: return errors in xfs_fs_sync_fs
	dma-mapping: remove bogus test for pfn_valid from dma_map_resource
	arm64/mm: drop HAVE_ARCH_PFN_VALID
	etherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overead
	mm: page_alloc: fix building error on -Werror=array-compare
	perf tools: Fix segfault accessing sample_id xyarray
	mm, kfence: support kmem_dump_obj() for KFENCE objects
	gfs2: assign rgrp glock before compute_bitstructs
	scsi: ufs: core: scsi_get_lba() error fix
	net/sched: cls_u32: fix netns refcount changes in u32_change()
	ALSA: usb-audio: Clear MIDI port active flag after draining
	ALSA: hda/realtek: Add quirk for Clevo NP70PNP
	ASoC: atmel: Remove system clock tree configuration for at91sam9g20ek
	ASoC: topology: Correct error handling in soc_tplg_dapm_widget_create()
	ASoC: rk817: Use devm_clk_get() in rk817_platform_probe
	ASoC: msm8916-wcd-digital: Check failure for devm_snd_soc_register_component
	ASoC: codecs: wcd934x: do not switch off SIDO Buck when codec is in use
	dmaengine: idxd: fix device cleanup on disable
	dmaengine: imx-sdma: Fix error checking in sdma_event_remap
	dmaengine: mediatek:Fix PM usage reference leak of mtk_uart_apdma_alloc_chan_resources
	dmaengine: dw-edma: Fix unaligned 64bit access
	spi: spi-mtk-nor: initialize spi controller after resume
	esp: limit skb_page_frag_refill use to a single page
	spi: cadence-quadspi: fix incorrect supports_op() return value
	igc: Fix infinite loop in release_swfw_sync
	igc: Fix BUG: scheduling while atomic
	igc: Fix suspending when PTM is active
	ALSA: hda/hdmi: fix warning about PCM count when used with SOF
	rxrpc: Restore removed timer deletion
	net/smc: Fix sock leak when release after smc_shutdown()
	net/packet: fix packet_sock xmit return value checking
	ip6_gre: Avoid updating tunnel->tun_hlen in __gre6_xmit()
	ip6_gre: Fix skb_under_panic in __gre6_xmit()
	net: restore alpha order to Ethernet devices in config
	net/sched: cls_u32: fix possible leak in u32_init_knode()
	l3mdev: l3mdev_master_upper_ifindex_by_index_rcu should be using netdev_master_upper_dev_get_rcu
	ipv6: make ip6_rt_gc_expire an atomic_t
	can: isotp: stop timeout monitoring when no first frame was sent
	net: dsa: hellcreek: Calculate checksums in tagger
	net: mscc: ocelot: fix broken IP multicast flooding
	netlink: reset network and mac headers in netlink_dump()
	drm/i915/display/psr: Unset enable_psr2_sel_fetch if other checks in intel_psr2_config_valid() fails
	net: stmmac: Use readl_poll_timeout_atomic() in atomic state
	dmaengine: idxd: add RO check for wq max_batch_size write
	dmaengine: idxd: add RO check for wq max_transfer_size write
	dmaengine: idxd: skip clearing device context when device is read-only
	selftests: mlxsw: vxlan_flooding: Prevent flooding of unwanted packets
	arm64: mm: fix p?d_leaf()
	ARM: vexpress/spc: Avoid negative array index when !SMP
	reset: renesas: Check return value of reset_control_deassert()
	reset: tegra-bpmp: Restore Handle errors in BPMP response
	platform/x86: samsung-laptop: Fix an unsigned comparison which can never be negative
	ALSA: usb-audio: Fix undefined behavior due to shift overflowing the constant
	drm/msm/disp: check the return value of kzalloc()
	arm64: dts: imx: Fix imx8*-var-som touchscreen property sizes
	vxlan: fix error return code in vxlan_fdb_append
	cifs: Check the IOCB_DIRECT flag, not O_DIRECT
	net: atlantic: Avoid out-of-bounds indexing
	mt76: Fix undefined behavior due to shift overflowing the constant
	brcmfmac: sdio: Fix undefined behavior due to shift overflowing the constant
	dpaa_eth: Fix missing of_node_put in dpaa_get_ts_info()
	drm/msm/mdp5: check the return of kzalloc()
	net: macb: Restart tx only if queue pointer is lagging
	scsi: iscsi: Release endpoint ID when its freed
	scsi: iscsi: Merge suspend fields
	scsi: iscsi: Fix NOP handling during conn recovery
	scsi: qedi: Fix failed disconnect handling
	stat: fix inconsistency between struct stat and struct compat_stat
	VFS: filename_create(): fix incorrect intent.
	nvme: add a quirk to disable namespace identifiers
	nvme-pci: disable namespace identifiers for the MAXIO MAP1002/1202
	nvme-pci: disable namespace identifiers for Qemu controllers
	EDAC/synopsys: Read the error count from the correct register
	mm/memory-failure.c: skip huge_zero_page in memory_failure()
	memcg: sync flush only if periodic flush is delayed
	mm, hugetlb: allow for "high" userspace addresses
	oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup
	mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove()
	ata: pata_marvell: Check the 'bmdma_addr' beforing reading
	dma: at_xdmac: fix a missing check on list iterator
	dmaengine: imx-sdma: fix init of uart scripts
	net: atlantic: invert deep par in pm functions, preventing null derefs
	Input: omap4-keypad - fix pm_runtime_get_sync() error checking
	scsi: sr: Do not leak information in ioctl
	sched/pelt: Fix attach_entity_load_avg() corner case
	perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabled
	drm/panel/raspberrypi-touchscreen: Avoid NULL deref if not initialised
	drm/panel/raspberrypi-touchscreen: Initialise the bridge in prepare
	KVM: PPC: Fix TCE handling for VFIO
	drm/vc4: Use pm_runtime_resume_and_get to fix pm_runtime_get_sync() usage
	powerpc/perf: Fix power9 event alternatives
	powerpc/perf: Fix power10 event alternatives
	perf script: Always allow field 'data_src' for auxtrace
	perf report: Set PERF_SAMPLE_DATA_SRC bit for Arm SPE event
	xtensa: patch_text: Fixup last cpu should be master
	xtensa: fix a7 clobbering in coprocessor context load/store
	openvswitch: fix OOB access in reserve_sfa_size()
	gpio: Request interrupts after IRQ is initialized
	ASoC: soc-dapm: fix two incorrect uses of list iterator
	e1000e: Fix possible overflow in LTR decoding
	ARC: entry: fix syscall_trace_exit argument
	arm_pmu: Validate single/group leader events
	KVM: x86/pmu: Update AMD PMC sample period to fix guest NMI-watchdog
	KVM: x86: Pend KVM_REQ_APICV_UPDATE during vCPU creation to fix a race
	KVM: nVMX: Defer APICv updates while L2 is active until L1 is active
	KVM: SVM: Flush when freeing encrypted pages even on SME_COHERENT CPUs
	netfilter: conntrack: convert to refcount_t api
	netfilter: conntrack: avoid useless indirection during conntrack destruction
	ext4: fix fallocate to use file_modified to update permissions consistently
	ext4: fix symlink file size not match to file content
	ext4: fix use-after-free in ext4_search_dir
	ext4: limit length to bitmap_maxbytes - blocksize in punch_hole
	ext4, doc: fix incorrect h_reserved size
	ext4: fix overhead calculation to account for the reserved gdt blocks
	ext4: force overhead calculation if the s_overhead_cluster makes no sense
	netfilter: nft_ct: fix use after free when attaching zone template
	jbd2: fix a potential race while discarding reserved buffers after an abort
	spi: atmel-quadspi: Fix the buswidth adjustment between spi-mem and controller
	block/compat_ioctl: fix range check in BLKGETSIZE
	arm64: dts: qcom: add IPA qcom,qmp property
	Linux 5.15.36

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I44d3a4de9b6fa1d2016b4e063eb211e8373a1216
2022-05-18 08:55:59 +02:00
Prasad Sodagudi 525d77310a ANDROID: pstore/ram: Add backward compatibility for ramoops reserved region
Some of the platforms might be still expecting dedicated memory region
for ramoops node. So add logic to detect the start and size of the
ramoops memory region by looking up reserved memory region with
of_reserved_mem_lookup() when platform_get_resource() failed.

Bug: 191636717
Change-Id: Idc479b45fb3f637f7235efd6eabac62059d5e92b
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
(cherry picked from commit 9b136eab76eee8988a8474008aa7a8cfc83be421)
2022-05-16 22:30:25 +00:00
Isaac J. Manjarres 317045867f FROMLIST: pstore/ram: Rework logic for detecting ramoops reserved memory region
The reserved memory region for ramoops is assumed to be at a fixed
and known location when read from the devicetree. This is not desirable
in environments where it is preferred for the region to be dynamically
allocated at runtime, as opposed to it being fixed at compile time.

Change the logic for detecting the start and size of the ramoops
memory region by looking up the reserved memory region instead of
using platform_get_resource(), which assumes that the location
of the memory is known ahead of time.

Bug: 191636717
Link: https://lore.kernel.org/patchwork/patch/1451704/
Change-Id: I24066de9f4fe1f1575cb1bbb1687c37a2b1938a4
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
(cherry picked from commit bd2ca0ba5b9ccf3aa846d0f08a06c193e5a388f4)
2022-05-16 22:30:15 +00:00
Jan Kara 9e951f2d85 udf: Avoid using stale lengthOfImpUse
commit c1ad35dd0548ce947d97aaf92f7f2f9a202951cf upstream.

udf_write_fi() uses lengthOfImpUse of the entry it is writing to.
However this field has not yet been initialized so it either contains
completely bogus value or value from last directory entry at that place.
In either case this is wrong and can lead to filesystem corruption or
kernel crashes.

Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
CC: stable@vger.kernel.org
Fixes: 979a6e28dd ("udf: Get rid of 0-length arrays in struct fileIdentDesc")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-15 20:18:52 +02:00
Filipe Manana 3d0e7373b2 btrfs: always log symlinks in full mode
commit d0e64a981fd841cb0f28fcd6afcac55e6f1e6994 upstream.

On Linux, empty symlinks are invalid, and attempting to create one with
the system call symlink(2) results in an -ENOENT error and this is
explicitly documented in the man page.

If we rename a symlink that was created in the current transaction and its
parent directory was logged before, we actually end up logging the symlink
without logging its content, which is stored in an inline extent. That
means that after a power failure we can end up with an empty symlink,
having no content and an i_size of 0 bytes.

It can be easily reproduced like this:

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt

  $ mkdir /mnt/testdir
  $ sync

  # Create a file inside the directory and fsync the directory.
  $ touch /mnt/testdir/foo
  $ xfs_io -c "fsync" /mnt/testdir

  # Create a symlink inside the directory and then rename the symlink.
  $ ln -s /mnt/testdir/foo /mnt/testdir/bar
  $ mv /mnt/testdir/bar /mnt/testdir/baz

  # Now fsync again the directory, this persist the log tree.
  $ xfs_io -c "fsync" /mnt/testdir

  <power failure>

  $ mount /dev/sdc /mnt
  $ stat -c %s /mnt/testdir/baz
  0
  $ readlink /mnt/testdir/baz
  $

Fix this by always logging symlinks in full mode (LOG_INODE_ALL), so that
their content is also logged.

A test case for fstests will follow.

CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-12 12:30:20 +02:00
Qu Wenruo e42a854548 btrfs: force v2 space cache usage for subpage mount
commit 9f73f1aef98b2fa7252c0a89be64840271ce8ea0 upstream.

[BUG]
For a 4K sector sized btrfs with v1 cache enabled and only mounted on
systems with 4K page size, if it's mounted on subpage (64K page size)
systems, it can cause the following warning on v1 space cache:

 BTRFS error (device dm-1): csum mismatch on free space cache
 BTRFS warning (device dm-1): failed to load free space cache for block group 84082688, rebuilding it now

Although not a big deal, as kernel can rebuild it without problem, such
warning will bother end users, especially if they want to switch the
same btrfs seamlessly between different page sized systems.

[CAUSE]
V1 free space cache is still using fixed PAGE_SIZE for various bitmap,
like BITS_PER_BITMAP.

Such hard-coded PAGE_SIZE usage will cause various mismatch, from v1
cache size to checksum.

Thus kernel will always reject v1 cache with a different PAGE_SIZE with
csum mismatch.

[FIX]
Although we should fix v1 cache, it's already going to be marked
deprecated soon.

And we have v2 cache based on metadata (which is already fully subpage
compatible), and it has almost everything superior than v1 cache.

So just force subpage mount to use v2 cache on mount.

Reported-by: Matt Corallo <blnxfsl@bluematt.me>
CC: stable@vger.kernel.org # 5.15+
Link: https://lore.kernel.org/linux-btrfs/61aa27d1-30fc-c1a9-f0f4-9df544395ec3@bluematt.me/
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-12 12:30:20 +02:00
Filipe Manana 74b9abc468 btrfs: do not BUG_ON() on failure to update inode when setting xattr
commit 193b4e83986d7ee6caa8ceefb5ee9f58240fbee0 upstream.

We are doing a BUG_ON() if we fail to update an inode after setting (or
clearing) a xattr, but there's really no reason to not instead simply
abort the transaction and return the error to the caller. This should be
a rare error because we have previously reserved enough metadata space to
update the inode and the delayed inode should have already been setup, so
an -ENOSPC or -ENOMEM, which are the possible errors, are very unlikely to
happen.

So replace the BUG_ON()s with a transaction abort.

CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-12 12:30:18 +02:00
Trond Myklebust d34f9bbc1d NFSv4: Don't invalidate inode attributes on delegation return
commit 00c94ebec5925593c0377b941289224469e72ac7 upstream.

There is no need to declare attributes such as the ctime, mtime and
block size invalid when we're just returning a delegation, so it is
inappropriate to call nfs_post_op_update_inode_force_wcc().
Instead, just call nfs_refresh_inode() after faking up the change
attribute. We know that the GETATTR op occurs before the DELEGRETURN, so
we are safe when doing this.

Fixes: 0bc2c9b4dc ("NFSv4: Don't discard the attributes returned by asynchronous DELEGRETURN")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-12 12:30:15 +02:00
Filipe Manana cf12ce1bd7 btrfs: fix leaked plug after failure syncing log on zoned filesystems
commit 50ff57888d0b13440e7f4cde05dc339ee8d0f1f8 upstream.

On a zoned filesystem, if we fail to allocate the root node for the log
root tree while syncing the log, we end up returning without finishing
the IO plug we started before, resulting in leaking resources as we
have started writeback for extent buffers of a log tree before. That
allocation failure, which typically is either -ENOMEM or -ENOSPC, is not
fatal and the fsync can safely fallback to a full transaction commit.

So release the IO plug if we fail to allocate the extent buffer for the
root of the log root tree when syncing the log on a zoned filesystem.

Fixes: 3ddebf27fc ("btrfs: zoned: reorder log node allocation on zoned filesystem")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-09 09:14:42 +02:00
Damien Le Moal 051e78dc1f zonefs: Clear inode information flags on inode creation
commit 694852ead287a3433126e7ebda397b242dc99624 upstream.

Ensure that the i_flags field of struct zonefs_inode_info is cleared to
0 when initializing a zone file inode, avoiding seeing the flag
ZONEFS_ZONE_OPEN being incorrectly set.

Fixes: b5c00e9757 ("zonefs: open/close zone on file open/close")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-09 09:14:41 +02:00
Damien Le Moal 534c3f29ac zonefs: Fix management of open zones
commit 1da18a296f5ba4f99429e62a7cf4fdbefa598902 upstream.

The mount option "explicit_open" manages the device open zone
resources to ensure that if an application opens a sequential file for
writing, the file zone can always be written by explicitly opening
the zone and accounting for that state with the s_open_zones counter.

However, if some zones are already open when mounting, the device open
zone resource usage status will be larger than the initial s_open_zones
value of 0. Ensure that this inconsistency does not happen by closing
any sequential zone that is open when mounting.

Furthermore, with ZNS drives, closing an explicitly open zone that has
not been written will change the zone state to "closed", that is, the
zone will remain in an active state. Since this can then cause failures
of explicit open operations on other zones if the drive active zone
resources are exceeded, we need to make sure that the zone is not
active anymore by resetting it instead of closing it. To address this,
zonefs_zone_mgmt() is modified to change a REQ_OP_ZONE_CLOSE request
into a REQ_OP_ZONE_RESET for sequential zones that have not been
written.

Fixes: b5c00e9757 ("zonefs: open/close zone on file open/close")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-09 09:14:41 +02:00
Ronnie Sahlberg 3bb73c4cc2 cifs: destage any unwritten data to the server before calling copychunk_write
[ Upstream commit f5d0f921ea362636e4a2efb7c38d1ead373a8700 ]

because the copychunk_write might cover a region of the file that has not yet
been sent to the server and thus fail.

A simple way to reproduce this is:
truncate -s 0 /mnt/testfile; strace -f -o x -ttT xfs_io -i -f -c 'pwrite 0k 128k' -c 'fcollapse 16k 24k' /mnt/testfile

the issue is that the 'pwrite 0k 128k' becomes rearranged on the wire with
the 'fcollapse 16k 24k' due to write-back caching.

fcollapse is implemented in cifs.ko as a SMB2 IOCTL(COPYCHUNK_WRITE) call
and it will fail serverside since the file is still 0b in size serverside
until the writes have been destaged.
To avoid this we must ensure that we destage any unwritten data to the
server before calling COPYCHUNK_WRITE.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1997373
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09 09:14:40 +02:00
Namjae Jeon d276bcc5f7 ksmbd: set fixed sector size to FS_SECTOR_SIZE_INFORMATION
[ Upstream commit 02655a70b7cc0f534531ee65fa72692f4d31a944 ]

Currently ksmbd is using ->f_bsize from vfs_statfs() as sector size.
If fat/exfat is a local share, ->f_bsize is a cluster size that is too
large to be used as a sector size. Sector sizes larger than 4K cause
problem occurs when mounting an iso file through windows client.

The error message can be obtained using Mount-DiskImage command,
 the error is:
"Mount-DiskImage : The sector size of the physical disk on which the
virtual disk resides is not supported."

This patch reports fixed 4KB sector size if ->s_blocksize is bigger
than 4KB.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09 09:14:40 +02:00
Namjae Jeon df30cbfd3d ksmbd: increment reference count of parent fp
[ Upstream commit 8510a043d334ecdf83d4604782f288db6bf21d60 ]

Add missing increment reference count of parent fp in
ksmbd_lookup_fd_inode().

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09 09:14:40 +02:00
Ye Bin 52c3a04f9e ext4: fix bug_on in start_this_handle during umount filesystem
[ Upstream commit b98535d091795a79336f520b0708457aacf55c67 ]

We got issue as follows:
------------[ cut here ]------------
kernel BUG at fs/jbd2/transaction.c:389!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 9 PID: 131 Comm: kworker/9:1 Not tainted 5.17.0-862.14.0.6.x86_64-00001-g23f87daf7d74-dirty #197
Workqueue: events flush_stashed_error_work
RIP: 0010:start_this_handle+0x41c/0x1160
RSP: 0018:ffff888106b47c20 EFLAGS: 00010202
RAX: ffffed10251b8400 RBX: ffff888128dc204c RCX: ffffffffb52972ac
RDX: 0000000000000200 RSI: 0000000000000004 RDI: ffff888128dc2050
RBP: 0000000000000039 R08: 0000000000000001 R09: ffffed10251b840a
R10: ffff888128dc204f R11: ffffed10251b8409 R12: ffff888116d78000
R13: 0000000000000000 R14: dffffc0000000000 R15: ffff888128dc2000
FS:  0000000000000000(0000) GS:ffff88839d680000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001620068 CR3: 0000000376c0e000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 jbd2__journal_start+0x38a/0x790
 jbd2_journal_start+0x19/0x20
 flush_stashed_error_work+0x110/0x2b3
 process_one_work+0x688/0x1080
 worker_thread+0x8b/0xc50
 kthread+0x26f/0x310
 ret_from_fork+0x22/0x30
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---

Above issue may happen as follows:
      umount            read procfs            error_work
ext4_put_super
  flush_work(&sbi->s_error_work);

                      ext4_mb_seq_groups_show
	                ext4_mb_load_buddy_gfp
			  ext4_mb_init_group
			    ext4_mb_init_cache
	                      ext4_read_block_bitmap_nowait
			        ext4_validate_block_bitmap
				  ext4_error
			            ext4_handle_error
			              schedule_work(&EXT4_SB(sb)->s_error_work);

  ext4_unregister_sysfs(sb);
  jbd2_journal_destroy(sbi->s_journal);
    journal_kill_thread
      journal->j_flags |= JBD2_UNMOUNT;

                                          flush_stashed_error_work
				            jbd2_journal_start
					      start_this_handle
					        BUG_ON(journal->j_flags & JBD2_UNMOUNT);

To solve this issue, we call 'ext4_unregister_sysfs() before flushing
s_error_work in ext4_put_super().

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20220322012419.725457-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09 09:14:39 +02:00
Andreas Gruenbacher 3591293c19 gfs2: No short reads or writes upon glock contention
[ Upstream commit 296abc0d91d8b65d42224dd33452ace14491ad08 ]

Commit 00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered
I/O") changed gfs2_file_read_iter() and gfs2_file_buffered_write() to
allow dropping the inode glock while faulting in user buffers.  When the
lock was dropped, a short result was returned to indicate that the
operation was interrupted.

As pointed out by Linus (see the link below), this behavior is broken
and the operations should always re-acquire the inode glock and resume
the operation instead.

Link: https://lore.kernel.org/lkml/CAHk-=whaz-g_nOOoo8RRiWNjnv2R+h6_xk2F1J4TuSRxk1MtLw@mail.gmail.com/
Fixes: 00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered I/O")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09 09:14:39 +02:00
Andreas Gruenbacher b5afb477d2 gfs2: Make sure not to return short direct writes
[ Upstream commit 3bde4c48586074202044456285a97ccdf9048988 ]

When direct writes fail with -ENOTBLK because we're writing into a
hole (gfs2_iomap_begin()) or because of a page invalidation failure
(iomap_dio_rw()), we're falling back to buffered writes.  In that case,
when we lose the inode glock in gfs2_file_buffered_write(), we want to
re-acquire it instead of returning a short write.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09 09:14:39 +02:00
Andreas Gruenbacher fe24959a79 gfs2: Minor retry logic cleanup
[ Upstream commit 124c458a401a2497f796e4f2d6cafac6edbea8e9 ]

Clean up the retry logic in the read and write functions somewhat.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09 09:14:39 +02:00
Andreas Gruenbacher e4ea3286b1 gfs2: Prevent endless loops in gfs2_file_buffered_write
[ Upstream commit 554c577cee95bdc1d03d9f457e57dc96eb791845 ]

Currently, instead of performing a short write,
iomap_file_buffered_write will fail when part of its iov iterator cannot
be read.  In contrast, gfs2_file_buffered_write will loop around if it
can read part of the iov iterator, so we can end up in an endless loop.

This should be fixed in iomap_file_buffered_write (and also
generic_perform_write), but this comes a bit late in the 5.16
development cycle, so work around it in the filesystem by
trimming the iov iterator to the known-good size for now.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09 09:14:38 +02:00
Jens Axboe 37811e46a2 io_uring: check reserved fields for recv/recvmsg
[ Upstream commit 5a1e99b61b0c81388cde0c808b3e4173907df19f ]

We should check unused fields for non-zero and -EINVAL if they are set,
making it consistent with other opcodes.

Fixes: aa1fa28fc7 ("io_uring: add support for recvmsg()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09 09:14:38 +02:00
Jens Axboe 79c10cb188 io_uring: check reserved fields for send/sendmsg
[ Upstream commit 588faa1ea5eecb351100ee5d187b9be99210f70d ]

We should check unused fields for non-zero and -EINVAL if they are set,
making it consistent with other opcodes.

Fixes: 0fa03c624d ("io_uring: add support for sendmsg()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09 09:14:38 +02:00
Xiubo Li 732f861dd4 ceph: fix possible NULL pointer dereference for req->r_session
commit 7acae6183cf37c48b8da48bbbdb78820fb3913f3 upstream.

The request will be inserted into the ci->i_unsafe_dirops before
assigning the req->r_session, so it's possible that we will hit
NULL pointer dereference bug here.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/55327
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-09 09:14:30 +02:00
Greg Kroah-Hartman 2014a9eea4 Revert "Revert "coredump: Use the vma snapshot in fill_files_note""
This reverts commit 0c5b51622c.

It is no longer needed as we are able to update the abi at this point in
time.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5b2c6e89737a2ac647f19719a1ccf256c2794a02
2022-05-04 13:39:10 -07:00
Greg Kroah-Hartman 8b0bc3fc2a Revert "Revert "coredump: Remove the WARN_ON in dump_vma_snapshot""
This reverts commit c31598eb0b.

It is no longer needed as we are able to update the abi at this point in
time.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia44bf70cb550e4840e3be1be7c8b2b0bea0a330e
2022-05-04 13:39:10 -07:00
Greg Kroah-Hartman 23dd13c9f9 Revert "Revert "coredump: Snapshot the vmas in do_coredump""
This reverts commit ff1561ac7f.

It is no longer needed as we are able to update the abi at this point in
time.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I20acdca0aa5b59436e890887c03f125f027a9d45
2022-05-04 13:39:10 -07:00
Filipe Manana 4a0123bdb0 btrfs: fallback to blocking mode when doing async dio over multiple extents
commit ca93e44bfb5fd7996b76f0f544999171f647f93b upstream

Some users recently reported that MariaDB was getting a read corruption
when using io_uring on top of btrfs. This started to happen in 5.16,
after commit 51bd9563b6783d ("btrfs: fix deadlock due to page faults
during direct IO reads and writes"). That changed btrfs to use the new
iomap flag IOMAP_DIO_PARTIAL and to disable page faults before calling
iomap_dio_rw(). This was necessary to fix deadlocks when the iovector
corresponds to a memory mapped file region. That type of scenario is
exercised by test case generic/647 from fstests.

For this MariaDB scenario, we attempt to read 16K from file offset X
using IOCB_NOWAIT and io_uring. In that range we have 4 extents, each
with a size of 4K, and what happens is the following:

1) btrfs_direct_read() disables page faults and calls iomap_dio_rw();

2) iomap creates a struct iomap_dio object, its reference count is
   initialized to 1 and its ->size field is initialized to 0;

3) iomap calls btrfs_dio_iomap_begin() with file offset X, which finds
   the first 4K extent, and setups an iomap for this extent consisting
   of a single page;

4) At iomap_dio_bio_iter(), we are able to access the first page of the
   buffer (struct iov_iter) with bio_iov_iter_get_pages() without
   triggering a page fault;

5) iomap submits a bio for this 4K extent
   (iomap_dio_submit_bio() -> btrfs_submit_direct()) and increments
   the refcount on the struct iomap_dio object to 2; The ->size field
   of the struct iomap_dio object is incremented to 4K;

6) iomap calls btrfs_iomap_begin() again, this time with a file
   offset of X + 4K. There we setup an iomap for the next extent
   that also has a size of 4K;

7) Then at iomap_dio_bio_iter() we call bio_iov_iter_get_pages(),
   which tries to access the next page (2nd page) of the buffer.
   This triggers a page fault and returns -EFAULT;

8) At __iomap_dio_rw() we see the -EFAULT, but we reset the error
   to 0 because we passed the flag IOMAP_DIO_PARTIAL to iomap and
   the struct iomap_dio object has a ->size value of 4K (we submitted
   a bio for an extent already). The 'wait_for_completion' variable
   is not set to true, because our iocb has IOCB_NOWAIT set;

9) At the bottom of __iomap_dio_rw(), we decrement the reference count
   of the struct iomap_dio object from 2 to 1. Because we were not
   the only ones holding a reference on it and 'wait_for_completion' is
   set to false, -EIOCBQUEUED is returned to btrfs_direct_read(), which
   just returns it up the callchain, up to io_uring;

10) The bio submitted for the first extent (step 5) completes and its
    bio endio function, iomap_dio_bio_end_io(), decrements the last
    reference on the struct iomap_dio object, resulting in calling
    iomap_dio_complete_work() -> iomap_dio_complete().

11) At iomap_dio_complete() we adjust the iocb->ki_pos from X to X + 4K
    and return 4K (the amount of io done) to iomap_dio_complete_work();

12) iomap_dio_complete_work() calls the iocb completion callback,
    iocb->ki_complete() with a second argument value of 4K (total io
    done) and the iocb with the adjust ki_pos of X + 4K. This results
    in completing the read request for io_uring, leaving it with a
    result of 4K bytes read, and only the first page of the buffer
    filled in, while the remaining 3 pages, corresponding to the other
    3 extents, were not filled;

13) For the application, the result is unexpected because if we ask
    to read N bytes, it expects to get N bytes read as long as those
    N bytes don't cross the EOF (i_size).

MariaDB reports this as an error, as it's not expecting a short read,
since it knows it's asking for read operations fully within the i_size
boundary. This is typical in many applications, but it may also be
questionable if they should react to such short reads by issuing more
read calls to get the remaining data. Nevertheless, the short read
happened due to a change in btrfs regarding how it deals with page
faults while in the middle of a read operation, and there's no reason
why btrfs can't have the previous behaviour of returning the whole data
that was requested by the application.

The problem can also be triggered with the following simple program:

  /* Get O_DIRECT */
  #ifndef _GNU_SOURCE
  #define _GNU_SOURCE
  #endif

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <fcntl.h>
  #include <errno.h>
  #include <string.h>
  #include <liburing.h>

  int main(int argc, char *argv[])
  {
      char *foo_path;
      struct io_uring ring;
      struct io_uring_sqe *sqe;
      struct io_uring_cqe *cqe;
      struct iovec iovec;
      int fd;
      long pagesize;
      void *write_buf;
      void *read_buf;
      ssize_t ret;
      int i;

      if (argc != 2) {
          fprintf(stderr, "Use: %s <directory>\n", argv[0]);
          return 1;
      }

      foo_path = malloc(strlen(argv[1]) + 5);
      if (!foo_path) {
          fprintf(stderr, "Failed to allocate memory for file path\n");
          return 1;
      }
      strcpy(foo_path, argv[1]);
      strcat(foo_path, "/foo");

      /*
       * Create file foo with 2 extents, each with a size matching
       * the page size. Then allocate a buffer to read both extents
       * with io_uring, using O_DIRECT and IOCB_NOWAIT. Before doing
       * the read with io_uring, access the first page of the buffer
       * to fault it in, so that during the read we only trigger a
       * page fault when accessing the second page of the buffer.
       */
       fd = open(foo_path, O_CREAT | O_TRUNC | O_WRONLY |
                O_DIRECT, 0666);
       if (fd == -1) {
           fprintf(stderr,
                   "Failed to create file 'foo': %s (errno %d)",
                   strerror(errno), errno);
           return 1;
       }

       pagesize = sysconf(_SC_PAGE_SIZE);
       ret = posix_memalign(&write_buf, pagesize, 2 * pagesize);
       if (ret) {
           fprintf(stderr, "Failed to allocate write buffer\n");
           return 1;
       }

       memset(write_buf, 0xab, pagesize);
       memset(write_buf + pagesize, 0xcd, pagesize);

       /* Create 2 extents, each with a size matching page size. */
       for (i = 0; i < 2; i++) {
           ret = pwrite(fd, write_buf + i * pagesize, pagesize,
                        i * pagesize);
           if (ret != pagesize) {
               fprintf(stderr,
                     "Failed to write to file, ret = %ld errno %d (%s)\n",
                      ret, errno, strerror(errno));
               return 1;
           }
           ret = fsync(fd);
           if (ret != 0) {
               fprintf(stderr, "Failed to fsync file\n");
               return 1;
           }
       }

       close(fd);
       fd = open(foo_path, O_RDONLY | O_DIRECT);
       if (fd == -1) {
           fprintf(stderr,
                   "Failed to open file 'foo': %s (errno %d)",
                   strerror(errno), errno);
           return 1;
       }

       ret = posix_memalign(&read_buf, pagesize, 2 * pagesize);
       if (ret) {
           fprintf(stderr, "Failed to allocate read buffer\n");
           return 1;
       }

       /*
        * Fault in only the first page of the read buffer.
        * We want to trigger a page fault for the 2nd page of the
        * read buffer during the read operation with io_uring
        * (O_DIRECT and IOCB_NOWAIT).
        */
       memset(read_buf, 0, 1);

       ret = io_uring_queue_init(1, &ring, 0);
       if (ret != 0) {
           fprintf(stderr, "Failed to create io_uring queue\n");
           return 1;
       }

       sqe = io_uring_get_sqe(&ring);
       if (!sqe) {
           fprintf(stderr, "Failed to get io_uring sqe\n");
           return 1;
       }

       iovec.iov_base = read_buf;
       iovec.iov_len = 2 * pagesize;
       io_uring_prep_readv(sqe, fd, &iovec, 1, 0);

       ret = io_uring_submit_and_wait(&ring, 1);
       if (ret != 1) {
           fprintf(stderr,
                   "Failed at io_uring_submit_and_wait()\n");
           return 1;
       }

       ret = io_uring_wait_cqe(&ring, &cqe);
       if (ret < 0) {
           fprintf(stderr, "Failed at io_uring_wait_cqe()\n");
           return 1;
       }

       printf("io_uring read result for file foo:\n\n");
       printf("  cqe->res == %d (expected %d)\n", cqe->res, 2 * pagesize);
       printf("  memcmp(read_buf, write_buf) == %d (expected 0)\n",
              memcmp(read_buf, write_buf, 2 * pagesize));

       io_uring_cqe_seen(&ring, cqe);
       io_uring_queue_exit(&ring);

       return 0;
  }

When running it on an unpatched kernel:

  $ gcc io_uring_test.c -luring
  $ mkfs.btrfs -f /dev/sda
  $ mount /dev/sda /mnt/sda
  $ ./a.out /mnt/sda
  io_uring read result for file foo:

    cqe->res == 4096 (expected 8192)
    memcmp(read_buf, write_buf) == -205 (expected 0)

After this patch, the read always returns 8192 bytes, with the buffer
filled with the correct data. Although that reproducer always triggers
the bug in my test vms, it's possible that it will not be so reliable
on other environments, as that can happen if the bio for the first
extent completes and decrements the reference on the struct iomap_dio
object before we do the atomic_dec_and_test() on the reference at
__iomap_dio_rw().

Fix this in btrfs by having btrfs_dio_iomap_begin() return -EAGAIN
whenever we try to satisfy a non blocking IO request (IOMAP_NOWAIT flag
set) over a range that spans multiple extents (or a mix of extents and
holes). This avoids returning success to the caller when we only did
partial IO, which is not optimal for writes and for reads it's actually
incorrect, as the caller doesn't expect to get less bytes read than it has
requested (unless EOF is crossed), as previously mentioned. This is also
the type of behaviour that xfs follows (xfs_direct_write_iomap_begin()),
even though it doesn't use IOMAP_DIO_PARTIAL.

A test case for fstests will follow soon.

Link: https://lore.kernel.org/linux-btrfs/CABVffEM0eEWho+206m470rtM0d9J8ue85TtR-A_oVTuGLWFicA@mail.gmail.com/
Link: https://lore.kernel.org/linux-btrfs/CAHF2GV6U32gmqSjLe=XKgfcZAmLCiH26cJ2OnHGp5x=VAH4OHQ@mail.gmail.com/
CC: stable@vger.kernel.org # 5.16+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-01 17:22:34 +02:00
Filipe Manana c81c4f5666 btrfs: fix deadlock due to page faults during direct IO reads and writes
commit 51bd9563b6783de8315f38f7baed949e77c42311 upstream

If we do a direct IO read or write when the buffer given by the user is
memory mapped to the file range we are going to do IO, we end up ending
in a deadlock. This is triggered by the new test case generic/647 from
fstests.

For a direct IO read we get a trace like this:

  [967.872718] INFO: task mmap-rw-fault:12176 blocked for more than 120 seconds.
  [967.874161]       Not tainted 5.14.0-rc7-btrfs-next-95 #1
  [967.874909] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [967.875983] task:mmap-rw-fault   state:D stack:    0 pid:12176 ppid: 11884 flags:0x00000000
  [967.875992] Call Trace:
  [967.875999]  __schedule+0x3ca/0xe10
  [967.876015]  schedule+0x43/0xe0
  [967.876020]  wait_extent_bit.constprop.0+0x1eb/0x260 [btrfs]
  [967.876109]  ? do_wait_intr_irq+0xb0/0xb0
  [967.876118]  lock_extent_bits+0x37/0x90 [btrfs]
  [967.876150]  btrfs_lock_and_flush_ordered_range+0xa9/0x120 [btrfs]
  [967.876184]  ? extent_readahead+0xa7/0x530 [btrfs]
  [967.876214]  extent_readahead+0x32d/0x530 [btrfs]
  [967.876253]  ? lru_cache_add+0x104/0x220
  [967.876255]  ? kvm_sched_clock_read+0x14/0x40
  [967.876258]  ? sched_clock_cpu+0xd/0x110
  [967.876263]  ? lock_release+0x155/0x4a0
  [967.876271]  read_pages+0x86/0x270
  [967.876274]  ? lru_cache_add+0x125/0x220
  [967.876281]  page_cache_ra_unbounded+0x1a3/0x220
  [967.876291]  filemap_fault+0x626/0xa20
  [967.876303]  __do_fault+0x36/0xf0
  [967.876308]  __handle_mm_fault+0x83f/0x15f0
  [967.876322]  handle_mm_fault+0x9e/0x260
  [967.876327]  __get_user_pages+0x204/0x620
  [967.876332]  ? get_user_pages_unlocked+0x69/0x340
  [967.876340]  get_user_pages_unlocked+0xd3/0x340
  [967.876349]  internal_get_user_pages_fast+0xbca/0xdc0
  [967.876366]  iov_iter_get_pages+0x8d/0x3a0
  [967.876374]  bio_iov_iter_get_pages+0x82/0x4a0
  [967.876379]  ? lock_release+0x155/0x4a0
  [967.876387]  iomap_dio_bio_actor+0x232/0x410
  [967.876396]  iomap_apply+0x12a/0x4a0
  [967.876398]  ? iomap_dio_rw+0x30/0x30
  [967.876414]  __iomap_dio_rw+0x29f/0x5e0
  [967.876415]  ? iomap_dio_rw+0x30/0x30
  [967.876420]  ? lock_acquired+0xf3/0x420
  [967.876429]  iomap_dio_rw+0xa/0x30
  [967.876431]  btrfs_file_read_iter+0x10b/0x140 [btrfs]
  [967.876460]  new_sync_read+0x118/0x1a0
  [967.876472]  vfs_read+0x128/0x1b0
  [967.876477]  __x64_sys_pread64+0x90/0xc0
  [967.876483]  do_syscall_64+0x3b/0xc0
  [967.876487]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [967.876490] RIP: 0033:0x7fb6f2c038d6
  [967.876493] RSP: 002b:00007fffddf586b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000011
  [967.876496] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007fb6f2c038d6
  [967.876498] RDX: 0000000000001000 RSI: 00007fb6f2c17000 RDI: 0000000000000003
  [967.876499] RBP: 0000000000001000 R08: 0000000000000003 R09: 0000000000000000
  [967.876501] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000003
  [967.876502] R13: 0000000000000000 R14: 00007fb6f2c17000 R15: 0000000000000000

This happens because at btrfs_dio_iomap_begin() we lock the extent range
and return with it locked - we only unlock in the endio callback, at
end_bio_extent_readpage() -> endio_readpage_release_extent(). Then after
iomap called the btrfs_dio_iomap_begin() callback, it triggers the page
faults that resulting in reading the pages, through the readahead callback
btrfs_readahead(), and through there we end to attempt to lock again the
same extent range (or a subrange of what we locked before), resulting in
the deadlock.

For a direct IO write, the scenario is a bit different, and it results in
trace like this:

  [1132.442520] run fstests generic/647 at 2021-08-31 18:53:35
  [1330.349355] INFO: task mmap-rw-fault:184017 blocked for more than 120 seconds.
  [1330.350540]       Not tainted 5.14.0-rc7-btrfs-next-95 #1
  [1330.351158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [1330.351900] task:mmap-rw-fault   state:D stack:    0 pid:184017 ppid:183725 flags:0x00000000
  [1330.351906] Call Trace:
  [1330.351913]  __schedule+0x3ca/0xe10
  [1330.351930]  schedule+0x43/0xe0
  [1330.351935]  btrfs_start_ordered_extent+0x108/0x1c0 [btrfs]
  [1330.352020]  ? do_wait_intr_irq+0xb0/0xb0
  [1330.352028]  btrfs_lock_and_flush_ordered_range+0x8c/0x120 [btrfs]
  [1330.352064]  ? extent_readahead+0xa7/0x530 [btrfs]
  [1330.352094]  extent_readahead+0x32d/0x530 [btrfs]
  [1330.352133]  ? lru_cache_add+0x104/0x220
  [1330.352135]  ? kvm_sched_clock_read+0x14/0x40
  [1330.352138]  ? sched_clock_cpu+0xd/0x110
  [1330.352143]  ? lock_release+0x155/0x4a0
  [1330.352151]  read_pages+0x86/0x270
  [1330.352155]  ? lru_cache_add+0x125/0x220
  [1330.352162]  page_cache_ra_unbounded+0x1a3/0x220
  [1330.352172]  filemap_fault+0x626/0xa20
  [1330.352176]  ? filemap_map_pages+0x18b/0x660
  [1330.352184]  __do_fault+0x36/0xf0
  [1330.352189]  __handle_mm_fault+0x1253/0x15f0
  [1330.352203]  handle_mm_fault+0x9e/0x260
  [1330.352208]  __get_user_pages+0x204/0x620
  [1330.352212]  ? get_user_pages_unlocked+0x69/0x340
  [1330.352220]  get_user_pages_unlocked+0xd3/0x340
  [1330.352229]  internal_get_user_pages_fast+0xbca/0xdc0
  [1330.352246]  iov_iter_get_pages+0x8d/0x3a0
  [1330.352254]  bio_iov_iter_get_pages+0x82/0x4a0
  [1330.352259]  ? lock_release+0x155/0x4a0
  [1330.352266]  iomap_dio_bio_actor+0x232/0x410
  [1330.352275]  iomap_apply+0x12a/0x4a0
  [1330.352278]  ? iomap_dio_rw+0x30/0x30
  [1330.352292]  __iomap_dio_rw+0x29f/0x5e0
  [1330.352294]  ? iomap_dio_rw+0x30/0x30
  [1330.352306]  btrfs_file_write_iter+0x238/0x480 [btrfs]
  [1330.352339]  new_sync_write+0x11f/0x1b0
  [1330.352344]  ? NF_HOOK_LIST.constprop.0.cold+0x31/0x3e
  [1330.352354]  vfs_write+0x292/0x3c0
  [1330.352359]  __x64_sys_pwrite64+0x90/0xc0
  [1330.352365]  do_syscall_64+0x3b/0xc0
  [1330.352369]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [1330.352372] RIP: 0033:0x7f4b0a580986
  [1330.352379] RSP: 002b:00007ffd34d75418 EFLAGS: 00000246 ORIG_RAX: 0000000000000012
  [1330.352382] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f4b0a580986
  [1330.352383] RDX: 0000000000001000 RSI: 00007f4b0a3a4000 RDI: 0000000000000003
  [1330.352385] RBP: 00007f4b0a3a4000 R08: 0000000000000003 R09: 0000000000000000
  [1330.352386] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
  [1330.352387] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Unlike for reads, at btrfs_dio_iomap_begin() we return with the extent
range unlocked, but later when the page faults are triggered and we try
to read the extents, we end up btrfs_lock_and_flush_ordered_range() where
we find the ordered extent for our write, created by the iomap callback
btrfs_dio_iomap_begin(), and we wait for it to complete, which makes us
deadlock since we can't complete the ordered extent without reading the
pages (the iomap code only submits the bio after the pages are faulted
in).

Fix this by setting the nofault attribute of the given iov_iter and retry
the direct IO read/write if we get an -EFAULT error returned from iomap.
For reads, also disable page faults completely, this is because when we
read from a hole or a prealloc extent, we can still trigger page faults
due to the call to iov_iter_zero() done by iomap - at the moment, it is
oblivious to the value of the ->nofault attribute of an iov_iter.
We also need to keep track of the number of bytes written or read, and
pass it to iomap_dio_rw(), as well as use the new flag IOMAP_DIO_PARTIAL.

This depends on the iov_iter and iomap changes introduced in commit
c03098d4b9ad ("Merge tag 'gfs2-v5.15-rc5-mmap-fault' of
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2").

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-01 17:22:33 +02:00
Andreas Gruenbacher 640a6be8e8 gfs2: Fix mmap + page fault deadlocks for direct I/O
commit b01b2d72da25c000aeb124bc78daf3fb998be2b6 upstream

Also disable page faults during direct I/O requests and implement a
similar kind of retry logic as in the buffered I/O case.

The retry logic in the direct I/O case differs from the buffered I/O
case in the following way: direct I/O doesn't provide the kinds of
consistency guarantees between concurrent reads and writes that buffered
I/O provides, so once we lose the inode glock while faulting in user
pages, we always resume the operation.  We never need to return a
partial read or write.

This locking problem was originally reported by Jan Kara.  Linus came up
with the idea of disabling page faults.  Many thanks to Al Viro and
Matthew Wilcox for their feedback.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-01 17:22:33 +02:00
Andreas Gruenbacher d3b744791b iomap: Add done_before argument to iomap_dio_rw
commit 4fdccaa0d184c202f98d73b24e3ec8eeee88ab8d upstream

Add a done_before argument to iomap_dio_rw that indicates how much of
the request has already been transferred.  When the request succeeds, we
report that done_before additional bytes were tranferred.  This is
useful for finishing a request asynchronously when part of the request
has already been completed synchronously.

We'll use that to allow iomap_dio_rw to be used with page faults
disabled: when a page fault occurs while submitting a request, we
synchronously complete the part of the request that has already been
submitted.  The caller can then take care of the page fault and call
iomap_dio_rw again for the rest of the request, passing in the number of
bytes already tranferred.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-01 17:22:32 +02:00
Andreas Gruenbacher ea7a578588 iomap: Support partial direct I/O on user copy failures
commit 97308f8b0d867e9ef59528cd97f0db55ffdf5651 upstream

In iomap_dio_rw, when iomap_apply returns an -EFAULT error and the
IOMAP_DIO_PARTIAL flag is set, complete the request synchronously and
return a partial result.  This allows the caller to deal with the page
fault and retry the remainder of the request.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-01 17:22:31 +02:00
Andreas Gruenbacher a00cc46f97 iomap: Fix iomap_dio_rw return value for user copies
commit 42c498c18a94eed79896c50871889af52fa0822e upstream

When a user copy fails in one of the helpers of iomap_dio_rw, fail with
-EFAULT instead of returning 0.  This matches what iomap_dio_bio_actor
returns when it gets an -EFAULT from bio_iov_iter_get_pages.  With these
changes, iomap_dio_actor now consistently fails with -EFAULT when a user
page cannot be faulted in.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-01 17:22:31 +02:00
Andreas Gruenbacher 81a7fc397a gfs2: Fix mmap + page fault deadlocks for buffered I/O
commit 00bfe02f479688a67a29019d1228f1470e26f014 upstream

In the .read_iter and .write_iter file operations, we're accessing
user-space memory while holding the inode glock.  There is a possibility
that the memory is mapped to the same file, in which case we'd recurse
on the same glock.

We could detect and work around this simple case of recursive locking,
but more complex scenarios exist that involve multiple glocks,
processes, and cluster nodes, and working around all of those cases
isn't practical or even possible.

Avoid these kinds of problems by disabling page faults while holding the
inode glock.  If a page fault would occur, we either end up with a
partial read or write or with -EFAULT if nothing could be read or
written.  In either case, we know that we're not done with the
operation, so we indicate that we're willing to give up the inode glock
and then we fault in the missing pages.  If that made us lose the inode
glock, we return a partial read or write.  Otherwise, we resume the
operation.

This locking problem was originally reported by Jan Kara.  Linus came up
with the idea of disabling page faults.  Many thanks to Al Viro and
Matthew Wilcox for their feedback.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-01 17:22:31 +02:00
Andreas Gruenbacher 38b5849881 gfs2: Eliminate ip->i_gh
commit 1b223f7065bc7d89c4677c27381817cc95b117a8 upstream

Now that gfs2_file_buffered_write is the only remaining user of
ip->i_gh, we can move the glock holder to the stack (or rather, use the
one we already have on the stack); there is no need for keeping the
holder in the inode anymore.

This is slightly complicated by the fact that we're using ip->i_gh for
the statfs inode in gfs2_file_buffered_write as well.  Writing to the
statfs inode isn't very common, so allocate the statfs holder
dynamically when needed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-01 17:22:30 +02:00
Andreas Gruenbacher 8d363d8173 gfs2: Move the inode glock locking to gfs2_file_buffered_write
commit b924bdab7445946e2ed364a0e6e249d36f1f1158 upstream

So far, for buffered writes, we were taking the inode glock in
gfs2_iomap_begin and dropping it in gfs2_iomap_end with the intention of
not holding the inode glock while iomap_write_actor faults in user
pages.  It turns out that iomap_write_actor is called inside iomap_begin
... iomap_end, so the user pages were still faulted in while holding the
inode glock and the locking code in iomap_begin / iomap_end was
completely pointless.

Move the locking into gfs2_file_buffered_write instead.  We'll take care
of the potential deadlocks due to faulting in user pages while holding a
glock in a subsequent patch.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-01 17:22:30 +02:00
Bob Peterson 416a705304 gfs2: Introduce flag for glock holder auto-demotion
commit dc732906c2450939c319fec6e258aa89ecb5a632 upstream

This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure that
will allow glocks to be demoted automatically on locking conflicts.
When a locking request comes in that isn't compatible with the locking
state of an active holder and that holder has the HIF_MAY_DEMOTE flag
set, the holder will be demoted before the incoming locking request is
granted.

Note that this mechanism demotes active holders (with the HIF_HOLDER
flag set), while before we were only demoting glocks without any active
holders.  This allows processes to keep hold of locks that may form a
cyclic locking dependency; the core glock logic will then break those
dependencies in case a conflicting locking request occurs.  We'll use
this to avoid giving up the inode glock proactively before faulting in
pages.

Processes that allow a glock holder to be taken away indicate this by
calling gfs2_holder_allow_demote(), which sets the HIF_MAY_DEMOTE flag.
Later, they call gfs2_holder_disallow_demote() to clear the flag again,
and then they check if their holder is still queued: if it is, they are
still holding the glock; if it isn't, they can re-acquire the glock (or
abort).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-01 17:22:30 +02:00
Andreas Gruenbacher b25cfbc0e7 gfs2: Clean up function may_grant
commit 6144464937fe1e6135b13a30502a339d549bf093 upstream

Pass the first current glock holder into function may_grant and
deobfuscate the logic there.

While at it, switch from BUG_ON to GLOCK_BUG_ON in may_grant.  To make
that build cleanly, de-constify the may_grant arguments.

We're now using function find_first_holder in do_promote, so move the
function's definition above do_promote.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-01 17:22:29 +02:00
Andreas Gruenbacher b88b998579 gfs2: Add wrapper for iomap_file_buffered_write
commit 2eb7509a05443048fb4df60b782de3f03c6c298b upstream

Add a wrapper around iomap_file_buffered_write.  We'll add code for when
the operation needs to be retried here later.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-01 17:22:29 +02:00