mirror of https://gitee.com/openkylin/linux.git
63909 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Linus Torvalds | 4119bf9f1d |
10 smb fixes most RDMA (smbdirect) related, also add experimental support for swap over SMB3 mounts, and also adds fix which improves performance of signed connections
-----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl6SeJgACgkQiiy9cAdy T1GqsQwAiZ/fNCrCFNHT3il6KfYVk4/meU1AYovjXjQ0//xIvn3FqG/sJCOEtWDL bH1D9xsvVz7sDT9l3GNeMtR17O3Lkc4VDVy1LtNEu4a2oA1MxksPLuJVI9S+Wcu6 Du7Fb4hzbTrGRj7C+6GL9mx1uOldoaGsbUV21D9dcQeAA0kNodrm6FAUMG/55wZO qB7H/ODaT2CF6s8H9o4GFV5vcasZSBJwaJeFsgoY/BOeQzMQgNnzyvleJTRZnSHk BAnqiocK2jg0Q/EjJs402/UKTEjZRy4q3mS/+Rpa2BRax0rPqdfRIXAJx2SAShNU pvinNge5PdWANZ1fti7hcL+e6n5TyinnkS9AOdzuS38yrpFVfJnrRD1igfUAuuME L64aODnLiAonnAlGcEk341ESt/FMxWnnXIxEPgEGtlyFiyoH13BI763IQYc4HLXQ 8Xo6GkyirDmnflVpFsfmuydhacLjwIxYLnHCAsDspDPEAjRhiKAXTV9d+VpkFgw+ hBNYP/dm =tstz -----END PGP SIGNATURE----- Merge tag '5.7-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Ten cifs/smb fixes: - five RDMA (smbdirect) related fixes - add experimental support for swap over SMB3 mounts - also a fix which improves performance of signed connections" * tag '5.7-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: smb3: enable swap on SMB3 mounts smb3: change noisy error message to FYI smb3: smbdirect support can be configured by default cifs: smbd: Do not schedule work to send immediate packet on every receive cifs: smbd: Properly process errors on ib_post_send cifs: Allocate crypto structures on the fly for calculating signatures of incoming packets cifs: smbd: Update receive credits before sending and deal with credits roll back on failure before sending cifs: smbd: Check send queue size before posting a send cifs: smbd: Merge code to track pending packets cifs: ignore cached share root handle closing errors |
|
Linus Torvalds | 50bda5faa6 |
NFS client bugfix for Linux 5.7
Bugfix: - Fix an RCU read lock leakage in pnfs_alloc_ds_commits_list() -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl6SZxQACgkQZwvnipYK APIhmhAAp7kGMRG/FmlmOFlppjvaN4/kWQpOdzQSl2OJBPaNCGXts2TlCUJD3/hR HXQSvxdH+rxZvIBoxDJT3ZMJH3DmBIuplwk0rmnb3kUeqVm8ls/eKePaHbBGySh8 UZ524hagXUSikrSUJQD/fgiwqi3YSTbbmCw7YbgDE8ot3oM5l6hRv/ff7sbE8sSs X9Zt1d2t3zNGmwMaXtaX7WEbcOzNo09eZ3YqxquDE1OImxUVVu7BBr0BNcg/meUE /WrvCQ34821Y27vEvZ5s6sCdF/IKiP3Dn3O18TwhO74mzU8/djUx3hqayAZ0qsuM ACamzH+j1vOFvd6gpRkb467B5c2gux+4y7LI0qy2D6CLvHm06e+n7QppxwaB3niu cpS5PqMaaD+0T9m1xAIOXhqFN25sgWYFapgHHZ0JnHRwLgvxP9r5gYkWJe5gRUdn wHTYXfT+xXMmITvW67JDbkgOQsmhfw3kfQ7doRHv/Ont422KC5auy+ka5NoFZyNL Iihijs8oIuFQZY4losX3y29vUol5kI9NrMycyMmRL9Yx9PnAnlkTh75BrDDV3y+V LkWlCKWdTK+gvTe/PCnmolBm+FRiSkH97FVKxYHzrO+CXdPvgybeXcedqRbhEXiL N2cBlOxYBxWOtQWLdD/iLFyF+yGj3yUiQZJpaT8ql1EZMkd/9Jw= =zeOO -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.7-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfix from Trond Myklebust: "Fix an RCU read lock leakage in pnfs_alloc_ds_commits_list()" * tag 'nfs-for-5.7-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: pNFS: Fix RCU lock leakage |
|
Trond Myklebust | 27d231c0c6 |
pNFS: Fix RCU lock leakage
Another brown paper bag moment. pnfs_alloc_ds_commits_list() is leaking
the RCU lock.
Fixes:
|
|
Linus Torvalds | 5b8b9d0c6d |
Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton: - Almost all of the rest of MM (memcg, slab-generic, slab, pagealloc, gup, hugetlb, pagemap, memremap) - Various other things (hfs, ocfs2, kmod, misc, seqfile) * akpm: (34 commits) ipc/util.c: sysvipc_find_ipc() should increase position index kernel/gcov/fs.c: gcov_seq_next() should increase position index fs/seq_file.c: seq_read(): add info message about buggy .next functions drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings change email address for Pali Rohár selftests: kmod: test disabling module autoloading selftests: kmod: fix handling test numbers above 9 docs: admin-guide: document the kernel.modprobe sysctl fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once() kmod: make request_module() return an error when autoloading is disabled mm/memremap: set caching mode for PCI P2PDMA memory to WC mm/memory_hotplug: add pgprot_t to mhp_params powerpc/mm: thread pgprot_t through create_section_mapping() x86/mm: introduce __set_memory_prot() x86/mm: thread pgprot_t through init_memory_mapping() mm/memory_hotplug: rename mhp_restrictions to mhp_params mm/memory_hotplug: drop the flags field from struct mhp_restrictions mm/special: create generic fallbacks for pte_special() and pte_mkspecial() mm/vma: introduce VM_ACCESS_FLAGS mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS ... |
|
Linus Torvalds | 4e4bdcfa21 |
orangefs: a fix and two cleanups and a merge conflict
Fix: Christoph Hellwig noticed that some logic I added to orangefs_file_read_iter introduced a race condition, so he sent a reversion patch. I had to modify his patch since reverting at this point broke Orangefs. Cleanup 1: Christoph Hellwig noticed that we were doing some unnecessary work in orangefs_flush, so he sent in a patch that removed the un-needed code. Cleanup 2: Al Viro told me he had trouble building Orangefs. Orangefs should be easy to build, even for Al :-). I looked back at the test server build notes in orangefs.txt, just in case that's where the trouble really is, and found a couple of typos and made a couple of clarifications. Merge Conflict: Stephen Rothwell reported that my modifications to orangefs.txt caused a merge conflict with orangefs.rst in Linux Next. I wasn't sure what to do, so I asked, and Jonathan Corbet said not to worry about it and just to report it to Linus. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEIGSFVdO6eop9nER2z0QOqevODb4FAl6Qft8ACgkQz0QOqevO Db6v1A/7BnEalIB30OPy2PUAcobY3Bqtio2Mb7ULquvZAarVvNgTZIyI4UGt0VBN bc8KOpKF/ZJIJLjgyorjnA7KGYg+FLdWatTu5GkdEz6DxKbbvXsokJI/uZkXVIGF BS/Ic0KmOBL+wUjqc/xfJQrm5dSLp2woHWfhyWoKuKJtC0Ut9a0ov0Y8T2X/F4Jw xV4rMhuEy9zBU6nLHOHEumt9RDUAL2TlK4RJ9on0OmV1W/iizin6GU2q20E6fJVb C0ShqigQEI+Uo6QXKf9w8eqWH4Vq431L0zlwASR63DgnmMXM4osx3iVETTcOoeg7 xcz7mjMOqutAu1R89Y0BIBglVvZaP+Auds+awFRvcGFoM5s19DlolRYofLqiIdzy OsQd2oT2DBEHBkYnGCRXnqPLj+Q9/sMs4vNAaHUcEz1pn3x57lgxFFPHQEF6+bLy WX/cEVRgqjHhieLUCXLI9jVA54mhvsGZuyXputLOA7r3Dqjk5LIWPIGIW1ksjA1T Kl7Ge/C4i3OOAJDtjcr+kl5LoKu/ppPmg+fOaTQ9qmGWRiwXdfoR4GMT3pEF0Uam bXwR2CZlsUBIlqO9FmWwrdDddt55gMRbjNcqzgqpCyDKQ5zFM+nWgsl5oHFYkTty QYMZtslM8icZPxGYucrhqLxaqRJ4kJoCcoxmOyPrfN/0cDWuIDc= =dNWv -----END PGP SIGNATURE----- Merge tag 'for-linus-5.7-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "A fix and two cleanups. Fix: - Christoph Hellwig noticed that some logic I added to orangefs_file_read_iter introduced a race condition, so he sent a reversion patch. I had to modify his patch since reverting at this point broke Orangefs. Cleanups: - Christoph Hellwig noticed that we were doing some unnecessary work in orangefs_flush, so he sent in a patch that removed the un-needed code. - Al Viro told me he had trouble building Orangefs. Orangefs should be easy to build, even for Al :-). I looked back at the test server build notes in orangefs.txt, just in case that's where the trouble really is, and found a couple of typos and made a couple of clarifications" * tag 'for-linus-5.7-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: clarify build steps for test server in orangefs.txt orangefs: don't mess with I_DIRTY_TIMES in orangefs_flush orangefs: get rid of knob code... |
|
Vasily Averin | 3bfa7e141b |
fs/seq_file.c: seq_read(): add info message about buggy .next functions
Patch series "seq_file .next functions should increase position index". In Aug 2018 NeilBrown noticed commit |
|
Pali Rohár | 149ed3d404 |
change email address for Pali Rohár
For security reasons I stopped using gmail account and kernel address is now up-to-date alias to my personal address. People periodically send me emails to address which they found in source code of drivers, so this change reflects state where people can contact me. [ Added .mailmap entry as per Joe Perches - Linus ] Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Joe Perches <joe@perches.com> Link: http://lkml.kernel.org/r/20200307104237.8199-1-pali@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Eric Biggers | 26c5d78c97 |
fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()
After request_module(), nothing is stopping the module from being
unloaded until someone takes a reference to it via try_get_module().
The WARN_ONCE() in get_fs_type() is thus user-reachable, via userspace
running 'rmmod' concurrently.
Since WARN_ONCE() is for kernel bugs only, not for user-reachable
situations, downgrade this warning to pr_warn_once().
Keep it printed once only, since the intent of this warning is to detect
a bug in modprobe at boot time. Printing the warning more than once
wouldn't really provide any useful extra information.
Fixes:
|
|
Changwei Ge | 783fda856e |
ocfs2: no need try to truncate file beyond i_size
Linux fallocate(2) with FALLOC_FL_PUNCH_HOLE mode set, its offset can exceed the inode size. Ocfs2 now doesn't allow that offset beyond inode size. This restriction is not necessary and violates fallocate(2) semantics. If fallocate(2) offset is beyond inode size, just return success and do nothing further. Otherwise, ocfs2 will crash the kernel. kernel BUG at fs/ocfs2//alloc.c:7264! ocfs2_truncate_inline+0x20f/0x360 [ocfs2] ocfs2_remove_inode_range+0x23c/0xcb0 [ocfs2] __ocfs2_change_file_space+0x4a5/0x650 [ocfs2] ocfs2_fallocate+0x83/0xa0 [ocfs2] vfs_fallocate+0x148/0x230 SyS_fallocate+0x48/0x80 do_syscall_64+0x79/0x170 Signed-off-by: Changwei Ge <chge@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200407082754.17565-1-chge@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Simon Gander | 25efb2ffdf |
hfsplus: fix crash and filesystem corruption when deleting files
When removing files containing extended attributes, the hfsplus driver may remove the wrong entries from the attributes b-tree, causing major filesystem damage and in some cases even kernel crashes. To remove a file, all its extended attributes have to be removed as well. The driver does this by looking up all keys in the attributes b-tree with the cnid of the file. Each of these entries then gets deleted using the key used for searching, which doesn't contain the attribute's name when it should. Since the key doesn't contain the name, the deletion routine will not find the correct entry and instead remove the one in front of it. If parent nodes have to be modified, these become corrupt as well. This causes invalid links and unsorted entries that not even macOS's fsck_hfs is able to fix. To fix this, modify the search key before an entry is deleted from the attributes b-tree by copying the found entry's key into the search key, therefore ensuring that the correct entry gets removed from the tree. Signed-off-by: Simon Gander <simon@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Anton Altaparmakov <anton@tuxera.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200327155541.1521-1-simon@tuxera.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Linus Torvalds | 87ad46e601 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull proc fix from Eric Biederman: "A brown paper bag slipped through my proc changes, and syzcaller caught it when the code ended up in your tree. I have opted to fix it the simplest cleanest way I know how, so there is no reasonable chance for the bug to repeat" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: proc: Use a dedicated lock in struct pid |
|
Steve French | 4e8aea30f7 |
smb3: enable swap on SMB3 mounts
Add experimental support for allowing a swap file to be on an SMB3 mount. There are use cases where swapping over a secure network filesystem is preferable. In some cases there are no local block devices large enough, and network block devices can be hard to setup and secure. And in some cases there are no local block devices at all (e.g. with the recent addition of remote boot over SMB3 mounts). There are various enhancements that can be added later e.g.: - doing a mandatory byte range lock over the swapfile (until the Linux VFS is modified to notify the file system that an open is for a swapfile, when the file can be opened "DENY_ALL" to prevent others from opening it). - pinning more buffers in the underlying transport to minimize memory allocations in the TCP stack under the fs - documenting how to create ACLs (on the server) to secure the swapfile (or adding additional tools to cifs-utils to make it easier) Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> |
|
Linus Torvalds | 172edde960 |
io_uring-5.7-2020-04-09
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl6Pxm0QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpunoD/9Q5g5G/N14UWQpMMh5SxZ98dZL/ogT2RQt eCOROX1Iz3MbHAwQbLd8AaeWkzSG9UjRsF8ED7Td1LRkgaDxrBdnlyx19ys+eiPe ePhM9yoBTpjef3my46/Q7dKUoI1KxWW95QDY0QgRRhP7UQEKRJcppXdNOIs3Bsn4 RstRJi8XulUeS/FUGChydmbJCX4euy/uIB5C8FoWCN70SeF9LiXkhF+YeyUaGYJE VY9UOTFd1UDcoaZfwaQBxya1hj0mOthax9/1Sw9k21FXwdMDObfpgGtisODRTubQ z0NyVl17eoPae6oG6YaGht6YRo7t8D98PmjnEN4D38XMl1dzSBFRN0e9e39PvYS+ NN3ICADAJO6xLkSt97GqpE7KPlfYBzap/dQ3Zlcz1LM3nPYEYtCgPp+90D3Ah8+4 rW+/wMb+SOOithY89aE14e50PABDDnTpL1sftbpfxiqculUQrD2HqmwT5Q2V91DQ 61xspJ8H/TYbJ01v1ChYoCaYYNdbJmsJpGcourVErIoRq5rvEH1nCKFS+8UbBboR CP2K26BxFnQ2ryxdqae99c+E38DJzlZk/51KAeZO3TfIEVKFozaQr/J93FENyKGl ZUXim5pU7dzGC8v8GECjYzGtY5FA4YjwlsjPRqYzj+hHhuuE/qn3wovH/d7vGvSt xFZojkj7aA== =2R2G -----END PGP SIGNATURE----- Merge tag 'io_uring-5.7-2020-04-09' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "Here's a set of fixes that either weren't quite ready for the first, or came about from some intensive testing on memcached with 350K+ sockets. Summary: - Fixes for races or deadlocks around poll handling - Don't double account fixed files against RLIMIT_NOFILE - IORING_OP_OPENAT LFS fix - Poll retry handling (Bijan) - Missing finish_wait() for SQPOLL (Hillf) - Cleanup/split of io_kiocb alloc vs ctx references (Pavel) - Fixed file unregistration and init fixes (Xiaoguang) - Various little fixes (Xiaoguang, Pavel, Colin)" * tag 'io_uring-5.7-2020-04-09' of git://git.kernel.dk/linux-block: io_uring: punt final io_ring_ctx wait-and-free to workqueue io_uring: fix fs cleanup on cqe overflow io_uring: don't read user-shared sqe flags twice io_uring: remove req init from io_get_req() io_uring: alloc req only after getting sqe io_uring: simplify io_get_sqring io_uring: do not always copy iovec in io_req_map_rw() io_uring: ensure openat sets O_LARGEFILE if needed io_uring: initialize fixed_file_data lock io_uring: remove redundant variable pointer nxt and io_wq_assign_next call io_uring: fix ctx refcounting in io_submit_sqes() io_uring: process requests completed with -EAGAIN on poll list io_uring: remove bogus RLIMIT_NOFILE check in file registration io_uring: use io-wq manager as backup task if task is exiting io_uring: grab task reference for poll requests io_uring: retry poll if we got woken with non-matching mask io_uring: add missing finish_wait() in io_sq_thread() io_uring: refactor file register/unregister/update handling |
|
Linus Torvalds | 8c3c07439e |
(More) new code for 5.7:
- Validate the realtime geometry in the superblock when mounting - Refactor a bunch of tricky flag handling in the log code - Flush the CIL more judiciously so that we don't wait until there are millions of log items consuming a lot of memory. - Throttle transaction commits to prevent the xfs frontend from flooding the CIL with too many log items. - Account metadata buffers correctly for memory reclaim. - Mark slabs properly for memory reclaim. These should help reclaim run more effectively when XFS is using a lot of memory. - Don't write a garbage log record at unmount time if we're trying to trigger summary counter recalculation at next mount. - Don't block the AIL on locked dquot/inode buffers; instead trigger its backoff mechanism to give the lock holder a chance to finish up. - Ratelimit writeback flushing when buffered writes encounter ENOSPC. - Other minor cleanups. - Make reflink a synchronous operation when the fs is mounted with wsync or sync, which means that now we force the log to disk to record the changes. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl6LVwIACgkQ+H93GTRK tOuSaQ/6AmhmSTUU6V5ZP8LptYcmRDzu7ZC8QAINWhVjXhDzbi26bFbEwf/sE0hq m9yRANKSMBJvcSPB2KdUZLJvgClUaPmrlo1yE3RV8K+FCCj/60VYV8IWuLmeuG/a xBwAQVBGa/go/6I1gxj+ofiltMCElAYET2nShTyFBX5kVGUsJfW5Cvm94KTtBITi VP4GmVwyl0Q5rM6+/4wd7IQKfzieBk4DsCPYW26cs+STdVXoHzWgyxeM8AkeM5GW 6SI/UKP3PDHa+3tJ/NucDnCa9ykQsaOykHl6VwVZGDTDb2KFNGqiAqT9fPtx1yUF 30EBPPrhSAxh9wHSmxuU6PpPcMqBomanOe7as66ZLcV4Uij+y1JURWumTDw8Yo3C edCh/wpWgmKSj95U0D8KPk/ETgMIaBFRVJhQZ2JEgtarPHNxrudsoOcRykftJeTK l3tPyS5pK8OjVhK927G5rPJd7pqsO0BjjWobno87E3M7D238qyB+A1DAizMIhIid J9oyu3GGaxg4n228AmOAn7DqV4eGBjTGSLJGh4zUYy0Zk7zK52fKEYh+6z248rxB Ce0PGB2LXsj9SwbeRMfJAQ49uzXusXv5Ok9ULhk78J+eM8sQi+0KRc2snk++/MSW c+pVvW77GVihuh1mlBnPIC+OvI+4q/DYPTNttV9yXUoPKchMBHo= =+M2M -----END PGP SIGNATURE----- Merge tag 'xfs-5.7-merge-12' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull more xfs updates from Darrick Wong: "As promised last week, this batch changes how xfs interacts with memory reclaim; how the log batches and throttles log items; how hard writes near ENOSPC will try to squeeze more space out of the filesystem; and hopefully fix the last of the umount hangs after a catastrophic failure. Summary: - Validate the realtime geometry in the superblock when mounting - Refactor a bunch of tricky flag handling in the log code - Flush the CIL more judiciously so that we don't wait until there are millions of log items consuming a lot of memory. - Throttle transaction commits to prevent the xfs frontend from flooding the CIL with too many log items. - Account metadata buffers correctly for memory reclaim. - Mark slabs properly for memory reclaim. These should help reclaim run more effectively when XFS is using a lot of memory. - Don't write a garbage log record at unmount time if we're trying to trigger summary counter recalculation at next mount. - Don't block the AIL on locked dquot/inode buffers; instead trigger its backoff mechanism to give the lock holder a chance to finish up. - Ratelimit writeback flushing when buffered writes encounter ENOSPC. - Other minor cleanups. - Make reflink a synchronous operation when the fs is mounted with wsync or sync, which means that now we force the log to disk to record the changes" * tag 'xfs-5.7-merge-12' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (26 commits) xfs: reflink should force the log out if mounted with wsync xfs: factor out a new xfs_log_force_inode helper xfs: fix inode number overflow in ifree cluster helper xfs: remove redundant variable assignment in xfs_symlink() xfs: ratelimit inode flush on buffered write ENOSPC xfs: return locked status of inode buffer on xfsaild push xfs: trylock underlying buffer on dquot flush xfs: remove unnecessary ternary from xfs_create xfs: don't write a corrupt unmount record to force summary counter recalc xfs: factor inode lookup from xfs_ifree_cluster xfs: tail updates only need to occur when LSN changes xfs: factor common AIL item deletion code xfs: correctly acount for reclaimable slabs xfs: Improve metadata buffer reclaim accountability xfs: don't allow log IO to be throttled xfs: Throttle commits on delayed background CIL push xfs: Lower CIL flush limit for large logs xfs: remove some stale comments from the log code xfs: refactor unmount record writing xfs: merge xlog_commit_record with xlog_write_done ... |
|
Jens Axboe | 85faa7b834 |
io_uring: punt final io_ring_ctx wait-and-free to workqueue
We can't reliably wait in io_ring_ctx_wait_and_kill(), since the task_works list isn't ordered (in fact it's LIFO ordered). We could either fix this with a separate task_works list for io_uring work, or just punt the wait-and-free to async context. This ensures that task_work that comes in while we're shutting down is processed correctly. If we don't go async, we could have work past the fput() work for the ring that depends on work that won't be executed until after we're done with the wait-and-free. But as this operation is blocking, it'll never get a chance to run. This was reproduced with hundreds of thousands of sockets running memcached, haven't been able to reproduce this synthetically. Reported-by: Dan Melnic <dmm@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
|
Steve French | 1dc94b7381 |
smb3: change noisy error message to FYI
The noisy posix error message in readdir was supposed to be an FYI (not enabled by default) CIFS VFS: XXX dev 66306, reparse 0, mode 755 Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> |
|
Linus Torvalds | e4da01d833 |
powerpc updates for 5.7 #2
- A fix for a crash in machine check handling on pseries (ie. guests) - A small series to make it possible to disable CONFIG_COMPAT, and turn it off by default for ppc64le where it's not used. - A few other miscellaneous fixes and small improvements. Thanks to: Alexey Kardashevskiy, Anju T Sudhakar, Arnd Bergmann, Christophe Leroy, Dan Carpenter, Ganesh Goudar, Geert Uytterhoeven, Geoff Levand, Mahesh Salgaonkar, Markus Elfring, Michal Suchanek, Nicholas Piggin, Stephen Boyd, Wen Xiong. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl6O8LgTHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgMcYEACbGf+Z9brLSasYajoqU6QdqGPacHEN 1a9TEmUnN+HWgtfkkoEBFbyuYHnhyhYuf7hvNccDjDA91ESVylO+Wq7Q+v/xoz29 LVyb3V6uuVMLHnoqwP5jpr0lS0aOpuu3Nc2SpfBuolDtJqeMxpVEEK3Ln3uATlq6 SnEAxQEKb2x4Y4Tfuq5A3txupj0s/UYrmeR6GAdkN3Oapbb9uQl8Ql2smqrZo0cq 6TLxtoFzbfXkV6NmY2V0OKMTeXt0fhrvdDhFEBckCUpRZLv4Fd7CwPWNER2bUVs6 04kg87BAO8qRyfr3G93oP0mWgi65kpXI8yN6Vt5Lig+5PnxNh7swvEqDpQR1s9se uVoeN7RBHOKQppZRkpzf6yZbelCcMuwTeIBQ9XOx/jYFAHJ2nUkJm9TYgKckVvNY E4shiM1eoOVMrEmODFBCUmUkJLyn5jU1+r5mn708v2Nb5E1XgoTejitB6bHyL+Aa zo/0DdsZO86iNE7th94oHQRgVbx1vtP9kV6vK6BLB5M95RSGEdVAEMo5CTL70wr+ hz7suOaijPi+TeCW9YPrcOHcHOQE9pzTFQoVZHuv8egbvd9Rni3aBAMMqHx/lQdE Hk4zN7IytOLqQTfINNYgoo0kUKI1ipJQOBfR5jyeLhgg6yp36CnO/m+0QGvrILbi 18tlf2qkD75Z3g== =72sb -----END PGP SIGNATURE----- Merge tag 'powerpc-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull more powerpc updates from Michael Ellerman: "The bulk of this is the series to make CONFIG_COMPAT user-selectable, it's been around for a long time but was blocked behind the syscall-in-C series. Plus there's also a few fixes and other minor things. Summary: - A fix for a crash in machine check handling on pseries (ie. guests) - A small series to make it possible to disable CONFIG_COMPAT, and turn it off by default for ppc64le where it's not used. - A few other miscellaneous fixes and small improvements. Thanks to: Alexey Kardashevskiy, Anju T Sudhakar, Arnd Bergmann, Christophe Leroy, Dan Carpenter, Ganesh Goudar, Geert Uytterhoeven, Geoff Levand, Mahesh Salgaonkar, Markus Elfring, Michal Suchanek, Nicholas Piggin, Stephen Boyd, Wen Xiong" * tag 'powerpc-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: selftests/powerpc: Always build the tm-poison test 64-bit powerpc: Improve ppc_save_regs() Revert "powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled" powerpc/time: Replace <linux/clk-provider.h> by <linux/of_clk.h> powerpc/pseries/ddw: Extend upper limit for huge DMA window for persistent memory powerpc/perf: split callchain.c by bitness powerpc/64: Make COMPAT user-selectable disabled on littleendian by default. powerpc/64: make buildable without CONFIG_COMPAT powerpc/perf: consolidate valid_user_sp -> invalid_user_sp powerpc/perf: consolidate read_user_stack_32 powerpc: move common register copy functions from signal_32.c to signal.c powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro powerpc/ps3: Set CONFIG_UEVENT_HELPER=y in ps3_defconfig powerpc/ps3: Remove an unneeded NULL check powerpc/ps3: Remove duplicate error message powerpc/powernv: Re-enable imc trace-mode in kernel powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events. powerpc/pseries: Fix MCE handling on pseries selftests/eeh: Skip ahci adapters powerpc/64s: Fix doorbell wakeup msgclr optimisation |
|
Eric W. Biederman | 63f818f46a |
proc: Use a dedicated lock in struct pid
syzbot wrote:
> ========================================================
> WARNING: possible irq lock inversion dependency detected
> 5.6.0-syzkaller #0 Not tainted
> --------------------------------------------------------
> swapper/1/0 just changed the state of lock:
> ffffffff898090d8 (tasklist_lock){.+.?}-{2:2}, at: send_sigurg+0x9f/0x320 fs/fcntl.c:840
> but this lock took another, SOFTIRQ-unsafe lock in the past:
> (&pid->wait_pidfd){+.+.}-{2:2}
>
>
> and interrupts could create inverse lock ordering between them.
>
>
> other info that might help us debug this:
> Possible interrupt unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&pid->wait_pidfd);
> local_irq_disable();
> lock(tasklist_lock);
> lock(&pid->wait_pidfd);
> <Interrupt>
> lock(tasklist_lock);
>
> *** DEADLOCK ***
>
> 4 locks held by swapper/1/0:
The problem is that because wait_pidfd.lock is taken under the tasklist
lock. It must always be taken with irqs disabled as tasklist_lock can be
taken from interrupt context and if wait_pidfd.lock was already taken this
would create a lock order inversion.
Oleg suggested just disabling irqs where I have added extra calls to
wait_pidfd.lock. That should be safe and I think the code will eventually
do that. It was rightly pointed out by Christian that sharing the
wait_pidfd.lock was a premature optimization.
It is also true that my pre-merge window testing was insufficient. So
remove the premature optimization and give struct pid a dedicated lock of
it's own for struct pid things. I have verified that lockdep sees all 3
paths where we take the new pid->lock and lockdep does not complain.
It is my current day dream that one day pid->lock can be used to guard the
task lists as well and then the tasklist_lock won't need to be held to
deliver signals. That will require taking pid->lock with irqs disabled.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/lkml/00000000000011d66805a25cd73f@google.com/
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Reported-by: syzbot+343f75cdeea091340956@syzkaller.appspotmail.com
Reported-by: syzbot+832aabf700bc3ec920b9@syzkaller.appspotmail.com
Reported-by: syzbot+f675f964019f884dbd0f@syzkaller.appspotmail.com
Reported-by: syzbot+a9fb1457d720a55d6dc5@syzkaller.appspotmail.com
Fixes:
|
|
Pavel Begunkov | c398ecb3d6 |
io_uring: fix fs cleanup on cqe overflow
If completion queue overflow occurs, __io_cqring_fill_event() will update req->cflags, which is in a union with req->work and happens to be aliased to req->work.fs. Following io_free_req() -> io_req_work_drop_env() may get a bunch of different problems (miscount fs->users, segfault, etc) on cleaning @fs. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
|
Linus Torvalds | fcc95f0640 |
The main items are:
- support for asynchronous create and unlink (Jeff Layton). Creates and unlinks are satisfied locally, without waiting for a reply from the MDS, provided the client has been granted appropriate caps (new in v15.y.z ("Octopus") release). This can be a big help for metadata heavy workloads such as tar and rsync. Opt-in with the new nowsync mount option. - multiple blk-mq queues for rbd (Hannes Reinecke and myself). When the driver was converted to blk-mq, we settled on a single blk-mq queue because of a global lock in libceph and some other technical debt. These have since been addressed, so allocate a queue per CPU to enhance parallelism. - don't hold onto caps that aren't actually needed (Zheng Yan). This has been our long-standing behavior, but it causes issues with some active/standby applications (synchronous I/O, stalls if the standby goes down, etc). - .snap directory timestamps consistent with ceph-fuse (Luis Henriques) -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAl6OEO4THGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzi0XNB/wItYipkjlL5fIUBqiRWzYai72DWdPp CnOZo8LB+O0MQDomPT6DdpU1OMlWZi5HF7zklrZ35LTm21UkRNC9zvccjs9l66PJ qo9cKJbxxju+hgzIvgvK9PjlDlaiFAc/pkF8lZ/NaOnSsM1vvsFL9IuY2LXS38MY A/uUTZNUnFy5udam8TPuN+gWwZcUIH48lRWQLWe2I/hNJSweX1l8OHvecOBg+cYH G+8vb7mLU2V9ky0YT5JJmVxUV3CWA5wH6ZrWWy1ofVDdeSFLPrhgWX6IMjaNq+Gd xPfxmly47uBviSqON9dMkiThgy0Qj7yi0Pvx+1sAZbD7aj/6A4qg3LX5 =GIX0 -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.7-rc1' of git://github.com/ceph/ceph-client Pull ceph updates from Ilya Dryomov: "The main items are: - support for asynchronous create and unlink (Jeff Layton). Creates and unlinks are satisfied locally, without waiting for a reply from the MDS, provided the client has been granted appropriate caps (new in v15.y.z ("Octopus") release). This can be a big help for metadata heavy workloads such as tar and rsync. Opt-in with the new nowsync mount option. - multiple blk-mq queues for rbd (Hannes Reinecke and myself). When the driver was converted to blk-mq, we settled on a single blk-mq queue because of a global lock in libceph and some other technical debt. These have since been addressed, so allocate a queue per CPU to enhance parallelism. - don't hold onto caps that aren't actually needed (Zheng Yan). This has been our long-standing behavior, but it causes issues with some active/standby applications (synchronous I/O, stalls if the standby goes down, etc). - .snap directory timestamps consistent with ceph-fuse (Luis Henriques)" * tag 'ceph-for-5.7-rc1' of git://github.com/ceph/ceph-client: (49 commits) ceph: fix snapshot directory timestamps ceph: wait for async creating inode before requesting new max size ceph: don't skip updating wanted caps when cap is stale ceph: request new max size only when there is auth cap ceph: cleanup return error of try_get_cap_refs() ceph: return ceph_mdsc_do_request() errors from __get_parent() ceph: check all mds' caps after page writeback ceph: update i_requested_max_size only when sending cap msg to auth mds ceph: simplify calling of ceph_get_fmode() ceph: remove delay check logic from ceph_check_caps() ceph: consider inode's last read/write when calculating wanted caps ceph: always renew caps if mds_wanted is insufficient ceph: update dentry lease for async create ceph: attempt to do async create when possible ceph: cache layout in parent dir on first sync create ceph: add new MDS req field to hold delegated inode number ceph: decode interval_sets for delegated inos ceph: make ceph_fill_inode non-static ceph: perform asynchronous unlink if we have sufficient caps ceph: don't take refs to want mask unless we have all bits ... |
|
Linus Torvalds | c6b80eb89b |
overlayfs update for 5.7
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCXosx9gAKCRDh3BK/laaZ PPW3AQDYShgXR4gqVL25lxJ1IKaIikIug8EC7S9/a3DMNMpPQAEA+kCrdkpnaJVc DESmHyMDw7EtgeZXu+xkRlFfEU8BOgc= =0WQh -----END PGP SIGNATURE----- Merge tag 'ovl-update-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs update from Miklos Szeredi: - Fix failure to copy-up files from certain NFSv4 mounts - Sort out inconsistencies between st_ino and i_ino (used in /proc/locks) - Allow consistent (POSIX-y) inode numbering in more cases - Allow virtiofs to be used as upper layer - Miscellaneous cleanups and fixes * tag 'ovl-update-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: document xino expected behavior ovl: enable xino automatically in more cases ovl: avoid possible inode number collisions with xino=on ovl: use a private non-persistent ino pool ovl: fix WARN_ON nlink drop to zero ovl: fix a typo in comment ovl: replace zero-length array with flexible-array member ovl: ovl_obtain_alias(): don't call d_instantiate_anon() for old ovl: strict upper fs requirements for remote upper fs ovl: check if upper fs supports RENAME_WHITEOUT ovl: allow remote upper ovl: decide if revalidate needed on a per-dentry basis ovl: separate detection of remote upper layer from stacked overlay ovl: restructure dentry revalidation ovl: ignore failure to copy up unknown xattrs ovl: document permission model ovl: simplify i_ino initialization ovl: factor out helper ovl_get_root() ovl: fix out of date comment and unreachable code ovl: fix value of i_ino for lower hardlink corner case |
|
Linus Torvalds | 9744b923d5 |
Bug fixes for 5.7:
- Fix a problem in readahead where we can crash if we can't allocate a full bio due to GFP_NORETRY. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl6GDloACgkQ+H93GTRK tOuH4g/+J95SMTPKVoChOwJ/vMS7VsPCrbj7GTgDe0LKtaRq5/dQ1fTd3QuSqAh9 0KqCCvAb34y7oP58SccBlr11lRBsJPuWKVJAbiH02Hs+r+mWC+pbWtQz6PCez9wl hLN44jfsuLHPwivf3oHrmw8B5SHdFjkm659ocLseOJCI2tyQO0kZ8AUSEt22gx8r w/Z9pX2NCy7Bhzxpd5jvzh7xY27TTcI2M+i/FW9R4hgznM22Z2pZ8MoKTvDj5gly 20UljmP0sTRgMOMavXs7ME2hHwXDitTIkmKD2JZjNkoeKxaRl4iz+9T0E4xZvdoZ lH+bbUikDocWlOn9zy9+VNBGqtLLKfBp8kOO/I1q1fkW3G/abEPqGvke2ehSmqFx 65MWReBOS0L+g7A0KakHdJVKRuuK3lvXVls+a/2lzTjvjojXCBfy1YCTvhOP2dhT qO3d2ZPBEr+l0VKwLK4QQHz6/1zfJdUy2rFZZnagRUigKc9xW7oXXwkXRHd0sMRv KPOLwxeQu5TSf8hVSH9Z6M53TyEQXX2N0Kifr2GUFXkwoxY91I+hFnuGMSafoKTz CqjOJGNiBFM8ypT/FYRKSakZXEju6zHfyD0tQ0hnEAb2A2HCOJdGqejqbJRZywW2 OM+loTn80spDyuoNpat3/bJBxQLAWX7Dq7m6zL1rkcQLtKxoxik= =5vQD -----END PGP SIGNATURE----- Merge tag 'iomap-5.7-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull iomap fix from Darrick Wong: "Fix a problem in readahead where we can crash if we can't allocate a full bio due to GFP_NORETRY" * tag 'iomap-5.7-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: Handle memory allocation failure in readahead |
|
Linus Torvalds | 9b06860d7c |
libnvdimm for 5.7
- Add support for region alignment configuration and enforcement to fix compatibility across architectures and PowerPC page size configurations. - Introduce 'zero_page_range' as a dax operation. This facilitates filesystem-dax operation without a block-device. - Introduce phys_to_target_node() to facilitate drivers that want to know resulting numa node if a given reserved address range was onlined. - Advertise a persistence-domain for of_pmem and papr_scm. The persistence domain indicates where cpu-store cycles need to reach in the platform-memory subsystem before the platform will consider them power-fail protected. - Promote numa_map_to_online_node() to a cross-kernel generic facility. - Save x86 numa information to allow for node-id lookups for reserved memory ranges, deploy that capability for the e820-pmem driver. - Pick up some miscellaneous minor fixes, that missed v5.6-final, including a some smatch reports in the ioctl path and some unit test compilation fixups. - Fixup some flexible-array declarations. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEf41QbsdZzFdA8EfZHtKRamZ9iAIFAl6LtIAACgkQHtKRamZ9 iAIwRA/8CLVVuQpgHQ1tqK4h8CZPrISFXh7wy7uhocEU2xrDh6iGVnLztmoLRr2k 5f8T9lRzreSAwIVL5DbGqP1pFncqIt9VMnKsFlaPMBGCBNR+hURY0iBCNjIT+jiq BOzLd52MR2rqJxeXGTMUbWrBrbmuj4mZPdmGVuFFe7GFRpoaVpCgOo+296eWa/ot gIOFUTonZY7STYjNvDok0TXCmiCFuJb+P+y5ldfCPShHvZhTiaF53jircja8vAjO G5dt8ixBKUK0rXRc4SEQsQhAZNcAFHb6Gy5lg4C2QzhTF374xTc9usJZNWbIE9iM 5mipBYvjVuoY+XaCNZDkaRcJIy/jqB15O6l3QIWbZLGaK9m95YPp9LmkPFwd3JpO e3rO24ML471DxqB9iWIiJCNcBBocLOlnd6qAQTpppWDpGNbudwXvfsmKHmKIScSE x+IDCdscLmmm+WG2dLmLraWOVPu42xZFccoQCi4M3TTqfeB9pZ9XckFQ37zX62zG 5t+7Ek+t1W4QVt/JQYVKH03XT15sqUpVknvx0Hl4Y5TtbDOkFLkO8RN0/HyExDef 7iegS35kqTsM4EfZQ+9juKbI2JBAjHANcbj0V4dogqaRj6vr3akumBzUtuYqAofv qU3s9skmLsEemOJC+ns2PT8vl5dyIoeDfH0r2XvGWxYqolMqJpA= =sY4N -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and dax updates from Dan Williams: "There were multiple touches outside of drivers/nvdimm/ this round to add cross arch compatibility to the devm_memremap_pages() interface, enhance numa information for persistent memory ranges, and add a zero_page_range() dax operation. This cycle I switched from the patchwork api to Konstantin's b4 script for collecting tags (from x86, PowerPC, filesystem, and device-mapper folks), and everything looks to have gone ok there. This has all appeared in -next with no reported issues. Summary: - Add support for region alignment configuration and enforcement to fix compatibility across architectures and PowerPC page size configurations. - Introduce 'zero_page_range' as a dax operation. This facilitates filesystem-dax operation without a block-device. - Introduce phys_to_target_node() to facilitate drivers that want to know resulting numa node if a given reserved address range was onlined. - Advertise a persistence-domain for of_pmem and papr_scm. The persistence domain indicates where cpu-store cycles need to reach in the platform-memory subsystem before the platform will consider them power-fail protected. - Promote numa_map_to_online_node() to a cross-kernel generic facility. - Save x86 numa information to allow for node-id lookups for reserved memory ranges, deploy that capability for the e820-pmem driver. - Pick up some miscellaneous minor fixes, that missed v5.6-final, including a some smatch reports in the ioctl path and some unit test compilation fixups. - Fixup some flexible-array declarations" * tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (29 commits) dax: Move mandatory ->zero_page_range() check in alloc_dax() dax,iomap: Add helper dax_iomap_zero() to zero a range dax: Use new dax zero page method for zeroing a page dm,dax: Add dax zero_page_range operation s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver dax, pmem: Add a dax operation zero_page_range pmem: Add functions for reading/writing page to/from pmem libnvdimm: Update persistence domain value for of_pmem and papr_scm device tools/test/nvdimm: Fix out of tree build libnvdimm/region: Fix build error libnvdimm/region: Replace zero-length array with flexible-array member libnvdimm/label: Replace zero-length array with flexible-array member ACPI: NFIT: Replace zero-length array with flexible-array member libnvdimm/region: Introduce an 'align' attribute libnvdimm/region: Introduce NDD_LABELING libnvdimm/namespace: Enforce memremap_compat_align() libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid libnvdimm: Out of bounds read in __nd_ioctl() acpi/nfit: improve bounds checking for 'func' mm/memremap_pages: Introduce memremap_compat_align() ... |
|
Pavel Begunkov | 9c280f9087 |
io_uring: don't read user-shared sqe flags twice
Don't re-read userspace-shared sqe->flags, it can be exploited. sqe->flags are copied into req->flags in io_submit_sqe(), check them there instead. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
|
Pavel Begunkov | 0553b8bda8 |
io_uring: remove req init from io_get_req()
io_get_req() do two different things: io_kiocb allocation and initialisation. Move init part out of it and rename into io_alloc_req(). It's simpler this way and also have better data locality. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
|
Pavel Begunkov | b1e50e549b |
io_uring: alloc req only after getting sqe
As io_get_sqe() split into 2 stage get/consume, get an sqe before allocating io_kiocb, so no free_req*() for a failure case is needed, and inline back __io_req_do_free(), which has only 1 user. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
|
Pavel Begunkov | 709b302fad |
io_uring: simplify io_get_sqring
Make io_get_sqring() care only about sqes themselves, not initialising the io_kiocb. Also, split it into get + consume, that will be helpful in the future. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
|
Xiaoguang Wang | 45097daea2 |
io_uring: do not always copy iovec in io_req_map_rw()
In io_read_prep() or io_write_prep(), io_req_map_rw() takes struct io_async_rw's fast_iov as argument to call io_import_iovec(), and if io_import_iovec() uses struct io_async_rw's fast_iov as valid iovec array, later indeed io_req_map_rw() does not need to do the memcpy operation, because they are same pointers. Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
|
Jens Axboe | 08a1d26eb8 |
io_uring: ensure openat sets O_LARGEFILE if needed
OPENAT2 correctly sets O_LARGEFILE if it has to, but that escaped the
OPENAT opcode. Dmitry reports that his test case that compares openat()
and IORING_OP_OPENAT sees failures on large files:
*** sync openat
openat succeeded
sync write at offset 0
write succeeded
sync write at offset 4294967296
write succeeded
*** sync openat
openat succeeded
io_uring write at offset 0
write succeeded
io_uring write at offset 4294967296
write succeeded
*** io_uring openat
openat succeeded
sync write at offset 0
write succeeded
sync write at offset 4294967296
write failed: File too large
*** io_uring openat
openat succeeded
io_uring write at offset 0
write succeeded
io_uring write at offset 4294967296
write failed: File too large
Ensure we set O_LARGEFILE, if force_o_largefile() is true.
Cc: stable@vger.kernel.org # v5.6
Fixes:
|
|
Mike Marshall | 0e393a9a8f |
orangefs: don't mess with I_DIRTY_TIMES in orangefs_flush
Christoph Hellwig noticed that we were doing some unnecessary work in orangefs_flush: orangefs_flush just writes out data on every close(2) call. There is no need to change anything about the dirty state, especially as orangefs doesn't treat I_DIRTY_TIMES special in any way. The code seems to come from partially open coding vfs_fsync. He sent in a patch with the above commit message and also a patch that was a reversion of another Orangefs patch I had sent upstream a while ago. I had to fix his reversion patch so that it would compile which caused his "don't mess with I_DIRTY_TIMES" patch to fail to apply. So here I have just remade his patch and applied it after the fixed reversion patch. Signed-off-by: Mike Marshall <hubcap@omnibond.com> |
|
Mike Marshall | ec95f1dedc |
orangefs: get rid of knob code...
Christoph Hellwig sent in a reversion of "orangefs: remember count when reading." because: ->read_iter calls can race with each other and one or more ->flush calls. Remove the the scheme to store the read count in the file private data as is is completely racy and can cause use after free or double free conditions Christoph's reversion caused Orangefs not to work or to compile. I added a patch that fixed that, but intel's kbuild test robot pointed out that sending Christoph's patch followed by my patch upstream, it would break bisection because of the failure to compile. So I have combined the reversion plus my patch... here's the commit message that was in my patch: Logically, optimal Orangefs "pages" are 4 megabytes. Reading large Orangefs files 4096 bytes at a time is like trying to kick a dead whale down the beach. Before Christoph's "Revert orangefs: remember count when reading." I tried to give users a knob whereby they could, for example, use "count" in read(2) or bs with dd(1) to get whatever they considered an appropriate amount of bytes at a time from Orangefs and fill as many page cache pages as they could at once. Without the racy code that Christoph reverted Orangefs won't even compile, much less work. So this replaces the logic that used the private file data that Christoph reverted with a static number of bytes to read from Orangefs. I ran tests like the following to determine what a reasonable static number of bytes might be: dd if=/pvfsmnt/asdf of=/dev/null count=128 bs=4194304 dd if=/pvfsmnt/asdf of=/dev/null count=256 bs=2097152 dd if=/pvfsmnt/asdf of=/dev/null count=512 bs=1048576 . . . dd if=/pvfsmnt/asdf of=/dev/null count=4194304 bs=128 Reads seem faster using the static number, so my "knob code" wasn't just racy, it wasn't even a good idea... Signed-off-by: Mike Marshall <hubcap@omnibond.com> Reported-by: kbuild test robot <lkp@intel.com> |
|
Linus Torvalds | 63bef48fd6 |
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton: - a lot more of MM, quite a bit more yet to come: (memcg, pagemap, vmalloc, pagealloc, migration, thp, ksm, madvise, virtio, userfaultfd, memory-hotplug, shmem, rmap, zswap, zsmalloc, cleanups) - various other subsystems (procfs, misc, MAINTAINERS, bitops, lib, checkpatch, epoll, binfmt, kallsyms, reiserfs, kmod, gcov, kconfig, ubsan, fault-injection, ipc) * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (158 commits) ipc/shm.c: make compat_ksys_shmctl() static ipc/mqueue.c: fix a brace coding style issue lib/Kconfig.debug: fix a typo "capabilitiy" -> "capability" ubsan: include bug type in report header kasan: unset panic_on_warn before calling panic() ubsan: check panic_on_warn drivers/misc/lkdtm/bugs.c: add arithmetic overflow and array bounds checks ubsan: split "bounds" checker from other options ubsan: add trap instrumentation option init/Kconfig: clean up ANON_INODES and old IO schedulers options kernel/gcov/fs.c: replace zero-length array with flexible-array member gcov: gcc_3_4: replace zero-length array with flexible-array member gcov: gcc_4_7: replace zero-length array with flexible-array member kernel/kmod.c: fix a typo "assuems" -> "assumes" reiserfs: clean up several indentation issues kallsyms: unexport kallsyms_lookup_name() and kallsyms_on_each_symbol() samples/hw_breakpoint: drop use of kallsyms_lookup_name() samples/hw_breakpoint: drop HW_BREAKPOINT_R when reporting writes fs/binfmt_elf.c: don't free interpreter's ELF pheaders on common path fs/binfmt_elf.c: allocate less for static executable ... |
|
Linus Torvalds | 04de788e61 |
NFS client updates for Linux 5.7
Highlights include: Stable fixes: - Fix a page leak in nfs_destroy_unlinked_subrequests() - Fix use-after-free issues in nfs_pageio_add_request() - Fix new mount code constant_table array definitions - finish_automount() requires us to hold 2 refs to the mount record Features: - Improve the accuracy of telldir/seekdir by using 64-bit cookies when possible. - Allow one RDMA active connection and several zombie connections to prevent blocking if the remote server is unresponsive. - Limit the size of the NFS access cache by default - Reduce the number of references to credentials that are taken by NFS - pNFS files and flexfiles drivers now support per-layout segment COMMIT lists. - Enable partial-file layout segments in the pNFS/flexfiles driver. - Add support for CB_RECALL_ANY to the pNFS flexfiles layout type - pNFS/flexfiles Report NFS4ERR_DELAY and NFS4ERR_GRACE errors from the DS using the layouterror mechanism. Bugfixes and cleanups: - SUNRPC: Fix krb5p regressions - Don't specify NFS version in "UDP not supported" error - nfsroot: set tcp as the default transport protocol - pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid() - alloc_nfs_open_context() must use the file cred when available - Fix locking when dereferencing the delegation cred - Fix memory leaks in O_DIRECT when nfs_get_lock_context() fails - Various clean ups of the NFS O_DIRECT commit code - Clean up RDMA connect/disconnect - Replace zero-length arrays with C99-style flexible arrays -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl6LhhsACgkQZwvnipYK APIOJxAAiQOgmIg1CV4mrlcVhkwy09N5JAia6AENtoTmwm08nAYg5Y8REb9uX46a /MJsM2WG8hBCgI6eYmRY8LTr4Ft9rTQEJM9DRMuwQREXwMWwBhUv/QakCeqY1lHE lyB1z4hj5XKeUoN/OcfALC/GXFFf56A0UyN05nMzeCkBTdd3+qu+hW8Ge1wkAXcr f0pyLbzdFZlJuTmI4tr8F93g9p3ezuFBuEroT7XPIVJylAdZVumHqnOnz/Mvb99x rNTsX2dc44GhSAfRnTzPumU3MT6BOLvUzNH1xzdiqKzJrbOnG8WjFodrGr3JWpfp HkeyYQxJ+Hnfb2LiZBjvMQE8M7kVMZ1jVbrGJEbCxfSqgTly8lOHboqAeKsFaReK LStnusizdA1LHQVZxPdvn+oL49RDxnzm9dY+DkrXK1qT0GE+icN1CyTyLLfkSCp8 tYvZSJ/qPk5BNZegqH1nBqXkMDkOJ4eEA7+luXDmajRkdRrZ3IWY2M1DpMEoueJ2 j/zoj/NFr1oErU4o7PV9oolA1Euhn1L3wIDuzsbVtjySmbXJNQTtaVVRFpGw3SsZ 7rbqi4BB0SzOooNhQ4q8mLNi4qT7bl/3D04eL8UVzEM73plexhQ8XiOEz/VrIRX7 L9viXH49g4DHQ0rZIaWefxFueqpgbNvQwnlLZl2uQotG9hwhTts= =YUcP -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - Fix a page leak in nfs_destroy_unlinked_subrequests() - Fix use-after-free issues in nfs_pageio_add_request() - Fix new mount code constant_table array definitions - finish_automount() requires us to hold 2 refs to the mount record Features: - Improve the accuracy of telldir/seekdir by using 64-bit cookies when possible. - Allow one RDMA active connection and several zombie connections to prevent blocking if the remote server is unresponsive. - Limit the size of the NFS access cache by default - Reduce the number of references to credentials that are taken by NFS - pNFS files and flexfiles drivers now support per-layout segment COMMIT lists. - Enable partial-file layout segments in the pNFS/flexfiles driver. - Add support for CB_RECALL_ANY to the pNFS flexfiles layout type - pNFS/flexfiles Report NFS4ERR_DELAY and NFS4ERR_GRACE errors from the DS using the layouterror mechanism. Bugfixes and cleanups: - SUNRPC: Fix krb5p regressions - Don't specify NFS version in "UDP not supported" error - nfsroot: set tcp as the default transport protocol - pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid() - alloc_nfs_open_context() must use the file cred when available - Fix locking when dereferencing the delegation cred - Fix memory leaks in O_DIRECT when nfs_get_lock_context() fails - Various clean ups of the NFS O_DIRECT commit code - Clean up RDMA connect/disconnect - Replace zero-length arrays with C99-style flexible arrays" * tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (86 commits) NFS: Clean up process of marking inode stale. SUNRPC: Don't start a timer on an already queued rpc task NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn() NFS/pnfs: Fix dereference of layout cred in pnfs_layoutcommit_inode() NFS: Beware when dereferencing the delegation cred NFS: Add a module parameter to set nfs_mountpoint_expiry_timeout NFS: finish_automount() requires us to hold 2 refs to the mount record NFS: Fix a few constant_table array definitions NFS: Try to join page groups before an O_DIRECT retransmission NFS: Refactor nfs_lock_and_join_requests() NFS: Reverse the submission order of requests in __nfs_pageio_add_request() NFS: Clean up nfs_lock_and_join_requests() NFS: Remove the redundant function nfs_pgio_has_mirroring() NFS: Fix memory leaks in nfs_pageio_stop_mirroring() NFS: Fix a request reference leak in nfs_direct_write_clear_reqs() NFS: Fix use-after-free issues in nfs_pageio_add_request() NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests() NFS: Fix a page leak in nfs_destroy_unlinked_subrequests() NFS: Remove unused FLUSH_SYNC support in nfs_initiate_pgio() pNFS/flexfiles: Specify the layout segment range in LAYOUTGET ... |
|
Linus Torvalds | f40f31cadc |
f2fs-for-5.7-rc1
In this round, we've mainly focused on fixing bugs and addressing issues in recently introduced compression support. Enhancement: - add zstd support, and set LZ4 by default - add ioctl() to show # of compressed blocks - show mount time in debugfs - replace rwsem with spinlock - avoid lock contention in DIO reads Some major bug fixes wrt compression: - compressed block count - memory access and leak - remove obsolete fields - flag controls Other bug fixes and clean ups: - fix overflow when handling .flags in inode_info - fix SPO issue during resize FS flow - fix compression with fsverity enabled - potential deadlock when writing compressed pages - show missing mount options -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAl6L5f0ACgkQQBSofoJI UNImoQ/7BHKpwgpgH/DuydO9ess0XuUgQPQxyj+LE79l0jdBo8FxQZJVNAekx2+h ANTDHjsgqry6xuczOJXzFhECoOrCqZuffrkQM1p3owfzOH9Wrx6aiOomFBJyk/WB kXAY7LDPUwGF5uDB8tvVhM082qLXOlP0coO57f9Ip/OpaG8YOkti+KbcOKJrJ9o3 63IAzu89D/9XJw8834rDSersiEenSJm3jLC12uOxgXMzjDb1Ul1JH0P51gEu6g6O 3df5tiGJcFfaKVVWPHG+UEfav6mvi28+zU9f+dSmL0Wvb8dIqSgUf6ty8KCuRBYw kQYi+E0G1dcGi99AitCOrFxzY+oEo/A5wq2HDI6RkNlu6krD8qYNyWcMbP7dNHnU /5BVN+5d78iR1vKZpTo4X6dojf6B21Tn/OmABSZ5d7B6pI2fY5bjy2WgSLSY7YvP A6sepb9RAAGvpKvvkHI7gYwDKFMel+vfD6em1SKH5iKaDC0rJTUDUy8PTz/qMPBS vMn396dLx+TzTa0dZUuSF8NNk6sPZEReC3AuMNAIPSKiuD7tatRxvutHeEg5ktrr ggOQB67MfKjPMBKmgMIm6XMuILcCGIB1MqbPRlyKtC6rjdMPIKKOfeHJlLFmYwfF gqvCIFlW4DlxHpHH+LbUFKoUA3zofltL91SHUVATJjmiZIT2pqQ= =FVxq -----END PGP SIGNATURE----- Merge tag 'f2fs-for-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've mainly focused on fixing bugs and addressing issues in recently introduced compression support. Enhancement: - add zstd support, and set LZ4 by default - add ioctl() to show # of compressed blocks - show mount time in debugfs - replace rwsem with spinlock - avoid lock contention in DIO reads Some major bug fixes wrt compression: - compressed block count - memory access and leak - remove obsolete fields - flag controls Other bug fixes and clean ups: - fix overflow when handling .flags in inode_info - fix SPO issue during resize FS flow - fix compression with fsverity enabled - potential deadlock when writing compressed pages - show missing mount options" * tag 'f2fs-for-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits) f2fs: keep inline_data when compression conversion f2fs: fix to disable compression on directory f2fs: add missing CONFIG_F2FS_FS_COMPRESSION f2fs: switch discard_policy.timeout to bool type f2fs: fix to verify tpage before releasing in f2fs_free_dic() f2fs: show compression in statx f2fs: clean up dic->tpages assignment f2fs: compress: support zstd compress algorithm f2fs: compress: add .{init,destroy}_decompress_ctx callback f2fs: compress: fix to call missing destroy_compress_ctx() f2fs: change default compression algorithm f2fs: clean up {cic,dic}.ref handling f2fs: fix to use f2fs_readpage_limit() in f2fs_read_multi_pages() f2fs: xattr.h: Make stub helpers inline f2fs: fix to avoid double unlock f2fs: fix potential .flags overflow on 32bit architecture f2fs: fix NULL pointer dereference in f2fs_verity_work() f2fs: fix to clear PG_error if fsverity failed f2fs: don't call fscrypt_get_encryption_info() explicitly in f2fs_tmpfile() f2fs: don't trigger data flush in foreground operation ... |
|
Linus Torvalds | 763dede1b2 |
This pull request contains fixes for UBI and UBIFS:
- Fix for memory leaks around UBIFS orphan handling - Fix for memory leaks around UBI fastmap - Remove zero-length array from ubi-media.h - Fix for TNC lookup in UBIFS orphan code -----BEGIN PGP SIGNATURE----- iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl6McBEWHHJpY2hhcmRA c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wbxzD/0RXX1RmnLUtyWvurC+Ge+wJR9k T/LvMrT5TAUpbXsDTFmbhQJiWlnu70Pg2Gcqr+yXz/YboAr1+G0G3SonC1TYWNEe 8aGsWMGeuiB1GTfrGJgyj3M/YxRUYHgRz75mUXZ0kQkraWYbIuKDpudsSAVcLV5Q jO0JQeZmaCB0w3pvaP1CpXEl9+CXnCAnEoheUgAjC9THUPYeAvI3mRbrDJL50Ine 9PDAkP++LtYKRKS0tMhlDtbrjoFWMySmvOG6t7nkWhdj7IdLxCZuYgCV5CvUUiuv oBglozfzBEPp2sA4NlSghEg3zedrGSauDL/Ns3djPmpSWNJIuE5Q8bHZGO9msqR9 MDoPTuSirY3L4ARGpG6rXc0holGq9NniQpa1ArnWnSUtkIjz22dT1V41dEumquxm PganYebBWe3tjSmSc8Jm1xNKscwTPYr6kBApLcDWaxdASD56Q95raAVNJLkP/kKz USb6ZTHunyZQALgqKFgFMc8Nm8Eyu+sXikQscYgR0vJsYy0gaQy52/F3UG2gBpjP 9ULoHTilWU6cx07JdawgJNLfgj37ov3wlM5rW1G50TP7GpzAYlzPl7atozbyogii WeRA0AwM8LHcTggseqOtA3B8606giVlk5Iw2YJ7DkK1ArSAncThbb2rtm/PPyPWv JdG2bHXj3PnzSQfMDw== =RpJX -----END PGP SIGNATURE----- Merge tag 'for-linus-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBI and UBIFS updates from Richard Weinberger: - Fix for memory leaks around UBIFS orphan handling - Fix for memory leaks around UBI fastmap - Remove zero-length array from ubi-media.h - Fix for TNC lookup in UBIFS orphan code * tag 'for-linus-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubi: ubi-media.h: Replace zero-length array with flexible-array member ubifs: Fix out-of-bounds memory access caused by abnormal value of node_len ubi: fastmap: Only produce the initial anchor PEB when fastmap is used ubi: fastmap: Free unused fastmap anchor peb during detach ubifs: ubifs_add_orphan: Fix a memory leak bug ubifs: ubifs_jnl_write_inode: Fix a memory leak bug ubifs: Fix ubifs_tnc_lookup() usage in do_kill_orphans() |
|
Linus Torvalds | 762a9f2f01 |
This pull request contains the following changes for UML:
- New mode for time travel, external via virtio - Fixes for ubd to make sure no requests can get lost - Fixes for vector networking - Allow CONFIG_STATIC_LINK only when possible - Minor cleanups and fixes -----BEGIN PGP SIGNATURE----- iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl6MbGYWHHJpY2hhcmRA c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wSY2D/4k1kb3A5pZ6OEXCkKmRU63j0RC na0bsa4lztMuABgOWKXP09cqL2ZhJ1rVVRUMV7jgVFKj7rKkJHHGHgdBeEkXOcb8 skOVxln1X/i3T9q9QQ4ofkSk0U8gHCZA3pqrn7TFI9ZmrosOUYwhQKkqcNHvSfPc XEjKUx1GCS+wA0mw5yLyDZqDGkZgMNSmNezR7Oq3EB9wi8K2n6Racn6//S/uqiS6 I8HHE7R2ci0YfflP+xE8i1qg8/TY2wj2oCP33b9o/XefyyNSndVj7KQUI3KRBmSh M0k2sbOqegVzSH/l5YFIZ7zbDcqkYeGWopPIuYWo3en7ZmfJfP2KD31c8gPOuElC HuUvQyS1VDpLn6JBa8Y456e8IrKl/QquXfZDc2qG5HYTR6g9nv9y8VNtx4dSQ+sB AfgErKofx7x2JQNRfg+0BYKgw/MawGAjiSZm5qVNfvFM3YDWZSUZ9gEAcX6qto/z P+66Zrhatdt9TaQdy9vbQKDWSJk9ood2mQYU0JJSfzgsotWslyvCsc6ANtwfkc7R sLxnsa6EA7CYogbMJ7wRxD5spCNZrRZvepHhe5uft/nWG/qGM1jy7Vk16Or03sVH sScIp6m+yDyhhEjJOT8Mq6WbM3mIfILMb42FyDJQIpJ9JcXSxzbiZu7RSK38yoEG +WYGOYdTGgzxIWsRmQ== =WVcL -----END PGP SIGNATURE----- Merge tag 'for-linus-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - New mode for time travel, external via virtio - Fixes for ubd to make sure no requests can get lost - Fixes for vector networking - Allow CONFIG_STATIC_LINK only when possible - Minor cleanups and fixes * tag 'for-linus-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: Remove some unnecessary NULL checks in vector_user.c um: vector: Avoid NULL ptr deference if transport is unset um: Make CONFIG_STATIC_LINK actually static um: Implement cpu_relax() as ndelay(1) for time-travel um: Implement ndelay/udelay in time-travel mode um: Implement time-travel=ext um: virtio: Implement VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS um: time-travel: Rewrite as an event scheduler um: Move timer-internal.h to non-shared hostfs: Use kasprintf() instead of fixed buffer formatting um: falloc.h needs to be directly included for older libc um: ubd: Retry buffer read on any kind of error um: ubd: Prevent buffer overrun on command completion um: Fix overlapping ELF segments when statically linked um: Delete never executed timer um: Don't overwrite ethtool driver version um: Fix len of file in create_pid_file um: Don't use console_drivers directly um: Cleanup CONFIG_IOSCHED_CFQ |
|
Steve French | 2bcb4fd6ba |
smb3: smbdirect support can be configured by default
smbdirect support (SMB3 over RDMA) should be enabled by default in many configurations. It is not experimental and is stable enough and has enough performance benefits to recommend that it be configured by default. Change the "If unsure N" to "If unsure Y" in the description of the configuration parameter. Acked-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> |
|
Colin Ian King | 5404e7e0ac |
reiserfs: clean up several indentation issues
There are several places where code is indented incorrectly. Fix these. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200325135018.113431-1-colin.king@canonical.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Alexey Dobriyan | aa0d1564b1 |
fs/binfmt_elf.c: don't free interpreter's ELF pheaders on common path
Static executables don't need to free NULL pointer. It doesn't matter really because static executable is not common scenario but do it anyway out of pedantry. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200219185330.GA4933@avx2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Alexey Dobriyan | 0693ffebcf |
fs/binfmt_elf.c: allocate less for static executable
PT_INTERP ELF header can be spared if executable is static. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200219185012.GB4871@avx2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Alexey Dobriyan | c69bcc932e |
fs/binfmt_elf.c: delete "loc" variable
"loc" variable became just a wrapper for PT_INTERP ELF header after main ELF header was moved to "bprm->buf". Delete it. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200219184847.GA4871@avx2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Jason Baron | efcdd350d1 |
fs/epoll: make nesting accounting safe for -rt kernel
Davidlohr Bueso pointed out that when CONFIG_DEBUG_LOCK_ALLOC is set ep_poll_safewake() can take several non-raw spinlocks after disabling interrupts. Since a spinlock can block in the -rt kernel, we can't take a spinlock after disabling interrupts. So let's re-work how we determine the nesting level such that it plays nicely with the -rt kernel. Let's introduce a 'nests' field in struct eventpoll that records the current nesting level during ep_poll_callback(). Then, if we nest again we can find the previous struct eventpoll that we were called from and increase our count by 1. The 'nests' field is protected by ep->poll_wait.lock. I've also moved the visited field to reduce the size of struct eventpoll from 184 bytes to 176 bytes on x86_64 for !CONFIG_DEBUG_LOCK_ALLOC, which is typical for a production config. Reported-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Cc: Roman Penyaev <rpenyaev@suse.de> Cc: Eric Wong <normalperson@yhbt.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/1582739816-13167-1-git-send-email-jbaron@akamai.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Matthew Wilcox (Oracle) | fad955009c |
proc: inline m_next_vma into m_next
It's clearer to just put this inline. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200317193201.9924-5-adobriyan@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Matthew Wilcox (Oracle) | b829a0f0f2 |
seq_file: remove m->version
The process maps file was the only user of version (introduced back in 2005). Now that it uses ppos instead, we can remove it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200317193201.9924-4-adobriyan@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Matthew Wilcox (Oracle) | 4781f2c3ab |
proc: use ppos instead of m->version
The ppos is a private cursor, just like m->version. Use the canonical cursor, not a special one. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200317193201.9924-3-adobriyan@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Matthew Wilcox (Oracle) | c2e88d22e8 |
proc: remove m_cache_vma
Instead of setting m->version in the show method, set it in m_next(), where it should be. Also remove the fallback code for failing to find a vma, or version being zero. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200317193201.9924-2-adobriyan@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Matthew Wilcox (Oracle) | d07ded611e |
proc: inline vma_stop into m_stop
Instead of calling vma_stop() from m_start() and m_next(), do its work in m_stop(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200317193201.9924-1-adobriyan@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Alexey Dobriyan | 5c5ab9714c |
proc: speed up /proc/*/statm
top(1) reads all /proc/*/statm files but kernel threads will always have zeros. Print those zeroes directly without going through seq_put_decimal_ull(). Speed up reading /proc/2/statm (which is kthreadd) is like 3%. My system has more kernel threads than normal processes after booting KDE. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200307154435.GA2788@avx2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Alexey Dobriyan | d919b33daf |
proc: faster open/read/close with "permanent" files
Now that "struct proc_ops" exist we can start putting there stuff which could not fly with VFS "struct file_operations"... Most of fs/proc/inode.c file is dedicated to make open/read/.../close reliable in the event of disappearing /proc entries which usually happens if module is getting removed. Files like /proc/cpuinfo which never disappear simply do not need such protection. Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such "permanent" files. Enable "permanent" flag for /proc/cpuinfo /proc/kmsg /proc/modules /proc/slabinfo /proc/stat /proc/sysvipc/* /proc/swaps More will come once I figure out foolproof way to prevent out module authors from marking their stuff "permanent" for performance reasons when it is not. This should help with scalability: benchmark is "read /proc/cpuinfo R times by N threads scattered over the system". N R t, s (before) t, s (after) ----------------------------------------------------- 64 4096 1.582458 1.530502 -3.2% 256 4096 6.371926 6.125168 -3.9% 1024 4096 25.64888 24.47528 -4.6% Benchmark source: #include <chrono> #include <iostream> #include <thread> #include <vector> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> const int NR_CPUS = sysconf(_SC_NPROCESSORS_ONLN); int N; const char *filename; int R; int xxx = 0; int glue(int n) { cpu_set_t m; CPU_ZERO(&m); CPU_SET(n, &m); return sched_setaffinity(0, sizeof(cpu_set_t), &m); } void f(int n) { glue(n % NR_CPUS); while (*(volatile int *)&xxx == 0) { } for (int i = 0; i < R; i++) { int fd = open(filename, O_RDONLY); char buf[4096]; ssize_t rv = read(fd, buf, sizeof(buf)); asm volatile ("" :: "g" (rv)); close(fd); } } int main(int argc, char *argv[]) { if (argc < 4) { std::cerr << "usage: " << argv[0] << ' ' << "N /proc/filename R "; return 1; } N = atoi(argv[1]); filename = argv[2]; R = atoi(argv[3]); for (int i = 0; i < NR_CPUS; i++) { if (glue(i) == 0) break; } std::vector<std::thread> T; T.reserve(N); for (int i = 0; i < N; i++) { T.emplace_back(f, i); } auto t0 = std::chrono::system_clock::now(); { *(volatile int *)&xxx = 1; for (auto& t: T) { t.join(); } } auto t1 = std::chrono::system_clock::now(); std::chrono::duration<double> dt = t1 - t0; std::cout << dt.count() << ' '; return 0; } P.S.: Explicit randomization marker is added because adding non-function pointer will silently disable structure layout randomization. [akpm@linux-foundation.org: coding style fixes] Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Joe Perches <joe@perches.com> Link: http://lkml.kernel.org/r/20200222201539.GA22576@avx2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Jules Irenge | 904f394e2e |
fs/proc/inode.c: annotate close_pdeo() for sparse
Fix sparse locking imbalance warning: warning: context imbalance in close_pdeo() - unexpected unlock Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200227201538.GA30462@avx2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |