linux

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	c45e8bccec	- Document DM integrity allow_discard feature that was added during 5.7 merge window. - Fix potential for DM writecache data corruption during DM table reloads. - Fix DM verity's FEC support's hash block number calculation in verity_fec_decode(). - Fix bio-based DM multipath crash due to use of stale copy of MPATHF_QUEUE_IO flag state in __map_bio(). -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl6rSn4THHNuaXR6ZXJA cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWjmFB/9kwi4L5SQTg81ak0jmiO4lNSxqvc8s ZJUyT8m/Iheh6FWo6kZYyfN+YVPPwXRFDWk4TGSrKyoqElTbpxkkmVxE7nhp9Z+O 1f7cfD8vCsTexHxYjxx3DxW529YVjvccifhaJFQgCA3II8+0to9PFzc7v6JpFBGS 1CWd169OUDHe2XGNmah0lbgwEgb7ZQRn9MNrkdYu3L9HihNLs8h4uin390FlfhSj +/rS89jkA9X5MhFnspKrX0wGl3qoQBzFFvvn4KKPcH8EPt5zsUPVo+/rTOBL8Tr8 VXDYKco4CDfuqzB5LdHeTi3mKwGq+fpPmwzRTk+/gosDZHA49m7mLKGB =cr8K -----END PGP SIGNATURE----- Merge tag 'for-5.7/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Document DM integrity allow_discard feature that was added during 5.7 merge window. - Fix potential for DM writecache data corruption during DM table reloads. - Fix DM verity's FEC support's hash block number calculation in verity_fec_decode(). - Fix bio-based DM multipath crash due to use of stale copy of MPATHF_QUEUE_IO flag state in __map_bio(). * tag 'for-5.7/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm multipath: use updated MPATHF_QUEUE_IO on mapping for bio-based mpath dm verity fec: fix hash block number in verity_fec_decode dm writecache: fix data corruption when reloading the target dm integrity: document allow_discard option	2020-04-30 16:45:08 -07:00
Gabriel Krisman Bertazi	5686dee34d	dm multipath: use updated MPATHF_QUEUE_IO on mapping for bio-based mpath When adding devices that don't have a scsi_dh on a BIO based multipath, I was able to consistently hit the warning below and lock-up the system. The problem is that __map_bio reads the flag before it potentially being modified by choose_pgpath, and ends up using the older value. The WARN_ON below is not trivially linked to the issue. It goes like this: The activate_path delayed_work is not initialized for non-scsi_dh devices, but we always set MPATHF_QUEUE_IO, asking for initialization. That is fine, since MPATHF_QUEUE_IO would be cleared in choose_pgpath. Nevertheless, only for BIO-based mpath, we cache the flag before calling choose_pgpath, and use the older version when deciding if we should initialize the path. Therefore, we end up trying to initialize the paths, and calling the non-initialized activate_path work. [ 82.437100] ------------[ cut here ]------------ [ 82.437659] WARNING: CPU: 3 PID: 602 at kernel/workqueue.c:1624 __queue_delayed_work+0x71/0x90 [ 82.438436] Modules linked in: [ 82.438911] CPU: 3 PID: 602 Comm: systemd-udevd Not tainted 5.6.0-rc6+ #339 [ 82.439680] RIP: 0010:__queue_delayed_work+0x71/0x90 [ 82.440287] Code: c1 48 89 4a 50 81 ff 00 02 00 00 75 2a 4c 89 cf e9 94 d6 07 00 e9 7f e9 ff ff 0f 0b eb c7 0f 0b 48 81 7a 58 40 74 a8 94 74 a7 <0f> 0b 48 83 7a 48 00 74 a5 0f 0b eb a1 89 fe 4c 89 cf e9 c8 c4 07 [ 82.441719] RSP: 0018:ffffb738803977c0 EFLAGS: 00010007 [ 82.442121] RAX: ffffa086389f9740 RBX: 0000000000000002 RCX: 0000000000000000 [ 82.442718] RDX: ffffa086350dd930 RSI: ffffa0863d76f600 RDI: 0000000000000200 [ 82.443484] RBP: 0000000000000200 R08: 0000000000000000 R09: ffffa086350dd970 [ 82.444128] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa086350dd930 [ 82.444773] R13: ffffa0863d76f600 R14: 0000000000000000 R15: ffffa08636738008 [ 82.445427] FS: 00007f6abfe9dd40(0000) GS:ffffa0863dd80000(0000) knlGS:00000 [ 82.446040] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 82.446478] CR2: 0000557d288db4e8 CR3: 0000000078b36000 CR4: 00000000000006e0 [ 82.447104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 82.447561] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 82.448012] Call Trace: [ 82.448164] queue_delayed_work_on+0x6d/0x80 [ 82.448472] __pg_init_all_paths+0x7b/0xf0 [ 82.448714] pg_init_all_paths+0x26/0x40 [ 82.448980] __multipath_map_bio.isra.0+0x84/0x210 [ 82.449267] __map_bio+0x3c/0x1f0 [ 82.449468] __split_and_process_non_flush+0x14a/0x1b0 [ 82.449775] __split_and_process_bio+0xde/0x340 [ 82.450045] ? dm_get_live_table+0x5/0xb0 [ 82.450278] dm_process_bio+0x98/0x290 [ 82.450518] dm_make_request+0x54/0x120 [ 82.450778] generic_make_request+0xd2/0x3e0 [ 82.451038] ? submit_bio+0x3c/0x150 [ 82.451278] submit_bio+0x3c/0x150 [ 82.451492] mpage_readpages+0x129/0x160 [ 82.451756] ? bdev_evict_inode+0x1d0/0x1d0 [ 82.452033] read_pages+0x72/0x170 [ 82.452260] __do_page_cache_readahead+0x1ba/0x1d0 [ 82.452624] force_page_cache_readahead+0x96/0x110 [ 82.452903] generic_file_read_iter+0x84f/0xae0 [ 82.453192] ? __seccomp_filter+0x7c/0x670 [ 82.453547] new_sync_read+0x10e/0x190 [ 82.453883] vfs_read+0x9d/0x150 [ 82.454172] ksys_read+0x65/0xe0 [ 82.454466] do_syscall_64+0x4e/0x210 [ 82.454828] entry_SYSCALL_64_after_hwframe+0x49/0xbe [...] [ 82.462501] ---[ end trace bb39975e9cf45daa ]--- Cc: stable@vger.kernel.org Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-04-28 19:51:46 -04:00
Sunwook Eom	ad4e80a639	dm verity fec: fix hash block number in verity_fec_decode The error correction data is computed as if data and hash blocks were concatenated. But hash block number starts from v->hash_start. So, we have to calculate hash block number based on that. Fixes: `a739ff3f54` ("dm verity: add support for forward error correction") Cc: stable@vger.kernel.org Signed-off-by: Sunwook Eom <speed.eom@samsung.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-04-16 16:16:38 -04:00
Mikulas Patocka	31b2212019	dm writecache: fix data corruption when reloading the target The dm-writecache reads metadata in the target constructor. However, when we reload the target, there could be another active instance running on the same device. This is the sequence of operations when doing a reload: 1. construct new target 2. suspend old target 3. resume new target 4. destroy old target Metadata that were written by the old target between steps 1 and 2 would not be visible by the new target. Fix the data corruption by loading the metadata in the resume handler. Also, validate block_size is at least as large as both the devices' logical block size and only read 1 block from the metadata during target constructor -- no need to read entirety of metadata now that it is done during resume. Fixes: `48debafe4f` ("dm: add writecache target") Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-04-16 16:04:13 -04:00
Linus Torvalds	9b06860d7c	libnvdimm for 5.7 - Add support for region alignment configuration and enforcement to fix compatibility across architectures and PowerPC page size configurations. - Introduce 'zero_page_range' as a dax operation. This facilitates filesystem-dax operation without a block-device. - Introduce phys_to_target_node() to facilitate drivers that want to know resulting numa node if a given reserved address range was onlined. - Advertise a persistence-domain for of_pmem and papr_scm. The persistence domain indicates where cpu-store cycles need to reach in the platform-memory subsystem before the platform will consider them power-fail protected. - Promote numa_map_to_online_node() to a cross-kernel generic facility. - Save x86 numa information to allow for node-id lookups for reserved memory ranges, deploy that capability for the e820-pmem driver. - Pick up some miscellaneous minor fixes, that missed v5.6-final, including a some smatch reports in the ioctl path and some unit test compilation fixups. - Fixup some flexible-array declarations. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEf41QbsdZzFdA8EfZHtKRamZ9iAIFAl6LtIAACgkQHtKRamZ9 iAIwRA/8CLVVuQpgHQ1tqK4h8CZPrISFXh7wy7uhocEU2xrDh6iGVnLztmoLRr2k 5f8T9lRzreSAwIVL5DbGqP1pFncqIt9VMnKsFlaPMBGCBNR+hURY0iBCNjIT+jiq BOzLd52MR2rqJxeXGTMUbWrBrbmuj4mZPdmGVuFFe7GFRpoaVpCgOo+296eWa/ot gIOFUTonZY7STYjNvDok0TXCmiCFuJb+P+y5ldfCPShHvZhTiaF53jircja8vAjO G5dt8ixBKUK0rXRc4SEQsQhAZNcAFHb6Gy5lg4C2QzhTF374xTc9usJZNWbIE9iM 5mipBYvjVuoY+XaCNZDkaRcJIy/jqB15O6l3QIWbZLGaK9m95YPp9LmkPFwd3JpO e3rO24ML471DxqB9iWIiJCNcBBocLOlnd6qAQTpppWDpGNbudwXvfsmKHmKIScSE x+IDCdscLmmm+WG2dLmLraWOVPu42xZFccoQCi4M3TTqfeB9pZ9XckFQ37zX62zG 5t+7Ek+t1W4QVt/JQYVKH03XT15sqUpVknvx0Hl4Y5TtbDOkFLkO8RN0/HyExDef 7iegS35kqTsM4EfZQ+9juKbI2JBAjHANcbj0V4dogqaRj6vr3akumBzUtuYqAofv qU3s9skmLsEemOJC+ns2PT8vl5dyIoeDfH0r2XvGWxYqolMqJpA= =sY4N -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and dax updates from Dan Williams: "There were multiple touches outside of drivers/nvdimm/ this round to add cross arch compatibility to the devm_memremap_pages() interface, enhance numa information for persistent memory ranges, and add a zero_page_range() dax operation. This cycle I switched from the patchwork api to Konstantin's b4 script for collecting tags (from x86, PowerPC, filesystem, and device-mapper folks), and everything looks to have gone ok there. This has all appeared in -next with no reported issues. Summary: - Add support for region alignment configuration and enforcement to fix compatibility across architectures and PowerPC page size configurations. - Introduce 'zero_page_range' as a dax operation. This facilitates filesystem-dax operation without a block-device. - Introduce phys_to_target_node() to facilitate drivers that want to know resulting numa node if a given reserved address range was onlined. - Advertise a persistence-domain for of_pmem and papr_scm. The persistence domain indicates where cpu-store cycles need to reach in the platform-memory subsystem before the platform will consider them power-fail protected. - Promote numa_map_to_online_node() to a cross-kernel generic facility. - Save x86 numa information to allow for node-id lookups for reserved memory ranges, deploy that capability for the e820-pmem driver. - Pick up some miscellaneous minor fixes, that missed v5.6-final, including a some smatch reports in the ioctl path and some unit test compilation fixups. - Fixup some flexible-array declarations" * tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (29 commits) dax: Move mandatory ->zero_page_range() check in alloc_dax() dax,iomap: Add helper dax_iomap_zero() to zero a range dax: Use new dax zero page method for zeroing a page dm,dax: Add dax zero_page_range operation s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver dax, pmem: Add a dax operation zero_page_range pmem: Add functions for reading/writing page to/from pmem libnvdimm: Update persistence domain value for of_pmem and papr_scm device tools/test/nvdimm: Fix out of tree build libnvdimm/region: Fix build error libnvdimm/region: Replace zero-length array with flexible-array member libnvdimm/label: Replace zero-length array with flexible-array member ACPI: NFIT: Replace zero-length array with flexible-array member libnvdimm/region: Introduce an 'align' attribute libnvdimm/region: Introduce NDD_LABELING libnvdimm/namespace: Enforce memremap_compat_align() libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid libnvdimm: Out of bounds read in __nd_ioctl() acpi/nfit: improve bounds checking for 'func' mm/memremap_pages: Introduce memremap_compat_align() ...	2020-04-08 21:03:40 -07:00
Linus Torvalds	de3c913c6e	- Fix excessive bio splitting that caused performance regressions. - Fix logic bug in DM integrity discard support's integrity tag testing. - Fix DM integrity warning on ppc64le due to missing cast. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl6HbiETHHNuaXR6ZXJA cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWiRFCACoiewv5jBvqzYvINy0FpgwLyU0SOOY 11UCIzeLgQUPMZ4a5CUpPQuqSxKR3g7RAoD1EJZ1cyOnAJuk6A+VPkOLDioa4BEC uS7ifenihclFjcpWjQaKTNEhuTURYSjIwk1wqSwb7Fv+L2Uo+4XB8DvazBZdMq8+ 0EAO2CNl3r9Tkut0MRGr7DKrZa4QauaPsl7BvtzlLZbC3Nj2VncpwePYv9z7c7Ra g0zAq8IJlOPgDBUt4szbEAKiwGWQEiG6yxcyj5J3keU6mrHAg0mr08rxaapFkfSK F++3gTYjjLi8rQmczufvJ440t2kjK+b4dOfxMyjZccgjcoPxrxxkgaWq =Cj9X -----END PGP SIGNATURE----- Merge tag 'for-5.7/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix excessive bio splitting that caused performance regressions - Fix logic bug in DM integrity discard support's integrity tag testing - Fix DM integrity warning on ppc64le due to missing cast * tag 'for-5.7/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm integrity: fix logic bug in integrity tag testing Revert "dm: always call blk_queue_split() in dm_process_bio()" dm integrity: fix ppc64le warning	2020-04-03 14:44:48 -07:00
Mikulas Patocka	8267d8fb48	dm integrity: fix logic bug in integrity tag testing If all the bytes are equal to DISCARD_FILLER, we want to accept the buffer. If any of the bytes are different, we must do thorough tag-by-tag checking. The condition was inverted. Fixes: `84597a44a9` ("dm integrity: add optional discard support") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-04-03 13:07:41 -04:00
Mike Snitzer	120c9257f5	Revert "dm: always call blk_queue_split() in dm_process_bio()" This reverts commit `effd58c95f`. blk_queue_split() is causing excessive IO splitting -- because blk_max_size_offset() depends on 'chunk_sectors' limit being set and if it isn't (as is the case for DM targets!) it falls back to splitting on a 'max_sectors' boundary regardless of offset. "Fix" this by reverting back to _not_ using blk_queue_split() in dm_process_bio() for normal IO (reads and writes). Long-term fix is still TBD but it should focus on training blk_max_size_offset() to call into a DM provided hook (to call DM's max_io_len()). Test results from simple misaligned IO test on 4-way dm-striped device with chunksize of 128K and stripesize of 512K: xfs_io -d -c 'pread -b 2m 224s 4072s' /dev/mapper/stripe_dev before this revert: 253,0 21 1 0.000000000 2206 Q R 224 + 4072 [xfs_io] 253,0 21 2 0.000008267 2206 X R 224 / 480 [xfs_io] 253,0 21 3 0.000010530 2206 X R 224 / 256 [xfs_io] 253,0 21 4 0.000027022 2206 X R 480 / 736 [xfs_io] 253,0 21 5 0.000028751 2206 X R 480 / 512 [xfs_io] 253,0 21 6 0.000033323 2206 X R 736 / 992 [xfs_io] 253,0 21 7 0.000035130 2206 X R 736 / 768 [xfs_io] 253,0 21 8 0.000039146 2206 X R 992 / 1248 [xfs_io] 253,0 21 9 0.000040734 2206 X R 992 / 1024 [xfs_io] 253,0 21 10 0.000044694 2206 X R 1248 / 1504 [xfs_io] 253,0 21 11 0.000046422 2206 X R 1248 / 1280 [xfs_io] 253,0 21 12 0.000050376 2206 X R 1504 / 1760 [xfs_io] 253,0 21 13 0.000051974 2206 X R 1504 / 1536 [xfs_io] 253,0 21 14 0.000055881 2206 X R 1760 / 2016 [xfs_io] 253,0 21 15 0.000057462 2206 X R 1760 / 1792 [xfs_io] 253,0 21 16 0.000060999 2206 X R 2016 / 2272 [xfs_io] 253,0 21 17 0.000062489 2206 X R 2016 / 2048 [xfs_io] 253,0 21 18 0.000066133 2206 X R 2272 / 2528 [xfs_io] 253,0 21 19 0.000067507 2206 X R 2272 / 2304 [xfs_io] 253,0 21 20 0.000071136 2206 X R 2528 / 2784 [xfs_io] 253,0 21 21 0.000072764 2206 X R 2528 / 2560 [xfs_io] 253,0 21 22 0.000076185 2206 X R 2784 / 3040 [xfs_io] 253,0 21 23 0.000077486 2206 X R 2784 / 2816 [xfs_io] 253,0 21 24 0.000080885 2206 X R 3040 / 3296 [xfs_io] 253,0 21 25 0.000082316 2206 X R 3040 / 3072 [xfs_io] 253,0 21 26 0.000085788 2206 X R 3296 / 3552 [xfs_io] 253,0 21 27 0.000087096 2206 X R 3296 / 3328 [xfs_io] 253,0 21 28 0.000093469 2206 X R 3552 / 3808 [xfs_io] 253,0 21 29 0.000095186 2206 X R 3552 / 3584 [xfs_io] 253,0 21 30 0.000099228 2206 X R 3808 / 4064 [xfs_io] 253,0 21 31 0.000101062 2206 X R 3808 / 3840 [xfs_io] 253,0 21 32 0.000104956 2206 X R 4064 / 4096 [xfs_io] 253,0 21 33 0.001138823 0 C R 4096 + 200 [0] after this revert: 253,0 18 1 0.000000000 4430 Q R 224 + 3896 [xfs_io] 253,0 18 2 0.000018359 4430 X R 224 / 256 [xfs_io] 253,0 18 3 0.000028898 4430 X R 256 / 512 [xfs_io] 253,0 18 4 0.000033535 4430 X R 512 / 768 [xfs_io] 253,0 18 5 0.000065684 4430 X R 768 / 1024 [xfs_io] 253,0 18 6 0.000091695 4430 X R 1024 / 1280 [xfs_io] 253,0 18 7 0.000098494 4430 X R 1280 / 1536 [xfs_io] 253,0 18 8 0.000114069 4430 X R 1536 / 1792 [xfs_io] 253,0 18 9 0.000129483 4430 X R 1792 / 2048 [xfs_io] 253,0 18 10 0.000136759 4430 X R 2048 / 2304 [xfs_io] 253,0 18 11 0.000152412 4430 X R 2304 / 2560 [xfs_io] 253,0 18 12 0.000160758 4430 X R 2560 / 2816 [xfs_io] 253,0 18 13 0.000183385 4430 X R 2816 / 3072 [xfs_io] 253,0 18 14 0.000190797 4430 X R 3072 / 3328 [xfs_io] 253,0 18 15 0.000197667 4430 X R 3328 / 3584 [xfs_io] 253,0 18 16 0.000218751 4430 X R 3584 / 3840 [xfs_io] 253,0 18 17 0.000226005 4430 X R 3840 / 4096 [xfs_io] 253,0 18 18 0.000250404 4430 Q R 4120 + 176 [xfs_io] 253,0 18 19 0.000847708 0 C R 4096 + 24 [0] 253,0 18 20 0.000855783 0 C R 4120 + 176 [0] Fixes: `effd58c95f` ("dm: always call blk_queue_split() in dm_process_bio()") Cc: stable@vger.kernel.org Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Tested-by: Barry Marson <bmarson@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-04-03 11:32:19 -04:00
Mike Snitzer	e7fc1e57d9	dm integrity: fix ppc64le warning Otherwise: In file included from drivers/md/dm-integrity.c:13: drivers/md/dm-integrity.c: In function 'dm_integrity_status': drivers/md/dm-integrity.c:3061:10: error: format '%llu' expects argument of type 'long long unsigned int', but argument 4 has type 'long int' [-Werror=format=] DMEMIT("%llu %llu", ^~~~~~~~~~~ atomic64_read(&ic->number_of_mismatches), ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ./include/linux/device-mapper.h:550:46: note: in definition of macro 'DMEMIT' 0 : scnprintf(result + sz, maxlen - sz, x)) ^ cc1: all warnings being treated as errors Fixes: `7649194a16` ("dm integrity: remove sector type casts") Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-04-03 10:44:24 -04:00
Vivek Goyal	4e4ced9379	dax: Move mandatory ->zero_page_range() check in alloc_dax() zero_page_range() dax operation is mandatory for dax devices. Right now that check happens in dax_zero_page_range() function. Dan thinks that's too late and its better to do the check earlier in alloc_dax(). I also modified alloc_dax() to return pointer with error code in it in case of failure. Right now it returns NULL and caller assumes failure happened due to -ENOMEM. But with this ->zero_page_range() check, I need to return -EINVAL instead. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/20200401161125.GB9398@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2020-04-02 19:15:03 -07:00
Vivek Goyal	cdf6cdcd3b	dm,dax: Add dax zero_page_range operation This patch adds support for dax zero_page_range operation to dm targets. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20200228163456.1587-5-vgoyal@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2020-04-02 19:15:03 -07:00
Linus Torvalds	ffc1c20c46	- Add DM writecache "cleaner" policy feature that allows cache to be flushed while userspace monitors for completion to then discommision use of caching. - Optimize DM writecache superblock writing and also yield CPU while initializing writecache on large PMEM devices to avoid CPU stalls. - Various fixes to DM integrity target while preparing for the ability to resize a DM integrity device. In addition to resize support, add optional discard support with the "allow_discards" feature. - Fix DM clone target's discard handling and overflow bugs which could cause data corruption. - Fix memory leak in destructor for DM verity FEC support. - Fix DM zoned target's redundant increment of nr_rnd_zones. - Small cleanup in DM crypt to use crypt_integrity_aead() helper. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl6ExcsTHHNuaXR6ZXJA cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWj6OB/4n/EmXRI3x9uFTyaFFEjaALTUx7gye hIlOLtRTFmU6yit/uqAARLDBMDElhL/ze8RKs/TSmi/FH37u8d6DscG5dPatCsF1 dZ7z77uxhc0RQ+WkyMBtYqxO1OGzULt8434Pos0x1aoPrK+wUEpJOcZAJompAVfj nD3AbJ92zcv7DEdJGiCbViIrgrkAXkUWByXmn/l0AIJEjyxeCLhJdx76I+9PesOJ JKbbjLu1w19Yyo807CRBQLhC9fXhDUJO19jIlKRfGZ9Xa6V2xVuB45VVYiM/jGqI L9z6SsXBquX+8x9HgARuw82EkBp5DeazjrHQ2jHMHjMBhz+NFSK0MMnD =27Eq -----END PGP SIGNATURE----- Merge tag 'for-5.7/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Add DM writecache "cleaner" policy feature that allows cache to be flushed while userspace monitors for completion to then discommision use of caching. - Optimize DM writecache superblock writing and also yield CPU while initializing writecache on large PMEM devices to avoid CPU stalls. - Various fixes to DM integrity target while preparing for the ability to resize a DM integrity device. In addition to resize support, add optional discard support with the "allow_discards" feature. - Fix DM clone target's discard handling and overflow bugs which could cause data corruption. - Fix memory leak in destructor for DM verity FEC support. - Fix DM zoned target's redundant increment of nr_rnd_zones. - Small cleanup in DM crypt to use crypt_integrity_aead() helper. * tag 'for-5.7/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm clone metadata: Fix return type of dm_clone_nr_of_hydrated_regions() dm clone: Add missing casts to prevent overflows and data corruption dm clone: Add overflow check for number of regions dm clone: Fix handling of partial region discards dm writecache: add cond_resched to avoid CPU hangs dm integrity: improve discard in journal mode dm integrity: add optional discard support dm integrity: allow resize of the integrity device dm integrity: factor out get_provided_data_sectors() dm integrity: don't replay journal data past the end of the device dm integrity: remove sector type casts dm integrity: fix a crash with unusually large tag size dm zoned: remove duplicate nr_rnd_zones increase in dmz_init_zone() dm verity fec: fix memory leak in verity_fec_dtr dm writecache: optimize superblock write dm writecache: implement gradual cleanup dm writecache: implement the "cleaner" policy dm writecache: do direct write if the cache is full dm integrity: print device name in integrity_metadata() error message dm crypt: use crypt_integrity_aead() helper	2020-04-01 15:33:12 -07:00
Linus Torvalds	1592614838	for-5.7/drivers-2020-03-29 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl6BJDYQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgplhMD/95jd4nlVetHAo54z+Zk2ExE13+yDamRKyh vc7t2tz1reqFOimtVr5aVuTXCTgOx4CpiIox5qcn6qAExN4JtCChOBRGize/0u8S ckxnhHbN2C0rfnGldvrYYeNRonFI+7QKimnurWUSYYGN0xqbo21BxJ7dFaohMseo q4K8sIW0ctE6AOlw28Jerkg614s2NDGZ7q1laheXnYHn5c9f1m0NaKN/jyTGgr0X TLBiLbX2yRrAuvpctBj6Fna6YN7Vdd9jsf2Bt6ipUI1XgHQoVUGMxQNhWPyjsbSv GzRQUNAfVcasLzCP/Mj/47144OkUtDDpn2mjeXDaFljLDGFULD+jp/SsOmLCxkPC gI7G2yfBvF96/SOyT0JXrLyMcBd1R2vRoASbc5tPu82mZhx7YJZH5WYtOB9h2gra RTYo3xcm0EoN6yeMaH+xOuXxTWWInIrgKPONW4H8s7hxEiMt5oFNVBI7vqPr4LVp tpfxiKZDavKOofKXogNV4W7mSMP/Ir5Q9Ha4g5SXHBGp0z/PHmnQ0xDGNq0KDnU4 eNO0UYCFNCNa+0AOhpNxaVuVm9LjrgvyXRjePgOZQ4akhohwHO6DLrHK1f8Hb1vD 8Ih6uR+F5zZlKsouWro8HLGYm5w40Wq9tbCI8QbPYH6nkGoDmzpPv9jbAeWgJU5c KqP/5TBSLA== =Bs4E -----END PGP SIGNATURE----- Merge tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: - floppy driver cleanup series from Willy - NVMe updates and fixes (Various) - null_blk trace improvements (Chaitanya) - bcache fixes (Coly) - md fixes (via Song) - loop block size change optimizations (Martijn) - scnprintf() use (Takashi) * tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block: (81 commits) null_blk: add trace in null_blk_zoned.c null_blk: add tracepoint helpers for zoned mode block: add a zone condition debug helper nvme: cleanup namespace identifier reporting in nvme_init_ns_head nvme: rename __nvme_find_ns_head to nvme_find_ns_head nvme: refactor nvme_identify_ns_descs error handling nvme-tcp: Add warning on state change failure at nvme_tcp_setup_ctrl nvme-rdma: Add warning on state change failure at nvme_rdma_setup_ctrl nvme: Fix controller creation races with teardown flow nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrl nvme: Fix ctrl use-after-free during sysfs deletion nvme-pci: Re-order nvme_pci_free_ctrl nvme: Remove unused return code from nvme_delete_ctrl_sync nvme: Use nvme_state_terminal helper nvme: release ida resources nvme: Add compat_ioctl handler for NVME_IOCTL_SUBMIT_IO nvmet-tcp: optimize tcp stack TX when data digest is used nvme-fabrics: Use scnprintf() for avoiding potential buffer overflow nvme-multipath: do not reset on unknown status nvmet-rdma: allocate RW ctxs according to mdts ...	2020-03-30 11:43:51 -07:00
Nikos Tsironis	81d5553d12	dm clone metadata: Fix return type of dm_clone_nr_of_hydrated_regions() dm_clone_nr_of_hydrated_regions() returns the number of regions that have been hydrated so far. In order to do so it employs bitmap_weight(). Until now, the return type of dm_clone_nr_of_hydrated_regions() was unsigned long. Because bitmap_weight() returns an int, in case BITS_PER_LONG == 64 and the return value of bitmap_weight() is 2^31 (the maximum allowed number of regions for a device), the result is sign extended from 32 bits to 64 bits and an incorrect value is displayed, in the status output of dm-clone, as the number of hydrated regions. Fix this by having dm_clone_nr_of_hydrated_regions() return an unsigned int. Fixes: `7431b7835f` ("dm: add clone target") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-27 14:42:51 -04:00
Nikos Tsironis	9fc06ff568	dm clone: Add missing casts to prevent overflows and data corruption Add missing casts when converting from regions to sectors. In case BITS_PER_LONG == 32, the lack of the appropriate casts can lead to overflows and miscalculation of the device sector. As a result, we could end up discarding and/or copying the wrong parts of the device, thus corrupting the device's data. Fixes: `7431b7835f` ("dm: add clone target") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-27 14:42:25 -04:00
Nikos Tsironis	cd481c1226	dm clone: Add overflow check for number of regions Add overflow check for clone->nr_regions variable, which holds the number of regions of the target. The overflow can occur with sufficiently large devices, if BITS_PER_LONG == 32. E.g., if the region size is 8 sectors (4K), the overflow would occur for device sizes > 34359738360 sectors (~16TB). This could result in multiple device sectors wrongly mapping to the same region number, due to the truncation from 64 bits to 32 bits, which would lead to data corruption. Fixes: `7431b7835f` ("dm: add clone target") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-27 14:41:46 -04:00
Nikos Tsironis	4b5142905d	dm clone: Fix handling of partial region discards There is a bug in the way dm-clone handles discards, which can lead to discarding the wrong blocks or trying to discard blocks beyond the end of the device. This could lead to data corruption, if the destination device indeed discards the underlying blocks, i.e., if the discard operation results in the original contents of a block to be lost. The root of the problem is the code that calculates the range of regions covered by a discard request and decides which regions to discard. Since dm-clone handles the device in units of regions, we don't discard parts of a region, only whole regions. The range is calculated as: rs = dm_sector_div_up(bio->bi_iter.bi_sector, clone->region_size); re = bio_end_sector(bio) >> clone->region_shift; , where 'rs' is the first region to discard and (re - rs) is the number of regions to discard. The bug manifests when we try to discard part of a single region, i.e., when we try to discard a block with size < region_size, and the discard request both starts at an offset with respect to the beginning of that region and ends before the end of the region. The root cause is the following comparison: if (rs == re) // skip discard and complete original bio immediately , which doesn't take into account that 'rs' might be greater than 're'. Thus, we then issue a discard request for the wrong blocks, instead of skipping the discard all together. Fix the check to also take into account the above case, so we don't end up discarding the wrong blocks. Also, add some range checks to dm_clone_set_region_hydrated() and dm_clone_cond_set_range(), which update dm-clone's region bitmap. Note that the aforementioned bug doesn't cause invalid memory accesses, because dm_clone_is_range_hydrated() returns True for this case, so the checks are just precautionary. Fixes: `7431b7835f` ("dm: add clone target") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-27 14:41:21 -04:00
Mikulas Patocka	1edaa447d9	dm writecache: add cond_resched to avoid CPU hangs Initializing a dm-writecache device can take a long time when the persistent memory device is large. Add cond_resched() to a few loops to avoid warnings that the CPU is stuck. Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-27 14:36:50 -04:00
Christoph Hellwig	3d745ea5b0	block: simplify queue allocation Current make_request based drivers use either blk_alloc_queue_node or blk_alloc_queue to allocate a queue, and then set up the make_request_fn function pointer and a few parameters using the blk_queue_make_request helper. Simplify this by passing the make_request pointer to blk_alloc_queue, and while at it merge the _node variant into the main helper by always passing a node_id, and remove the superfluous gfp_mask parameter. A lower-level __blk_alloc_queue is kept for the blk-mq case. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-27 10:23:43 -06:00
Christoph Hellwig	ff27668ce8	bcache: pass the make_request methods to blk_queue_make_request bcache is the only driver not actually passing its make_request methods to blk_queue_make_request, but instead just sets them up manually a little later. Make bcache follow the common way of setting up make_request based queues. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-27 10:23:43 -06:00
Christoph Hellwig	c6a564ffad	block: move the part_stat* helpers from genhd.h to a new header These macros are just used by a few files. Move them out of genhd.h, which is included everywhere into a new standalone header. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-25 09:50:09 -06:00
Coly Li	5ae3a2c03d	bcache: remove dupplicated declaration from btree.h Commit `253a99d95d` ("bcache: move macro btree() and btree_root() into btree.h") makes two duplicated declaration into btree.h, typedef int (btree_map_keys_fn)(); int bch_btree_map_keys(); The kbuild test robot <lkp@intel.com> detects and reports this problem and this patch fixes it by removing the duplicated ones. Fixes: `253a99d95d` ("bcache: move macro btree() and btree_root() into btree.h") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-24 19:56:42 -06:00
Mikulas Patocka	31843edab7	dm integrity: improve discard in journal mode When we discard something that is present in the journal, we flush the journal first, so that discarded blocks are not overwritten by the journal content. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-24 13:09:49 -04:00
Mikulas Patocka	84597a44a9	dm integrity: add optional discard support Add an argument "allow_discards" that enables discard processing on dm-integrity device. Discards are only allowed to devices using internal hash. When a block is discarded the integrity tag is filled with DISCARD_FILLER (0xf6) bytes. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-24 13:05:25 -04:00
Mikulas Patocka	1ac2c15a7b	dm integrity: allow resize of the integrity device If the size of the underlying device changes, change the size of the integrity device too. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-24 12:52:53 -04:00
Mikulas Patocka	87fb177b4c	dm integrity: factor out get_provided_data_sectors() Move code to a new function get_provided_data_sectors(). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-24 12:46:35 -04:00
Mikulas Patocka	f6f72f32c2	dm integrity: don't replay journal data past the end of the device Following commits will make it possible to shrink or extend the device. If the device was shrunk, we don't want to replay journal data pointing past the end of the device. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-24 12:43:21 -04:00
Mikulas Patocka	7649194a16	dm integrity: remove sector type casts Since the commit `72deb455b5` ("block: remove CONFIG_LBDAF") sector_t is always defined as unsigned long long. Delete the needless type casts in printk and avoids some warnings if DEBUG_PRINT is defined. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-24 12:40:18 -04:00
Mikulas Patocka	b93b6643e9	dm integrity: fix a crash with unusually large tag size If the user specifies tag size larger than HASH_MAX_DIGESTSIZE, there's a crash in integrity_metadata(). Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-24 12:34:45 -04:00
Bob Liu	b8fdd09037	dm zoned: remove duplicate nr_rnd_zones increase in dmz_init_zone() zmd->nr_rnd_zones was increased twice by mistake. The other place it is increased in dmz_init_zone() is the only one needed: 1131 zmd->nr_useable_zones++; 1132 if (dmz_is_rnd(zone)) { 1133 zmd->nr_rnd_zones++; ^^^ Fixes: `3b1a94c88b` ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-24 12:21:48 -04:00
Shetty, Harshini X (EXT-Sony Mobile)	75fa601934	dm verity fec: fix memory leak in verity_fec_dtr Fix below kmemleak detected in verity_fec_ctr. output_pool is allocated for each dm-verity-fec device. But it is not freed when dm-table for the verity target is removed. Hence free the output mempool in destructor function verity_fec_dtr. unreferenced object 0xffffffffa574d000 (size 4096): comm "init", pid 1667, jiffies 4294894890 (age 307.168s) hex dump (first 32 bytes): 8e 36 00 98 66 a8 0b 9b 00 00 00 00 00 00 00 00 .6..f........... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000060e82407>] __kmalloc+0x2b4/0x340 [<00000000dd99488f>] mempool_kmalloc+0x18/0x20 [<000000002560172b>] mempool_init_node+0x98/0x118 [<000000006c3574d2>] mempool_init+0x14/0x20 [<0000000008cb266e>] verity_fec_ctr+0x388/0x3b0 [<000000000887261b>] verity_ctr+0x87c/0x8d0 [<000000002b1e1c62>] dm_table_add_target+0x174/0x348 [<000000002ad89eda>] table_load+0xe4/0x328 [<000000001f06f5e9>] dm_ctl_ioctl+0x3b4/0x5a0 [<00000000bee5fbb7>] do_vfs_ioctl+0x5dc/0x928 [<00000000b475b8f5>] __arm64_sys_ioctl+0x70/0x98 [<000000005361e2e8>] el0_svc_common+0xa0/0x158 [<000000001374818f>] el0_svc_handler+0x6c/0x88 [<000000003364e9f4>] el0_svc+0x8/0xc [<000000009d84cec9>] 0xffffffffffffffff Fixes: `a739ff3f54` ("dm verity: add support for forward error correction") Depends-on: `6f1c819c21` ("dm: convert to bioset_init()/mempool_init()") Cc: stable@vger.kernel.org Signed-off-by: Harshini Shetty <harshini.x.shetty@sony.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-24 12:00:29 -04:00
Mikulas Patocka	dc8a01ae1d	dm writecache: optimize superblock write If we write a superblock in writecache_flush, we don't need to set bit and scan the bitmap for it - we can just write the superblock directly. Also, we can set the flag REQ_FUA on the write bio, so that we don't need to submit a flush bio afterwards. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-24 11:55:09 -04:00
Mikulas Patocka	3923d4854e	dm writecache: implement gradual cleanup If a block is stored in the cache for too long, it will now be written to the underlying device and cleaned up. Add a new option "max_age" that specifies the maximum age of a block in milliseconds. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-24 11:55:08 -04:00
Mikulas Patocka	93de44eb3f	dm writecache: implement the "cleaner" policy The "flush" or "flush_on_suspend" messages flush the whole cache. However, these flushing methods can take some time and the process is left in an interruptible state during the flush. Implement a "cleaner" option that offers an alternate flushing method. When this option is activated (either by a message or in the constructor arguments), the cache will not promote new writes (however, writes to already cached blocks are promoted, to avoid data corruption due to misordered writes) and it will gradually writeback any cached data. The userspace can then monitor the cleaning process with "dmsetup status". When the number of cached bloks drops to zero, the userspace can unload the dm-writecache target and replace it with dm-linear or other targets. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-24 11:50:30 -04:00
Mikulas Patocka	d53f1fafec	dm writecache: do direct write if the cache is full If the cache device is full, we do a direct write to the origin device. Note that we must not do it if the written block is already in the cache. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-24 11:26:18 -04:00
Erich Eckner	eaab4bde6e	dm integrity: print device name in integrity_metadata() error message Similar to `f710126cfc` ("dm crypt: print device name in integrity error message"), this message should also better identify the device with the integrity failure. Signed-off-by: Erich Eckner <git@eckner.net> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-24 11:25:11 -04:00
Yang Yingliang	3fd53533a8	dm crypt: use crypt_integrity_aead() helper Replace test_bit(CRYPT_MODE_INTEGRITY_AEAD, XXX) with crypt_integrity_aead(). Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-03-24 11:17:33 -04:00
Christoph Hellwig	74cc979c3c	block: cleanup how md_autodetect_dev is called Add a new include/linux/raid/detect.h header to declare the md_autodetect_dev prototype which can be shared between md and the partition code. Then use IS_BUILTIN to call it instead of the ifdef magic. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-24 07:57:08 -06:00
Christoph Hellwig	ea3edd4dc2	block: remove __bdevname There is no good reason for __bdevname to exist. Just open code printing the string in the callers. For three of them the format string can be trivially merged into existing printk statements, and in init/do_mounts.c we can at least do the scnprintf once at the start of the function, and unconditional of CONFIG_BLOCK to make the output for tiny configfs a little more helpful. Acked-by: Theodore Ts'o <tytso@mit.edu> # for ext4 Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-24 07:57:07 -06:00
Coly Li	eb9b6666d6	bcache: optimize barrier usage for atomic operations The idea of this patch is from Davidlohr Bueso, he posts a patch for bcache to optimize barrier usage for read-modify-write atomic bitops. Indeed such optimization can also apply on other locations where smp_mb() is used before or after an atomic operation. This patch replaces smp_mb() with smp_mb__before_atomic() or smp_mb__after_atomic() in btree.c and writeback.c, where it is used to synchronize memory cache just earlier on other cores. Although the locations are not on hot code path, it is always not bad to mkae things a little better. Signed-off-by: Coly Li <colyli@suse.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-22 10:06:57 -06:00
Davidlohr Bueso	b004aa867c	bcache: optimize barrier usage for Rmw atomic bitops We can avoid the unnecessary barrier on non LL/SC architectures, such as x86. Instead, use the smp_mb__after_atomic(). Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-22 10:06:57 -06:00
Takashi Iwai	9876e38609	bcache: Use scnprintf() for avoiding potential buffer overflow Since snprintf() returns the would-be-output size instead of the actual output size, the succeeding calls may go beyond the given buffer limit. Fix it by replacing with scnprintf(). Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-22 10:06:57 -06:00
Coly Li	b144e45fc5	bcache: make bch_sectors_dirty_init() to be multithreaded When attaching a cached device (a.k.a backing device) to a cache device, bch_sectors_dirty_init() is called to count dirty sectors and stripes (see what bcache_dev_sectors_dirty_add() does) on the cache device. The counting is done by a single thread recursive function bch_btree_map_keys() to iterate all the bcache btree nodes. If the btree has huge number of nodes, bch_sectors_dirty_init() will take quite long time. In my testing, if the registering cache set has a existed UUID which matches a already registered cached device, the automatical attachment during the registration may take more than 55 minutes. This is too long for waiting the bcache to work in real deployment. Fortunately when bch_sectors_dirty_init() is called, no other thread will access the btree yet, it is safe to do a read-only parallelized dirty sectors counting by multiple threads. This patch tries to create multiple threads, and each thread tries to one-by-one count dirty sectors from the sub-tree indexed by a root node key which the thread fetched. After the sub-tree is counted, the counting thread will continue to fetch another root node key, until the fetched key is NULL. How many threads in parallel depends on the number of keys from the btree root node, and the number of online CPU core. The thread number will be the less number but no more than BCH_DIRTY_INIT_THRD_MAX. If there are only 2 keys in root node, it can only be 2x times faster by this patch. But if there are 10 keys in the root node, with this patch it can be 10x times faster. Signed-off-by: Coly Li <colyli@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-22 10:06:57 -06:00
Coly Li	8e7102273f	bcache: make bch_btree_check() to be multithreaded When registering a cache device, bch_btree_check() is called to check all btree nodes, to make sure the btree is consistent and not corrupted. bch_btree_check() is recursively executed in a single thread, when there are a lot of data cached and the btree is huge, it may take very long time to check all the btree nodes. In my testing, I observed it took around 50 minutes to finish bch_btree_check(). When checking the bcache btree nodes, the cache set is not running yet, and indeed the whole tree is in read-only state, it is safe to create multiple threads to check the btree in parallel. This patch tries to create multiple threads, and each thread tries to one-by-one check the sub-tree indexed by a key from the btree root node. The parallel thread number depends on how many keys in the btree root node. At most BCH_BTR_CHKTHREAD_MAX (64) threads can be created, but in practice is should be min(cpu-number/2, root-node-keys-number). Signed-off-by: Coly Li <colyli@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-22 10:06:57 -06:00
Coly Li	feac1a70b8	bcache: add bcache_ prefix to btree_root() and btree() macros This patch changes macro btree_root() and btree() to bcache_btree_root() and bcache_btree(), to avoid potential generic name clash in future. NOTE: for product kernel maintainers, this patch can be skipped if you feel the rename stuffs introduce inconvenince to patch backport. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-22 10:06:57 -06:00
Coly Li	253a99d95d	bcache: move macro btree() and btree_root() into btree.h In order to accelerate bcache registration speed, the macro btree() and btree_root() will be referenced out of btree.c. This patch moves them from btree.c into btree.h with other relative function declaration in btree.h, for the following changes. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-22 10:06:57 -06:00
Guoqing Jiang	6b40bec3b1	md: check arrays is suspended in mddev_detach before call quiesce operations Don't call quiesce(1) and quiesce(0) if array is already suspended, otherwise in level_store, the array is writable after mddev_detach in below part though the intention is to make array writable after resume. mddev_suspend(mddev); mddev_detach(mddev); ... mddev_resume(mddev); And it also causes calltrace as follows in [1]. [48005.653834] WARNING: CPU: 1 PID: 45380 at kernel/kthread.c:510 kthread_park+0x77/0x90 [...] [48005.653976] CPU: 1 PID: 45380 Comm: mdadm Tainted: G OE 5.4.10-arch1-1 #1 [48005.653979] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J4105-ITX, BIOS P1.40 08/06/2018 [48005.653984] RIP: 0010:kthread_park+0x77/0x90 [48005.654015] Call Trace: [48005.654039] r5l_quiesce+0x3c/0x70 [raid456] [48005.654052] raid5_quiesce+0x228/0x2e0 [raid456] [48005.654073] mddev_detach+0x30/0x70 [md_mod] [48005.654090] level_store+0x202/0x670 [md_mod] [48005.654099] ? security_capable+0x40/0x60 [48005.654114] md_attr_store+0x7b/0xc0 [md_mod] [48005.654123] kernfs_fop_write+0xce/0x1b0 [48005.654132] vfs_write+0xb6/0x1a0 [48005.654138] ksys_write+0x67/0xe0 [48005.654146] do_syscall_64+0x4e/0x140 [48005.654155] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [48005.654161] RIP: 0033:0x7fa0c8737497 [1]: https://bugzilla.kernel.org/show_bug.cgi?id=206161 Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2020-03-17 10:53:07 -07:00
Linus Torvalds	5dfcc13902	block-5.6-2020-03-07 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl5j8hwQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpnjID/4/XVrqtVNUzVoVOtkOyxyesBrJVMHEQEpJ PZssv835IStw0ENhxQJfGjPaIFc9Ff6PMkeN5KRAlMoEc+NkrJShF3owGf+6Bps7 rxpblPxaw+CJFa31YBDZVjMCvbVkDm40G5SsJh+xzdIjlWz7MppkkMPdrErPwY8V 0vnrIc+mKBKfBMZTwVkycYtp17LVgfXguledoWzxM1y47IW5UasKh8jdzhbu8Hvt zztdQrigUdb+9XnLGCZIY0JQOyrhJ5zQpZ40FzbvxdYrQZXOoYT8L7iFu/z0Wi7K p3a+G+B4WowtLYW78me4Uut5RrHq2XOehSypfujanQlpgXPGjS3TdHT3an2T8XPQ NyGsZsn/eLm3btNbhGUd8vqpQy5EmWhqmwvYk9tFAoSFLiLcvCC624b/TCYPL+gk 3ZiI7mXBMjHnUZ0J/RF6kZWTAZDvr/tE7UZt1f8r1eEr8VDzCNp5Pst+HCVIguYD g9eWF8oH6wYoj39UKf1k+vW2GjXGFsnfivObaxhyz03sAPXK2wQlzAe/4jZ24XNr TRtOXh97c3CbLAwdUHehlzzdR3U7h0n2KsmrTC5AGmLABmR79s7BJ0+pexuZituO LwU8+gpf7AugHTrLg1eNXAmBHW44I1ticXYiWcT4iSPn99kNIhlW+Jb1iTGoiu7n nXyS3b5SCw== =xwKl -----END PGP SIGNATURE----- Merge tag 'block-5.6-2020-03-07' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Here are a few fixes that should go into this release. This contains: - Revert of a bad bcache patch from this merge window - Removed unused function (Daniel) - Fixup for the blktrace fix from Jan from this release (Cengiz) - Fix of deeper level bfqq overwrite in BFQ (Carlo)" * tag 'block-5.6-2020-03-07' of git://git.kernel.dk/linux-block: block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group() blktrace: fix dereference after null check Revert "bcache: ignore pending signals when creating gc and allocator thread" block: Remove used kblockd_schedule_work_on()	2020-03-07 14:14:38 -06:00
Linus Torvalds	776e49e8dd	- Fix request-based DM's congestion_fn and actually wire it up to the bdi. - Extend dm-bio-record to track additional struct bio members needed by DM integrity target. - Fix DM core to properly advertise that a device is suspended during unload (between the presuspend and postsuspend hooks). This change is a prereq for related DM integrity and DM writecache fixes. It elevates DM integrity's 'suspending' state tracking to DM core. - Four stable fixes for DM integrity target. - Fix crash in DM cache target due to incorrect work item cancelling. - Fix DM thin metadata lockdep warning that was introduced during 5.6 merge window. - Fix DM zoned target's chunk work refcounting that regressed during recent conversion to refcount_t. - Bump the minor version for DM core and all target versions that have seen interface changes or important fixes during the 5.6 cycle. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl5fwPgTHHNuaXR6ZXJA cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWnUJB/9+36J0Y3HYdUIKM8P5CBH0tdkMqSo4 BFm3Og4Xo1GA5xodNptgD+QoLXy3VWkekUsLvaLgOlPq7haPvHkME20vUMuO1l46 QMNR2VXciYGgV0+CFlpXL2oMr0ZEc1hkt7ZpzYaw1OIk6Mo0tEEDrSo0rvmXafPf W4veTWRL4HfN8fy3NpQNJ4xBQs4Iw4VC20FPFIOvzA1EGk7VGGaP1mGKmOnjxo/O o3lEH8NpQgwxwST7d4T6DPdeu3aTifnjdY8/h8mFGpQnxOiZ/wZkheKWwTCNE8r9 oeVY07qVxNAHyC8G/SmAL5KnC6En1hq5jA2M4xh9guUI2k8YbON571n+ =iAS0 -----END PGP SIGNATURE----- Merge tag 'for-5.6/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix request-based DM's congestion_fn and actually wire it up to the bdi. - Extend dm-bio-record to track additional struct bio members needed by DM integrity target. - Fix DM core to properly advertise that a device is suspended during unload (between the presuspend and postsuspend hooks). This change is a prereq for related DM integrity and DM writecache fixes. It elevates DM integrity's 'suspending' state tracking to DM core. - Four stable fixes for DM integrity target. - Fix crash in DM cache target due to incorrect work item cancelling. - Fix DM thin metadata lockdep warning that was introduced during 5.6 merge window. - Fix DM zoned target's chunk work refcounting that regressed during recent conversion to refcount_t. - Bump the minor version for DM core and all target versions that have seen interface changes or important fixes during the 5.6 cycle. * tag 'for-5.6/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: bump version of core and various targets dm: fix congested_fn for request-based device dm integrity: use dm_bio_record and dm_bio_restore dm bio record: save/restore bi_end_io and bi_integrity dm zoned: Fix reference counter initial value of chunk works dm writecache: verify watermark during resume dm: report suspended device during destroy dm thin metadata: fix lockdep complaint dm cache: fix a crash due to incorrect work item cancelling dm integrity: fix invalid table returned due to argument count mismatch dm integrity: fix a deadlock due to offloading to an incorrect workqueue dm integrity: fix recalculation when moving from journal mode to bitmap mode	2020-03-04 13:02:45 -06:00
Mike Snitzer	636be4241b	dm: bump version of core and various targets Changes made during the 5.6 cycle warrant bumping the version number for DM core and the targets modified by this commit. It should be noted that dm-thin, dm-crypt and dm-raid already had their target version bumped during the 5.6 merge window. Signed-off-by; Mike Snitzer <snitzer@redhat.com>	2020-03-03 11:10:21 -05:00

1 2 3 4 5 ...

6153 Commits