openkylin/qemu - qemu - 红山开源项目托管

Commit Graph

Author	SHA1	Message	Date
Kevin Wolf	5280aa32e1	block: Don't wait for requests in bdrv_drain*_end() The device is drained, so there is no point in waiting for requests at the end of the drained section. Remove the bdrv_drain_recurse() calls there. The bdrv_drain_recurse() calls were introduced in commit `481cad48e5` in order to call the .bdrv_co_drain_end() driver callback. This is now done by a separate bdrv_drain_invoke() call. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-12-22 15:03:41 +01:00
Kevin Wolf	99c05de918	block: bdrv_drain_recurse(): Remove unused begin parameter Now that the bdrv_drain_invoke() calls are pulled up to the callers of bdrv_drain_recurse(), the 'begin' parameter isn't needed any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-12-22 15:03:41 +01:00
Kevin Wolf	2da9b7d456	block: Call .drain_begin only once in bdrv_drain_all_begin() bdrv_drain_all_begin() used to call the .bdrv_co_drain_begin() driver callback inside its polling loop. This means that how many times it got called for each node depended on long it had to poll the event loop. This is obviously not right and results in nodes that stay drained even after bdrv_drain_all_end(), which calls .bdrv_co_drain_begin() once per node. Fix bdrv_drain_all_begin() to call the callback only once, too. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-12-22 15:03:41 +01:00
Kevin Wolf	db0289b9b2	block: Make bdrv_drain_invoke() recursive This change separates bdrv_drain_invoke(), which calls the BlockDriver drain callbacks, from bdrv_drain_recurse(). Instead, the function performs its own recursion now. One reason for this is that bdrv_drain_recurse() can be called multiple times by bdrv_drain_all_begin(), but the callbacks may only be called once. The separation is necessary to fix this bug. The other reason is that we intend to go to a model where we call all driver callbacks first, and only then start polling. This is not fully achieved yet with this patch, as bdrv_drain_invoke() contains a BDRV_POLL_WHILE() loop for the block driver callbacks, which can still call callbacks for any unrelated event. It's a step in this direction anyway. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-12-22 15:03:41 +01:00
Peter Lieven	e38bc23454	block/iscsi: only report an iSCSI Failure if we don't handle it gracefully we currently report an "iSCSI Failure" in iscsi_co_generic_cb if the task hasn't completed with SCSI_STATUS_GOOD. However, we expect a failure in some cases and handle it gracefully. This is the case for misaligned UNMAPs and WRITESAME10/16 calls without UNMAP. In this case a failure in the logs can be quite misleading. While we are at it improve the logging to reveal which operation failed at what LBA. Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1512733868-9009-3-git-send-email-pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-12-21 09:30:32 +01:00
Peter Lieven	aef172ffdc	block/iscsi: dont leave allocmap in an invalid state on UNMAP failure we forgot to set the allocmap to invalid if an UNMAP call fails. Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1512733868-9009-2-git-send-email-pl@kamp.de> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-12-21 09:30:31 +01:00
Peter Maydell	f1faf2d59c	Pull request v2: * Fixed incorrect virtio_blk_data_plane_create() local_err refactoring in "hw/block: Use errp directly rather than local_err" that broke virtio-blk over virtio-mmio [Peter] -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJaOSteAAoJEJykq7OBq3PIllkH/RkxTY6JIe9K8PRVsaAX2fRN edO/3E09KTQe9eHEixoMKOIyeKi3RPdipcktXIbdLIDEY4z4vELmQslTrxK/q+8J pccdwu+7tEXr14ciYSnq0m6ksvU5JHlJGyAJEvbCmLHE3dPJszABwT1XLLCb1C8s hSOr3nR/O2U3LHlq/FuvEUK8fohgKlECtE94V/DUWyC774iMw+9OdvTA0VQWYnN6 B0gpYSn4AXmdt5HmpgCa+5rZrT2DjdwhtR9X+iOItPoXJPP81toUxvshLbTgdL54 fSodd12Tbn2Pxr/osD1kwzM9z6oYX8Ay8YZTabODiFo20fhZKZ2wLpL4rrsNnBk= =Qcx2 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging Pull request v2: * Fixed incorrect virtio_blk_data_plane_create() local_err refactoring in "hw/block: Use errp directly rather than local_err" that broke virtio-blk over virtio-mmio [Peter] # gpg: Signature made Tue 19 Dec 2017 15:08:14 GMT # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: (23 commits) qemu-iotests: add 203 savevm with IOThreads test iothread: fix iothread_stop() race condition iotests: add VM.add_object() blockdev: add x-blockdev-set-iothread force boolean docs: mark nested AioContext locking as a legacy API block: avoid recursive AioContext acquire in bdrv_inactivate_all() virtio-blk: reject configs with logical block size > physical block size virtio-blk: make queue size configurable qemu-iotests: add 202 external snapshots IOThread test blockdev: add x-blockdev-set-iothread testing command iothread: add iothread_by_id() API block: drop unused BlockDirtyBitmapState->aio_context field block: don't keep AioContext acquired after internal_snapshot_prepare() block: don't keep AioContext acquired after blockdev_backup_prepare() block: don't keep AioContext acquired after drive_backup_prepare() block: don't keep AioContext acquired after external_snapshot_prepare() blockdev: hold AioContext for bdrv_unref() in external_snapshot_clean() qdev: drop unused #include "sysemu/iothread.h" dev-storage: Fix the unusual function name hw/block: Use errp directly rather than local_err ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # hw/core/qdev-properties-system.c	2017-12-20 11:30:55 +00:00
Peter Maydell	03c1c09d56	-----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJaOC2xAAoJEL2+eyfA3jBXeOcP/RUB2PDP6xC+gEtQ/pNTuT8a SY+2ZnxeCl/n6grW7Tf4IXwDZCTIM4+fthODJe3gmF0u8RkaEIm4Km4lf9Bc2gal 2ZmZoXk00va4FHlv7CVMQV/kcl3BudUqA7bwlGm8HQAEFkYaYtSrpRIlCrvohQIX hb3hPme1Ia/GGz0W+QTli6bKxBdgXXO20ZOt03UkVRLBO5LN/U9VhHh2Uu1+QOaX bpuqDnfRmkZ5iKGSMkNLT81PAOSQelHz3eej8UhdQz2F1KU2RTWaZ6E3Y/Ky6qah OL5vmV4BJaOlX+c+Vkg97TxChccAJPb1TRXiUl1Ypo8YPOBnuJ0CSBbadI2CnFu4 hAbGvzs77mOq1B3zY1gNMzVpK0R/nSPmXi63tY02VdedSbJ2s7dVyP49oV3Cko89 8XGdUOD6fn5goaA+GwMPs6iQQZBSRCqX7L3tawIcHHqi09BG6D/MNqzcTV8DjJ9Z 6UY1nGmPgO44fQmpkt+NJwYbEw8oENYEZZxsPupaFVqmIyPyjbhwV0ISLAGZtOWA lb4hETJ3a0dFhu6FawZyw2sDpTaSu+7AgIrBrmFFUUciYqWJr9xfAsvbFSqNARX2 Msihq9T6oXfefO+d56C1AIJmIz+LL5KtAUfNaD9qLgHTcyMvfdPQJl5lLyXs7/NK RL6MRSe3KL5GQsga1nK7 =8vnb -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging # gpg: Signature made Mon 18 Dec 2017 21:05:53 GMT # gpg: using RSA key 0xBDBE7B27C0DE3057 # gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>" # gpg: aka "Jeffrey Cody <jeff@codyprime.org>" # gpg: aka "Jeffrey Cody <codyprime@gmail.com>" # Primary key fingerprint: 9957 4B4D 3474 90E7 9D98 D624 BDBE 7B27 C0DE 3057 * remotes/cody/tags/block-pull-request: block/curl: fix minor memory leaks block/curl: check error return of curl_global_init() block/sheepdog: code beautification block/sheepdog: remove spurious NULL check blockjob: kick jobs on set-speed backup: use copy_bitmap in incremental backup backup: simplify non-dirty bits progress processing backup: init copy_bitmap from sync_bitmap for incremental backup: move from done_bitmap to copy_bitmap hbitmap: add next_zero function Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-12-19 17:44:42 +00:00
Stefan Hajnoczi	78f1d3d6a6	coroutine: simplify co_aio_sleep_ns() prototype The AioContext pointer argument to co_aio_sleep_ns() is only used for the sleep timer. It does not affect where the caller coroutine is resumed. Due to changes to coroutine and AIO APIs it is now possible to drop the AioContext pointer argument. This is safe to do since no caller has specific requirements for which AioContext the timer must run in. This patch drops the AioContext pointer argument and renames the function to simplify the API. Reported-by: Paolo Bonzini <pbonzini@redhat.com> Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20171109102652.6360-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-12-19 09:25:27 +00:00
Jeff Cody	996922de45	block/curl: fix minor memory leaks Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-12-18 15:44:39 -05:00
Jeff Cody	2d25964d18	block/curl: check error return of curl_global_init() If curl_global_init() fails, per the documentation no other curl functions may be called, so make sure to check the return value. Also, some minor changes to the initialization latch variable 'inited': - Make it static in the file, for clarity - Change the name for clarity - Make it a bool Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-12-18 15:42:07 -05:00
Jeff Cody	d507c5f682	block/sheepdog: code beautification No functional changes, just whitespace manipulation. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-12-18 15:41:44 -05:00
Jeff Cody	ac90dad94b	block/sheepdog: remove spurious NULL check 'tag' is already checked in the lines immediately preceding this check, and set to non-NULL if NULL. No need to check again, it hasn't changed. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-12-18 15:41:17 -05:00
Vladimir Sementsov-Ogievskiy	53f1c8794f	backup: use copy_bitmap in incremental backup We can use copy_bitmap instead of sync_bitmap. copy_bitmap is initialized from sync_bitmap and it is more informative: we will not try to process data, that is already in progress (by write notifier). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20171012135313.227864-6-vsementsov@virtuozzo.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-12-18 10:54:13 -05:00
Vladimir Sementsov-Ogievskiy	085bd08e6f	backup: simplify non-dirty bits progress processing Set fake progress for non-dirty clusters in copy_bitmap initialization, to. It simplifies code and allows further refactoring. This patch changes user's view of backup progress, but formally it doesn't changed: progress hops are just moved to the beginning. Actually it's just a point of view: when do we actually skip clusters? We can say in the very beginning, that we skip these clusters and do not think about them later. Of course, if go through disk sequentially, it's logical to say, that we skip clusters between copied portions to the left and to the right of them. But even now copying progress is not sequential because of write notifiers. Future patches will introduce new backup architecture which will do copying in several coroutines in parallel, so it will make no sense to publish fake progress by parts in parallel with other copying requests. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20171012135313.227864-5-vsementsov@virtuozzo.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-12-18 10:54:13 -05:00
Vladimir Sementsov-Ogievskiy	8cc6dc6215	backup: init copy_bitmap from sync_bitmap for incremental We should not copy non-dirty clusters in write notifiers. So, initialize copy_bitmap from sync_bitmap. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20171012135313.227864-4-vsementsov@virtuozzo.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-12-18 10:54:13 -05:00
Vladimir Sementsov-Ogievskiy	a193b0f0a8	backup: move from done_bitmap to copy_bitmap Use HBitmap copy_bitmap instead of done_bitmap. This is needed to improve incremental backup in following patches and to unify backup loop for full/incremental modes in future patches. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20171012135313.227864-3-vsementsov@virtuozzo.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-12-18 10:54:13 -05:00
Vladimir Sementsov-Ogievskiy	56207df55e	hbitmap: add next_zero function The function searches for next zero bit. Also add interface for BdrvDirtyBitmap and unit test. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20171012135313.227864-2-vsementsov@virtuozzo.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-12-18 10:54:13 -05:00
Philippe Mathieu-Daudé	7d98febd67	block: remove "qemu/osdep.h" from header file applied using ./scripts/clean-includes Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-12-18 17:07:02 +03:00
Peter Lieven	f1a7ff770f	block/nfs: fix nfs_client_open for filesize greater than 1TB DIV_ROUND_UP(st.st_size, BDRV_SECTOR_SIZE) was overflowing ret (int) if st.st_size is greater than 1TB. Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-id: 1511798407-31129-1-git-send-email-pl@kamp.de Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-29 15:28:15 +01:00
Paolo Bonzini	5bf1d5a73a	blockjob: remove clock argument from block_job_sleep_ns All callers are using QEMU_CLOCK_REALTIME, and it will not be possible to support more than one clock when block_job_sleep_ns switches to a single timer stored in the BlockJob struct. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Tested-By: Jeff Cody <jcody@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-11-29 15:11:02 +01:00
Kevin Wolf	02d213009d	block: Expect graph changes in bdrv_parent_drained_begin/end The .drained_begin/end callbacks can (directly or indirectly via aio_poll()) cause block nodes to be removed or the current BdrvChild to point to a different child node. Use QLIST_FOREACH_SAFE() to make sure we don't access invalid BlockDriverStates or accidentally continue iterating the parents of the new child node instead of the node we actually came from. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-11-29 14:22:03 +01:00
Kevin Wolf	70a5afedd6	block: Error out on load_vm with active dirty bitmaps Loading a snapshot invalidates the bitmap. Just marking all blocks dirty is not a useful response in practice, instead the user needs to be aware that we switch to a completely different state. If they are okay with losing the dirty bitmap, they can just explicitly delete it. This effectively reverts commit `04dec3c3ae`. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com>	2017-11-21 14:48:23 +01:00
Kevin Wolf	2b624fe079	block: Add errp to bdrv_all_goto_snapshot() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com>	2017-11-21 14:48:22 +01:00
Kevin Wolf	0b62bcbc61	block: Add errp to bdrv_snapshot_goto() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com>	2017-11-21 14:48:22 +01:00
Kevin Wolf	1f4ad7d3b8	block: Don't request I/O permission with BDRV_O_NO_IO 'qemu-img info' makes sense even when BLK_PERM_CONSISTENT_READ cannot be granted because of a block job in a running qemu process. It already sets BDRV_O_NO_IO to indicate that it doesn't access the guest visible data at all. Check the BDRV_O_NO_IO flags in blk_new_open(), so that I/O related permissions are not unnecessarily requested and 'qemu-img info' can work even if BLK_PERM_CONSISTENT_READ cannot be granted. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2017-11-21 14:48:22 +01:00
Peter Maydell	2e02083438	Block layer patches for 2.11.0-rc2 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJaDyNMAAoJEH8JsnLIjy/WP+oP/1G0r1b9PBtrEc+B9QG0sMh1 qivbMRmNUFhuzYW8VBzb2Cpsl/lpe/VlAumf75RUACOh4ZfxkOhaYTMB5x9FHvoa 5w35UC382zuXTMfeF4JufF5rVJdB+bB9THJOJ7cuaDQSHZNPo8uMmHrk4J5p5uuU IrOZLfqgQaWGdobGe6FPWy67z2PRWrzEI/Ogr8zwpa95GJhX3fY1CmGvt2MUBG+j cwdYcuX9pawFfjasCkfwYMzY7Uc9wvjIJYSe+WvrDrpjInISgxMOvXylzdWkpsGS rHnsLgRXnUE4BmOavzq+uvlo14hlrmdVn5UYlUkbJcx+Bkp7MZGuNjX6doimCdWg PpbQB3TIi7ce4v9cYWZSrzDDo6E/d1Evi3xMG0ZouU0ff1DPFIilm8lxb3xYia+b ItTLZ7yXWLijGhr4jTeGVhn0+zENiczyiQsL1sDdj83VnpAOGPNZTwtnNGug8ATJ VELwA7jlVEL47HYovQoUIFs0NA5xCbOQm5avS5gk44PB/dNkYGKdHhwgQUV/R36A lRNj7Vh7hwHSyHO/gKi3sqXTbNG/fTDSy1Nu1xc0pAROIKpc6AanP0mFn2DlZS35 kyZbxMbuGUsDQ2j0pcGuP60O7+2CY4jlXRCXo6vaHzP3xDfOTx2L9UYVfk/dwNJu uYbnbXWv06igjm3Y/2Vv =ClZj -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches for 2.11.0-rc2 # gpg: Signature made Fri 17 Nov 2017 17:58:36 GMT # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (25 commits) iotests: Make 087 pass without AIO enabled block: Make bdrv_next() keep strong references qcow2: Fix overly broad madvise() qcow2: Refuse to get unaligned offsets from cache qcow2: Add bounds check to get_refblock_offset() block: Guard against NULL bs->drv qcow2: Unaligned zero cluster in handle_alloc() qcow2: check_errors are fatal qcow2: reject unaligned offsets in write compressed iotests: Add test for failing qemu-img commit tests: Add check-qobject for equality tests iotests: Add test for non-string option reopening block: qobject_is_equal() in bdrv_reopen_prepare() qapi: Add qobject_is_equal() qapi/qlist: Add qlist_append_null() macro qapi/qnull: Add own header qcow2: fix image corruption on commit with persistent bitmap iotests: test clearing unknown autoclear_features by qcow2 block: Fix permissions in image activation qcow2: fix image corruption after committing qcow2 image into base ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-11-17 19:08:07 +00:00
Max Reitz	5e003f17ec	block: Make bdrv_next() keep strong references On one hand, it is a good idea for bdrv_next() to return a strong reference because ideally nearly every pointer should be refcounted. This fixes intermittent failure of iotest 194. On the other, it is absolutely necessary for bdrv_next() itself to keep a strong reference to both the BB (in its first phase) and the BDS (at least in the second phase) because when called the next time, it will dereference those objects to get a link to the next one. Therefore, it needs these objects to stay around until then. Just storing the pointer to the next in the iterator is not really viable because that pointer might become invalid as well. Both arguments taken together means we should probably just invoke bdrv_ref() and blk_ref() in bdrv_next(). This means we have to assert that bdrv_next() is always called from the main loop, but that was probably necessary already before this patch and judging from the callers, it also looks to actually be the case. Keeping these strong references means however that callers need to give them up if they decide to abort the iteration early. They can do so through the new bdrv_next_cleanup() function. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171110172545.32609-1-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-17 18:21:31 +01:00
Max Reitz	08546bcfb2	qcow2: Fix overly broad madvise() @mem_size and @offset are both size_t, thus subtracting them from one another will just return a big size_t if mem_size < offset -- even more obvious here because the result is stored in another size_t. Checking that result to be positive is therefore not sufficient to exclude the case that offset > mem_size. Thus, we currently sometimes issue an madvise() over a very large address range. This is triggered by iotest 163, but with -m64, this does not result in tangible problems. But with -m32, this test produces three segfaults, all of which are fixed by this patch. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171114184127.24238-1-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-17 18:21:31 +01:00
Max Reitz	4efb1f7c61	qcow2: Refuse to get unaligned offsets from cache Instead of using an assertion, it is better to emit a corruption event here. Checking all offsets for correct alignment can be tedious and it is easily possible to forget to do so. qcow2_cache_do_get() is a function every L2 and refblock access has to go through, so this is a good central point to add such a check. And for good measure, let us also add an assertion that the offset is non-zero. Making this a corruption event is not feasible, because a zero offset usually means something special (such as the cluster is unused), so all callers should be checking this anyway. If they do not, it is their fault, hence the assertion here. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171110203111.7666-6-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-17 18:21:31 +01:00
Max Reitz	23482f8a60	qcow2: Add bounds check to get_refblock_offset() Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com> Buglink: https://bugs.launchpad.net/qemu/+bug/1728661 Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171110203111.7666-5-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-17 18:21:31 +01:00
Max Reitz	d470ad42ac	block: Guard against NULL bs->drv We currently do not guard everywhere against a NULL bs->drv where we should be doing so. Most of the places fixed here just do not care about that case at all. Some care implicitly, e.g. through a prior function call to bdrv_getlength() which would always fail for an ejected BDS. Add an assert there to make it more obvious. Other places seem to care, but do so insufficiently: Freeing clusters in a qcow2 image is an error-free operation, but it may leave the image in an unusable state anyway. Giving qcow2_free_clusters() an error code is not really viable, it is much easier to note that bs->drv may be NULL even after a successful driver call. This concerns bdrv_co_flush(), and the way the check is added to bdrv_co_pdiscard() (in every iteration instead of only once). Finally, some places employ at least an assert(bs->drv); somewhere, that may be reasonable (such as in the reopen code), but in bdrv_has_zero_init(), it is definitely not. Returning 0 there in case of an ejected BDS saves us much headache instead. Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com> Buglink: https://bugs.launchpad.net/qemu/+bug/1728660 Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171110203111.7666-4-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-17 18:21:31 +01:00
Max Reitz	93bbaf03ff	qcow2: Unaligned zero cluster in handle_alloc() We should check whether the cluster offset we are about to use is actually valid; that is, whether it is aligned to cluster boundaries. Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com> Buglink: https://bugs.launchpad.net/qemu/+bug/1728643 Buglink: https://bugs.launchpad.net/qemu/+bug/1728657 Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171110203111.7666-3-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-17 18:21:30 +01:00
Max Reitz	791fff504c	qcow2: check_errors are fatal When trying to repair a dirty image, qcow2_check() may apparently succeed (no really fatal error occurred that would prevent the check from continuing), but if check_errors in the result object is non-zero, we cannot trust the image to be usable. Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com> Buglink: https://bugs.launchpad.net/qemu/+bug/1728639 Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171110203111.7666-2-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-17 18:21:30 +01:00
Anton Nefedov	3e3b838ffe	qcow2: reject unaligned offsets in write compressed Misaligned compressed write is not supported. Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Message-id: 1510654613-47868-2-git-send-email-anton.nefedov@virtuozzo.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-17 18:21:30 +01:00
Eric Blake	4096974e18	qcow2: fix image corruption on commit with persistent bitmap If an image contains persistent bitmaps, we cannot use the fast path of bdrv_make_empty() to clear the image during qemu-img commit, because that will lose the clusters related to the bitmaps. Also leave a comment in qcow2_read_extensions to remind future feature additions to think about fast-path removal, since we just barely fixed the same bug for LUKS encryption. It's a pain that qemu-img has not yet been taught to manipulate, or even at a very minimum display, information about persistent bitmaps; instead, we have to use QMP commands. It's also a pain that only qeury-block and x-debug-block-dirty-bitmap-sha256 will allow bitmap introspection; but the former requires the node to be hooked to a block device, and the latter is experimental. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-11-17 18:21:01 +01:00
Eric Blake	08ace1d753	nbd: Don't crash when server reports NBD_CMD_READ failure If a server fails a read, for example with EIO, but the connection is still live, then we would crash trying to print a non-existent error message in nbd_client_co_preadv(). For consistency, also change the error printout in nbd_read_reply_entry(), although that instance does not crash. Bug introduced in commit `f140e300`. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171112013936.5942-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2017-11-17 08:02:45 -06:00
Daniel P. Berrange	f06033295b	qcow2: fix image corruption after committing qcow2 image into base After committing the qcow2 image contents into the base image, qemu-img will call bdrv_make_empty to drop the payload in the layered image. When this is done for qcow2 images, it blows away the LUKS encryption header, making the resulting image unusable. There are two codepaths for emptying a qcow2 image, and the second (slower) codepath leaves the LUKS header intact, so force use of that codepath. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-11-17 13:36:03 +01:00
Kevin Wolf	398e6ad014	block: Deprecate bdrv_set_read_only() and users bdrv_set_read_only() is used by some block drivers to override the read-only option given by the user. This is not how read-only images generally work in QEMU: Instead of second guessing what the user really meant (which currently includes making an image read-only even if the user didn't only use the default, but explicitly said read-only=off), we should error out if we can't provide what the user requested. This adds deprecation warnings to all callers of bdrv_set_read_only() so that the behaviour can be corrected after the usual deprecation period. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-11-17 13:35:59 +01:00
Daniel P. Berrange	f66afbe26f	qcow2: don't permit changing encryption parameters Currently if trying to change encryption parameters on a qcow2 image, qemu-img will abort. We already explicitly check for attempt to change encrypt.format but missed other parameters like encrypt.key-secret. Rather than list each parameter, just blacklist changing of all parameters with a 'encrypt.' prefix. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-11-17 13:35:59 +01:00
Wang Guang	611e0653ad	replication: Fix replication open fail replication_child_perm request write permissions for all child which will lead bdrv_check_perm fail. replication_child_perm() should request write permissions only if it is writable itself. Signed-off-by: Wang Guang <wang.guang55@zte.com.cn> Signed-off-by: Wang Yong <wang.yong155@zte.com.cn> Reviewed-by: Xie Changlong <xiechanglong@cmss.chinamobile.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-11-17 13:35:59 +01:00
Stefan Hajnoczi	341e0b5658	throttle-groups: forget timer and schedule next TGM on detach tg->any_timer_armed[] must be cleared when detaching pending timers from the AioContext. Failure to do so leads to hung I/O because it looks like there are still timers pending when in fact they have been removed. Other ThrottleGroupMembers might have requests pending too so it's necessary to schedule the next TGM so it can set a timer. This patch fixes hung I/O when QEMU is launched with drives that are in the same throttling group: (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct bs=512 & (guest)$ dd if=/dev/zero of=/dev/vdc oflag=direct bs=512 & (qemu) stop (qemu) cont ...I/O is stuck... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20171116112150.27607-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-11-16 14:12:57 +00:00
Jeff Cody	1d0f37cf21	block/parallels: add migration blocker Migration does not work for parallels, and has been broken for a while (see patch 'block/parallels: Do not update header or truncate image when INMIGRATE'). The bdrv_invalidate_cache() method needs to be added for migration to be supported. Until this is done, prohibit migration. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 5e04a7c8a3089913fa58d484af42dab7993984ad.1510059970.git.jcody@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-14 18:06:26 +01:00
Jeff Cody	6c7d390b99	block/parallels: Do not update header or truncate image when INMIGRATE If we write or modify the image file while the QEMU run state is INMIGRATE, then the BDRV_O_INACTIVE BDS flag is set. This will cause an assert, since the image is marked inactive. Make sure we obey this flag. Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Jeff Cody <jcody@redhat.com> Message-id: 3996c930fa8cde8570b7a63032720d76a28fd78b.1510059970.git.jcody@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-14 18:06:25 +01:00
Jeff Cody	7479bf07c4	block/vhdx.c: Don't blindly update the header The VHDX specification requires that before user data modification of the vhdx image, the VHDX header file and data GUIDs need to be updated. In vhdx_open(), if the image is set to RDWR, we go ahead and update the header. However, just because the image is set to RDWR does not mean we can go ahead and write at this point - specifically, if the QEMU run state is INMIGRATE, the underlying file BS may be set to inactive via the BDS open flag of BDRV_O_INACTIVE. Attempting to write under this condition will cause an assert in bdrv_co_pwritev(). We can alternatively latch the first time the image is written. And lo and behold, we do just that, via vhdx_user_visible_write() in vhdx_co_writev(). This means the call to vhdx_update_headers() in vhdx_open() is likely just vestigial, and can be removed. Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Jeff Cody <jcody@redhat.com> Message-id: 659e4cdba6ef4c651737852777c8c93d27b38040.1510059970.git.jcody@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-14 18:06:25 +01:00
Vladimir Sementsov-Ogievskiy	04dec3c3ae	block/snapshot: dirty all dirty bitmaps on snapshot-switch Snapshot-switch actually changes active state of disk so it should reflect on dirty bitmaps. Otherwise next incremental backup using these bitmaps will be invalid. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20171023092945.54532-1-vsementsov@virtuozzo.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-14 18:06:25 +01:00
Alberto Garcia	c9b83e9c23	qcow2: Assert that the crypto header does not overlap other metadata The crypto header is initialized only when QEMU is creating a new image, so there's no chance of this happening on a corrupted image. If QEMU is really trying to allocate the header overlapping other existing metadata sections then this is a serious bug in QEMU itself so let's add an assertion. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: ae3d77f312fc0c5e0ac2bbd71676c0112eebe2e5.1509718618.git.berto@igalia.com Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-14 18:06:25 +01:00
Alberto Garcia	951053a9ec	qcow2: Don't open images with header.refcount_table_clusters == 0 qcow2_do_open() is checking that header.refcount_table_clusters is not too large, but it doesn't check that it's greater than zero. Apart from the fact that an image like that is obviously corrupted, trying to use it crashes QEMU since we end up with a null s->refcount_table after qcow2_refcount_init(). These images can however be repaired, so allow opening them if the BDRV_O_CHECK flag is set. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: f9750f50c80359babba11062e88f5075a47e8e16.1509718618.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-14 18:06:25 +01:00
Alberto Garcia	8aa34834d5	qcow2: Prevent allocating compressed clusters at offset 0 If the refcount data is corrupted then we can end up trying to allocate a new compressed cluster at offset 0 in the image, triggering an assertion in qcow2_alloc_bytes() that would crash QEMU: qcow2_alloc_bytes: Assertion `offset' failed. This patch adds an explicit check for this scenario and a new test case. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: fb53467cf48e95ff3330def1cf1003a5b862b7d9.1509718618.git.berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-14 18:06:25 +01:00
Alberto Garcia	9883975050	qcow2: Prevent allocating L2 tables at offset 0 If the refcount data is corrupted then we can end up trying to allocate a new L2 table at offset 0 in the image, triggering an assertion in the qcow2 cache that would crash QEMU: qcow2_cache_entry_mark_dirty: Assertion `c->entries[i].offset != 0' failed This patch adds an explicit check for this scenario and a new test case. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 92dac37191ae7844a2da22c122204eb493cc3133.1509718618.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-14 18:06:25 +01:00
Alberto Garcia	6bf45d59f9	qcow2: Prevent allocating refcount blocks at offset 0 Each entry in the qcow2 cache contains an offset field indicating the location of the data in the qcow2 image. If the offset is 0 then it means that the entry contains no data and is available to be used when needed. Because of that it is not possible to store in the cache the first cluster of the qcow2 image (offset = 0). This is not a problem because that cluster always contains the qcow2 header and we're not using this cache for that. However, if the qcow2 image is corrupted it can happen that we try to allocate a new refcount block at offset 0, triggering this assertion and crashing QEMU: qcow2_cache_entry_mark_dirty: Assertion `c->entries[i].offset != 0' failed This patch adds an explicit check for this scenario and a new test case. This problem was originally reported here: https://bugs.launchpad.net/qemu/+bug/1728615 Reported-by: R.Nageswara Sastry <nasastry@in.ibm.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 92a2fadd10d58b423f269c1d1a309af161cdc73f.1509718618.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-14 18:06:25 +01:00
Peter Maydell	191b5fbfa6	Pull request The following disk I/O throttling fixes solve recent bugs. -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJaCsdYAAoJEJykq7OBq3PIZdMH/10xOuxOjvjxlJNkQquhrAmD y9dVj0jEtopdter/XR7ZCsww1UgxpIt8K43Dk1yWTKrm2bNN1v3cqemJV+UUTLFl LppKxt5Cm1JRKaCfN0hSwOp5pFJumzH6creVdQMQ3VNCSSw6xfV94pupaVE8at6D n4r3ZDF03ARETMJW7HY7QIFi1YVcfmi4wrx8rfhEGLZu06nHrtFQsDdH7SeErgXi wJh+ksji4EvX2xc54nhprCsc9HdzbfeBEYx6tdD0Uh3xm7xXd2oka5Rac74WuqYu B4aKwyFbvKZ0DYnENiOCkemTN51s+0GHLz43T92/DmQhJrBy8EU4TTCn73vgmto= =KnUT -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging Pull request The following disk I/O throttling fixes solve recent bugs. # gpg: Signature made Tue 14 Nov 2017 10:37:12 GMT # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: qemu-iotests: Test I/O limits with removable media block: Leave valid throttle timers when removing a BDS from a backend block: Check for inserted BlockDriverState in blk_io_limits_disable() throttle-groups: drain before detaching ThrottleState block: all I/O should be completed before removing throttle timers. Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-11-14 16:11:19 +00:00
Alberto Garcia	c89bcf3af0	block: Leave valid throttle timers when removing a BDS from a backend If a BlockBackend has I/O limits set then its ThrottleGroupMember structure uses the AioContext from its attached BlockDriverState. Those two contexts must be kept in sync manually. This is not ideal and will be fixed in the future by removing the throttling configuration from the BlockBackend and storing it in an implicit filter node instead, but for now we have to live with this. When you remove the BlockDriverState from the backend then the throttle timers are destroyed. If a new BlockDriverState is later inserted then they are created again using the new AioContext. There are a couple of problems with this: a) The code manipulates the timers directly, leaving the ThrottleGroupMember.aio_context field in an inconsisent state. b) If you remove the I/O limits (e.g by destroying the backend) when the timers are gone then throttle_group_unregister_tgm() will attempt to destroy them again, crashing QEMU. While b) could be fixed easily by allowing the timers to be freed twice, this would result in a situation in which we can no longer guarantee that a valid ThrottleState has a valid AioContext and timers. This patch ensures that the timers and AioContext are always valid when I/O limits are set, regardless of whether the BlockBackend has a BlockDriverState inserted or not. [Fixed "There'a" typo as suggested by Max Reitz <mreitz@redhat.com> --Stefan] Reported-by: sochin jiang <sochin.jiang@huawei.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: e089c66e7c20289b046d782cea4373b765c5bc1d.1510339534.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-11-13 15:43:49 +00:00
Alberto Garcia	48bf7ea81a	block: Check for inserted BlockDriverState in blk_io_limits_disable() When you set I/O limits using block_set_io_throttle or the command line throttling.* options they are kept in the BlockBackend regardless of whether a BlockDriverState is attached to the backend or not. Therefore when removing the limits using blk_io_limits_disable() we need to check if there's a BDS before attempting to drain it, else it will crash QEMU. This can be reproduced very easily using HMP: (qemu) drive_add 0 if=none,throttling.iops-total=5000 (qemu) drive_del none0 Reported-by: sochin jiang <sochin.jiang@huawei.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 0d3a67ce8d948bb33e08672564714dcfb76a3d8c.1510339534.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-11-13 14:38:46 +00:00
Stefan Hajnoczi	dc868fb03b	throttle-groups: drain before detaching ThrottleState I/O requests hang after stop/cont commands at least since QEMU 2.10.0 with -drive iops=100: (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct count=1000 (qemu) stop (qemu) cont ...I/O is stuck... This happens because blk_set_aio_context() detaches the ThrottleState while requests may still be in flight: if (tgm->throttle_state) { throttle_group_detach_aio_context(tgm); throttle_group_attach_aio_context(tgm, new_context); } This patch encloses the detach/attach calls in a drained region so no I/O request is left hanging. Also add assertions so we don't make the same mistake again in the future. Reported-by: Yongxue Hong <yhong@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20171110151934.16883-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-11-13 14:02:09 +00:00
Zhengui	632a773543	block: all I/O should be completed before removing throttle timers. In blk_remove_bs, all I/O should be completed before removing throttle timers. If there has inflight I/O, removing throttle timers here will cause the inflight I/O never return. This patch add bdrv_drained_begin before throttle_timers_detach_aio_context to let all I/O completed before removing throttle timers. [Moved declaration of bs as suggested by Alberto Garcia <berto@igalia.com>. --Stefan] Signed-off-by: Zhengui <lizhengui@huawei.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1508564040-120700-1-git-send-email-lizhengui@huawei.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-11-13 14:02:05 +00:00
Eric Blake	b4176cb314	nbd-client: Stricter enforcing of structured reply spec Ensure that the server is not sending unexpected chunk lengths for either the NONE or the OFFSET_DATA chunk, nor unexpected hole length for OFFSET_HOLE. This will flag any server as broken that responds to a zero-length read with an OFFSET_DATA (what our server currently does, but that's about to be fixed) or with OFFSET_HOLE, even though we previously fixed our client to never be able to send such a request over the wire. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171108215703.9295-7-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2017-11-09 10:22:26 -06:00
Eric Blake	9d8f818cde	nbd-client: Short-circuit 0-length operations The NBD spec was recently clarified to state that clients should not send 0-length requests to the server, as the server behavior is undefined [1]. We know that qemu-nbd's behavior is a successful no-op (once it has filtered for read-only exports), but other NBD implementations might return an error. To avoid any questionable server implementations, it is better to just short-circuit such requests on the client side (we are relying on the block layer to already filter out requests such as invalid offset, write to a read-only volume, and so forth); do the short-circuit as late as possible to still benefit from protections from assertions that the block layer is not violating our assumptions. [1] https://github.com/NetworkBlockDevice/nbd/commit/ee926037 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171108215703.9295-6-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2017-11-09 10:18:31 -06:00
Eric Blake	1104d83c72	nbd-client: Refuse read-only client with BDRV_O_RDWR The NBD spec says that clients should not try to write/trim to an export advertised as read-only by the server. But we failed to check that, and would allow the block layer to use NBD with BDRV_O_RDWR even when the server is read-only, which meant we were depending on the server sending a proper EPERM failure for various commands, and also exposes a leaky abstraction: using qemu-io in read-write mode would succeed on 'w -z 0 0' because of local short-circuiting logic, but 'w 0 0' would send a request over the wire (where it then depends on the server, and fails at least for qemu-nbd but might pass for other NBD implementations). With this patch, a client MUST request read-only mode to access a server that is doing a read-only export, or else it will get a message like: can't open device nbd://localhost:10809/foo: request for write access conflicts with read-only export It is no longer possible to even attempt writes over the wire (including the corner case of 0-length writes), because the block layer enforces the explicit read-only request; this matches the behavior of qcow2 when backed by a read-only POSIX file. Fix several iotests to comply with the new behavior (since qemu-nbd of an internal snapshot, as well as nbd-server-add over QMP, default to a read-only export, we must tell blockdev-add/qemu-io to set up a read-only client). CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171108215703.9295-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2017-11-09 10:10:17 -06:00
Eric Blake	e659fb3b99	nbd-client: Fix error message typos Provide missing spaces that are required when using string concatenation to break error messages across source lines. Introduced in commit `f140e300`. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171108215703.9295-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2017-11-09 10:09:38 -06:00
Vladimir Sementsov-Ogievskiy	f140e30003	nbd: Minimal structured read for client Minimal implementation: for structured error only error_report error message. Note that test 83 is now more verbose, because the implementation prints more warnings about unexpected communication errors; perhaps future patches should tone things down by using trace messages instead of traces, but the common case of successful communication is no noisier than before. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171027104037.8319-13-eblake@redhat.com>	2017-10-30 21:48:41 +01:00
Vladimir Sementsov-Ogievskiy	d2febedb45	nbd/client: prepare nbd_receive_reply for structured reply In following patch nbd_receive_reply will be used both for simple and structured reply header receiving. NBDReply is altered into union of simple reply header and structured reply chunk header, simple error translation moved to block/nbd-client to be consistent with further structured reply error translation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171027104037.8319-11-eblake@redhat.com>	2017-10-30 21:48:32 +01:00
Max Reitz	572b07bea1	qcow2: Always execute preallocate() in a coroutine Some qcow2 functions (at least perform_cow()) expect s->lock to be taken. Therefore, if we want to make use of them, we should execute preallocate() (as "preallocate_co") in a coroutine so that we can use the qemu_co_mutex_* functions. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171009215533.12530-3-mreitz@redhat.com Cc: qemu-stable@nongnu.org Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-10-26 15:01:14 +02:00
Max Reitz	e400ad1e1f	qcow2: Fix unaligned preallocated truncation A qcow2 image file's length is not required to have a length that is a multiple of the cluster size. However, qcow2_refcount_area() expects an aligned value for its @start_offset parameter, so we need to round @old_file_size up to the next cluster boundary. Reported-by: Ping Li <pingl@redhat.com> Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1414049 Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171009215533.12530-2-mreitz@redhat.com Cc: qemu-stable@nongnu.org Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-10-26 15:01:14 +02:00
Max Reitz	233521b199	qcow2: Emit errp when truncating the image tail bdrv_truncate() has an errp parameter which is always set when an error occurs. Let's use that instead of a plain strerror(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171009155431.14093-1-mreitz@redhat.com Reviewed-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-10-26 15:01:14 +02:00
Alberto Garcia	a35f87f50d	qcow2: Use BDRV_SECTOR_BITS instead of its literal value BDRV_SECTOR_BITS is defined to be 9 in block.h (and BDRV_SECTOR_SIZE is calculated from that), but there are still a couple of places where we are using the literal value instead of the macro. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 20171009153856.20387-1-berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-10-26 15:01:13 +02:00
Eric Blake	8cbf74b23c	qcow2: Reduce is_zero() rounding Now that bdrv_is_allocated accepts non-aligned inputs, we can remove the TODO added in earlier refactoring. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	88e63df214	block: Reduce bdrv_aligned_preadv() rounding Now that bdrv_is_allocated accepts non-aligned inputs, we can remove the TODO added in commit `d6a644bb`. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	efa6e2ed64	block: Align block status requests Any device that has request_alignment greater than 512 should be unable to report status at a finer granularity; it may also be simpler for such devices to be guaranteed that the block layer has rounded things out to the granularity boundary (the way the block layer already rounds all other I/O out). Besides, getting the code correct for super-sector alignment also benefits us for the fact that our public interface now has byte granularity, even though none of our drivers have byte-level callbacks. Add an assertion in blkdebug that proves that the block layer never requests status of unaligned sections, similar to what it does on other requests (while still keeping the generic helper in place for when future patches add a throttle driver). Note that iotest 177 already covers this (it would fail if you use just the blkdebug.c hunk without the io.c changes). Meanwhile, we can drop assertions in callers that no longer have to pass in sector-aligned addresses. There is a mid-function scope added for 'count' and 'longret', for a couple of reasons: first, an upcoming patch will add an 'if' statement that checks whether a driver has an old- or new-style callback, and can conveniently use the same scope for less indentation churn at that time. Second, since we are trying to get rid of sector-based computations, wrapping things in a scope makes it easier to group and see what will be deleted in a final cleanup patch once all drivers have been converted to the new-style callback. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	3182664220	block: Convert bdrv_get_block_status_above() to bytes We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the name of the function from bdrv_get_block_status_above() to bdrv_block_status_above() ensures that the compiler enforces that all callers are updated. Likewise, since it a byte interface allows an offset mapping that might not be sector aligned, split the mapping out of the return value and into a pass-by-reference parameter. For now, the io.c layer still assert()s that all uses are sector-aligned, but that can be relaxed when a later patch implements byte-based block status in the drivers. For the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_block_status(), plus updates for the new split return interface. But some code, particularly bdrv_block_status(), gets a lot simpler because it no longer has to mess with sectors. Likewise, mirror code no longer computes s->granularity >> BDRV_SECTOR_BITS, and can therefore drop an assertion about alignment because the loop no longer depends on alignment (never mind that we don't really have a driver that reports sub-sector alignments, so it's not really possible to test the effect of sub-sector mirroring). Fix a neighboring assertion to use is_power_of_2 while there. For ease of review, bdrv_get_block_status() was tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	5b648c67e3	block: Switch bdrv_co_get_block_status_above() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Convert another internal type (no semantic change), and rename it to match the corresponding public function rename. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	7ddb99b9dc	block: Switch bdrv_common_block_status_above() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Convert another internal function (no semantic change). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	4bcd936e47	block: Switch BdrvCoGetBlockStatusData to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Convert another internal type (no semantic change), and rename it to match the corresponding public function rename. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	2e8bc7874b	block: Switch bdrv_co_get_block_status() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Convert another internal function (no semantic change); and as with its public counterpart, rename to bdrv_co_block_status() and split the offset return, to make the compiler enforce that we catch all uses. For now, we assert that callers and the return value still use aligned data, but ultimately, this will be the function where we hand off to a byte-based driver callback, and will eventually need to add logic to ensure we round calls according to the driver's request_alignment then touch up the result handed back to the caller, to start permitting a caller to pass unaligned offsets. Note that we are now prepared to accepts 'bytes' larger than INT_MAX; this is okay as long as we clamp things internally before violating any 32-bit limits, and makes no difference to how a client will use the information (clients looping over the entire file must already be prepared for consecutive calls to return the same status, as drivers are already free to return shorter-than-maximal status due to any other convenient split points, such as when the L2 table crosses cluster boundaries in qcow2). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	237d78f8fc	block: Convert bdrv_get_block_status() to bytes We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the name of the function from bdrv_get_block_status() to bdrv_block_status() ensures that the compiler enforces that all callers are updated. For now, the io.c layer still assert()s that all callers are sector-aligned, but that can be relaxed when a later patch implements byte-based block status in the drivers. There was an inherent limitation in returning the offset via the return value: we only have room for BDRV_BLOCK_OFFSET_MASK bits, which means an offset can only be mapped for sector-aligned queries (or, if we declare that non-aligned input is at the same relative position modulo 512 of the answer), so the new interface also changes things to return the offset via output through a parameter by reference rather than mashed into the return value. We'll have some glue code that munges between the two styles until we finish converting all uses. For the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_block_status(), coupled with the tweak in calling convention. But some code, particularly bdrv_is_allocated(), gets a lot simpler because it no longer has to mess with sectors. For ease of review, bdrv_get_block_status_above() will be tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	7286d6106f	block: Switch bdrv_make_zero() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Change the internal loop iteration of zeroing a device to track by bytes instead of sectors (although we are still guaranteed that we iterate by steps that are sector-aligned). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	f06f6b66c7	qcow2: Switch is_zero_sectors() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Convert another internal function (no semantic change), and rename it to is_zero() in the process. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	7cfd527525	block: Make bdrv_round_to_clusters() signature more useful In the process of converting sector-based interfaces to bytes, I'm finding it easier to represent a byte count as a 64-bit integer at the block layer (even if we are internally capped by SIZE_MAX or even INT_MAX for individual transactions, it's still nicer to not have to worry about truncation/overflow issues on as many variables). Update the signature of bdrv_round_to_clusters() to uniformly use int64_t, matching the signature already chosen for bdrv_is_allocated and the fact that off_t is also a signed type, then adjust clients according to the required fallout (even where the result could now exceed 32 bits, no client is directly assigning the result into a 32-bit value without breaking things into a loop first). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	c9ce8c4da6	block: Add flag to avoid wasted work in bdrv_is_allocated() Not all callers care about which BDS owns the mapping for a given range of the file, or where the zeroes lie within that mapping. In particular, bdrv_is_allocated() cares more about finding the largest run of allocated data from the guest perspective, whether or not that data is consecutive from the host perspective, and whether or not the data reads as zero. Therefore, doing subsequent refinements such as checking how much of the format-layer allocation also satisfies BDRV_BLOCK_ZERO at the protocol layer is wasted work - in the best case, it just costs extra CPU cycles during a single bdrv_is_allocated(), but in the worst case, it results in a smaller *pnum, and forces callers to iterate through more status probes when visiting the entire file for even more extra CPU cycles. This patch only optimizes the block layer (no behavior change when want_zero is true, but skip unnecessary effort when it is false). Then when subsequent patches tweak the driver callback to be byte-based, we can also pass this hint through to the driver. Tweak BdrvCoGetBlockStatusData to declare arguments in parameter order, rather than mixing things up (minimizing padding is not necessary here). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Eric Blake	298a1665a2	block: Allow NULL file for bdrv_get_block_status() Not all callers care about which BDS owns the mapping for a given range of the file. This patch merely simplifies the callers by consolidating the logic in the common call point, while guaranteeing a non-NULL file to all the driver callbacks, for no semantic change. The only caller that does not care about pnum is bdrv_is_allocated, as invoked by vvfat; we can likewise add assertions that the rest of the stack does not have to worry about a NULL pnum. Furthermore, this will also set the stage for a future cleanup: when a caller does not care about which BDS owns an offset, it would be nice to allow the driver to optimize things to not have to return BDRV_BLOCK_OFFSET_VALID in the first place. In the case of fragmented allocation (for example, it's fairly easy to create a qcow2 image where consecutive guest addresses are not at consecutive host addresses), the current contract requires bdrv_get_block_status() to clamp pnum to the limit where host addresses are no longer consecutive, but allowing a NULL file means that pnum could be set to the full length of known-allocated data. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-26 14:45:57 +02:00
Peter Maydell	79b2a13aa8	nbd patches for 2017-10-14 - Marc-André Lureau - NBD: use g_new() family of functions - Vladimir Sementsov-Ogievskiy - first half of 00/13 nbd minimal structured read -----BEGIN PGP SIGNATURE----- Comment: Public key at http://people.redhat.com/eblake/eblake.gpg iQEcBAABCAAGBQJZ4q4XAAoJEKeha0olJ0NqEooH/R8NKYACELA39xrLdEMUQuZY 1Lm3/OtpBIICKx7OiZ7LniqApAI++FgjNxOf6PAfNG0TmEA+wMFaZ6NJEdi9DAmv kJVLsxiqKLDD+WIKMq5XfZQoFMJ8rV8W2/BYx9cF3Pl4KMT20qDsumsncZJ7DGOR jjsbAI8Q6g45VBx6TJbxXiTMDj87nIyNaydAGzRQTmEHtnmh8mllPiuEhJu24l6G 7CQKfcu4/7Te/5PvJIPn7CxHdVjLYalgWDRkU3kXcwmO8vGQEkYoiHPoc8lGsGtw oXJ2YIODYBIjeICkF0/PjT9aoeJQG8EuHR1hT0CW5dVBZz/DlVP/j+EZ6IDV/8k= =ud0Z -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2017-10-14' into staging nbd patches for 2017-10-14 - Marc-André Lureau - NBD: use g_new() family of functions - Vladimir Sementsov-Ogievskiy - first half of 00/13 nbd minimal structured read # gpg: Signature made Sun 15 Oct 2017 01:38:47 BST # gpg: using RSA key 0xA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" # gpg: aka "[jpeg image of size 6874]" # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2017-10-14: nbd: header constants indenting nbd/server: simplify reply transmission nbd/server: refactor nbd_co_send_simple_reply parameters nbd/server: do not use NBDReply structure nbd/server: structurize simple reply header sending nbd: rename some simple-request related objects to be _simple_ block/nbd-client: refactor nbd_co_receive_reply block/nbd-client: assert qiov len once in nbd_co_request NBD: use g_new() family of functions Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-10-16 15:54:42 +01:00
Manos Pitsidianakis	b867eaa17b	block/throttle.c: add bdrv_co_drain_begin/end callbacks Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-10-13 12:38:41 +01:00
Manos Pitsidianakis	f8ea8dacf0	block: rename bdrv_co_drain to bdrv_co_drain_begin Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-10-13 12:38:41 +01:00
Manos Pitsidianakis	481cad48e5	block: add bdrv_co_drain_end callback BlockDriverState has a bdrv_co_drain() callback but no equivalent for the end of the drain. The throttle driver (block/throttle.c) needs a way to mark the end of the drain in order to toggle io_limits_disabled correctly, thus bdrv_co_drain_end is needed. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-10-13 12:38:41 +01:00
Vladimir Sementsov-Ogievskiy	ed397b2fe7	block/nbd-client: refactor nbd_co_receive_reply Pass handle parameter directly, as the whole request isn't needed. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20171012095319.136610-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-10-12 16:20:27 -05:00
Vladimir Sementsov-Ogievskiy	4bfe4478d1	block/nbd-client: assert qiov len once in nbd_co_request Also improve the assertion: check that qiov is NULL for other commands than CMD_READ and CMD_WRITE. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20171012095319.136610-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-10-12 16:19:35 -05:00
Eduardo Habkost	e5766d6ec7	config: qemu_config_parse() return number of config groups Change qemu_config_parse() to return the number of config groups in success and -EINVAL on error. This will allow callers of qemu_config_parse() to check if something was really loaded from the config file. All existing callers of qemu_config_parse() and qemu_read_config_file() only check if the return value was negative, so the change shouldn't affect them. Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20171004025043.3788-2-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-10-09 23:21:52 -03:00
Vladimir Sementsov-Ogievskiy	ce960aa906	block/mirror: check backing in bdrv_mirror_top_flush Backing may be zero after failed bdrv_append in mirror_start_job, which leads to SIGSEGV. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170929152255.5431-1-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-10-06 16:30:48 +02:00
Pavel Butsykin	163bc39d2c	qcow2: truncate the tail of the image file after shrinking the image Now after shrinking the image, at the end of the image file, there might be a tail that probably will never be used. So we can find the last used cluster and cut the tail. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170929121613.25997-3-pbutsykin@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-10-06 16:30:48 +02:00
Pavel Butsykin	76a2a30a99	qcow2: fix return error code in qcow2_truncate() Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170929121613.25997-2-pbutsykin@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-10-06 16:30:48 +02:00
Vladimir Sementsov-Ogievskiy	18775ff326	block/mirror: check backing in bdrv_mirror_top_refresh_filename Backing may be zero after failed bdrv_attach_child in bdrv_set_backing_hd, which leads to SIGSEGV. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170928120300.58164-1-vsementsov@virtuozzo.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-10-06 16:30:47 +02:00
Daniel P. Berrange	d67a6b09b4	block: support passthrough of BDRV_REQ_FUA in crypto driver The BDRV_REQ_FUA flag can trivially be allowed in the crypt driver as a passthrough to the underlying block driver. Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170927125340.12360-7-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-10-06 16:30:47 +02:00
Daniel P. Berrange	4609742a49	block: convert qcrypto_block_encrypt\|decrypt to take bytes offset Instead of sector offset, take the bytes offset when encrypting or decrypting data. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170927125340.12360-6-berrange@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-10-06 16:30:47 +02:00
Daniel P. Berrange	a73466fbad	block: convert crypto driver to bdrv_co_preadv\|pwritev Make the crypto driver implement the bdrv_co_preadv\|pwritev callbacks, and also use bdrv_co_preadv\|pwritev for I/O with the protocol driver beneath. This replaces sector based I/O with byte based I/O, and allows us to stop assuming the physical sector size matches the encryption sector size. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170927125340.12360-5-berrange@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-10-06 16:30:47 +02:00
Daniel P. Berrange	31376555c7	block: fix data type casting for crypto payload offset The crypto APIs report the offset of the data payload as an uint64_t type, but the block driver is casting to size_t or ssize_t which will potentially truncate. Most of the block APIs use int64_t for offsets meanwhile, so even if using uint64_t in the crypto block driver we are still at risk of truncation. Change the block crypto driver to use uint64_t, but add asserts that the value is less than INT64_MAX. Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170927125340.12360-4-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-10-06 16:30:47 +02:00
Daniel P. Berrange	161253e2d0	block: use 1 MB bounce buffers for crypto instead of 16KB Using 16KB bounce buffers creates a significant performance penalty for I/O to encrypted volumes on storage which high I/O latency (rotating rust & network drives), because it triggers lots of fairly small I/O operations. On tests with rotating rust, and cache=none\|directsync, write speed increased from 2MiB/s to 32MiB/s, on a par with that achieved by the in-kernel luks driver. With other cache modes the in-kernel driver is still notably faster because it is able to report completion of the I/O request before any encryption is done, while the in-QEMU driver must encrypt the data before completion. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170927125340.12360-2-berrange@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-10-06 16:30:47 +02:00
Eric Blake	cb2e28780c	block: Perform copy-on-read in loop Improve our braindead copy-on-read implementation. Pre-patch, we have multiple issues: - we create a bounce buffer and perform a write for the entire request, even if the active image already has 99% of the clusters occupied, and really only needs to copy-on-read the remaining 1% of the clusters - our bounce buffer was as large as the read request, and can needlessly exhaust our memory by using double the memory of the request size (the original request plus our bounce buffer), rather than a capped maximum overhead beyond the original - if a driver has a max_transfer limit, we are bypassing the normal code in bdrv_aligned_preadv() that fragments to that limit, and instead attempt to read the entire buffer from the driver in one go, which some drivers may assert on - a client can request a large request of nearly 2G such that rounding the request out to cluster boundaries results in a byte count larger than 2G. While this cannot exceed 32 bits, it DOES have some follow-on problems: -- the call to bdrv_driver_pread() can assert for exceeding BDRV_REQUEST_MAX_BYTES, if the driver is old and lacks .bdrv_co_preadv -- if the buffer is all zeroes, the subsequent call to bdrv_co_do_pwrite_zeroes is a no-op due to a negative size, which means we did not actually copy on read Fix all of these issues by breaking up the action into a loop, where each iteration is capped to sane limits. Also, querying the allocation status allows us to optimize: when data is already present in the active layer, we don't need to bounce. Note that the code has a telling comment that copy-on-read should probably be a filter driver rather than a bolt-on hack in io.c; but that remains a task for another day. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	d855ebcd3c	block: Add blkdebug hook for copy-on-read Make it possible to inject errors on writes performed during a read operation due to copy-on-read semantics. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	9cdcfd9f7a	block: Uniform handling of 0-length bdrv_get_block_status() Handle a 0-length block status request up front, with a uniform return value claiming the area is not allocated. Most callers don't pass a length of 0 to bdrv_get_block_status() and friends; but it definitely happens with a 0-length read when copy-on-read is enabled. While we could audit all callers to ensure that they never make a 0-length request, and then assert that fact, it was just as easy to fix things to always report success (as long as the callers are careful to not go into an infinite loop). However, we had inconsistent behavior on whether the status is reported as allocated or defers to the backing layer, depending on what callbacks the driver implements, and possibly wasting quite a few CPU cycles to get to that answer. Consistently reporting unallocated up front doesn't really hurt anything, and makes it easier both for callers (0-length requests now have well-defined behavior) and for drivers (drivers don't have to deal with 0-length requests). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Kevin Wolf	bde70715b6	commit: Remove overlay_bs We don't need to make any assumptions about the graph layout above the top node of the commit operation any more. Remove the use of bdrv_find_overlay() and related variables from the commit job code. bdrv_drop_intermediate() doesn't use the 'active' parameter any more, so we can just drop it. The overlay node was previously added to the block job to get a BLK_PERM_GRAPH_MOD. We really need to respect those permissions in bdrv_drop_intermediate() now, but as long as we haven't figured out yet how BLK_PERM_GRAPH_MOD is actually supposed to work, just leave a TODO comment there. With this change, it is now possible to perform another block job on an overlay node without conflicts. qemu-iotests 030 is changed accordingly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-10-06 16:28:58 +02:00
Kevin Wolf	61f09cea01	commit: Support multiple roots above top node This changes the commit block job to support operation in a graph where there is more than a single active layer that references the top node. This involves inserting the commit filter node not only on the path between the given active node and the top node, but between the top node and all of its parents. On completion, bdrv_drop_intermediate() must consider all parents for updating the backing file link. These parents may be backing files themselves and as such read-only; reopen them temporarily if necessary. Previously this was achieved by the bdrv_reopen() calls in the commit block job that made overlay_bs read-write for the whole duration of the block job, even though write access is only needed on completion. Now that we consider all parents, overlay_bs is meaningless. It is left in place in this commit, but we'll remove it soon. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	ca75962244	dirty-bitmap: Convert internal hbitmap size/granularity Now that all callers are using byte-based interfaces, there's no reason for our internal hbitmap to remain with sector-based granularity. It also simplifies our internal scaling, since we already know that hbitmap widens requests out to granularity boundaries. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	0fdf1a4f68	dirty-bitmap: Switch bdrv_set_dirty() to bytes Both callers already had bytes available, but were scaling to sectors. Move the scaling to internal code. In the case of bdrv_aligned_pwritev(), we are now passing the exact offset rather than a rounded sector-aligned value, but that's okay as long as dirty bitmap widens start/bytes to granularity boundaries. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	49d741b504	qcow2: Switch store_bitmap_data() to byte-based iteration Now that we have adjusted the majority of the calls this function makes to be byte-based, it is easier to read the code if it makes passes over the image using bytes rather than sectors. iotests 165 was rather weak - on a default 64k-cluster image, where bitmap granularity also defaults to 64k bytes, a single cluster of the bitmap table thus covers (6410248) bits which each cover 64k bytes, or 32G of image space. But the test only uses a 1G image, so it cannot trigger any more than one loop of the code in store_bitmap_data(); and it was writing to the first cluster. In order to test that we are properly aligning which portions of the bitmap are being written to the file, we really want to test a case where the first dirty bit returned by bdrv_dirty_iter_next() is not aligned to the start of a cluster, which we can do by modifying the test to write data that doesn't happen to fall in the first cluster of the image. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	ab94db6f76	qcow2: Switch load_bitmap_data() to byte-based iteration Now that we have adjusted the majority of the calls this function makes to be byte-based, it is easier to read the code if it makes passes over the image using bytes rather than sectors. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	b85ee45334	qcow2: Switch qcow2_measure() to byte-based iteration This is new code, but it is easier to read if it makes passes over the image using bytes rather than sectors (and will get easier in the future when bdrv_get_block_status is converted to byte-based). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	23ca459a45	mirror: Switch mirror_dirty_init() to byte-based iteration Now that we have adjusted the majority of the calls this function makes to be byte-based, it is easier to read the code if it makes passes over the image using bytes rather than sectors. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	e0d7f73e63	dirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes Some of the callers were already scaling bytes to sectors; others can be easily converted to pass byte offsets, all in our shift towards a consistent byte interface everywhere. Making the change will also make it easier to write the hold-out callers to use byte rather than sectors for their iterations; it also makes it easier for a future dirty-bitmap patch to offload scaling over to the internal hbitmap. Although all callers happen to pass sector-aligned values, make the internal scaling robust to any sub-sector requests. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	3b5d4df0c6	dirty-bitmap: Change bdrv_get_dirty_locked() to take bytes Half the callers were already scaling bytes to sectors; the other half can eventually be simplified to use byte iteration. Both callers were already using the result as a bool, so make that explicit. Making the change also makes it easier for a future dirty-bitmap patch to offload scaling over to the internal hbitmap. Remember, asking whether a byte is dirty is effectively asking whether the entire granularity containing the byte is dirty, since we only track dirtiness by granularity. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	9a46dba7b7	dirty-bitmap: Change bdrv_get_dirty_count() to report bytes Thanks to recent cleanups, all callers were scaling a return value of sectors into bytes; do the scaling internally instead. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	f798184cfd	dirty-bitmap: Change bdrv_dirty_iter_next() to report byte offset Thanks to recent cleanups, most callers were scaling a return value of sectors into bytes (the exception, in qcow2-bitmap, will be converted to byte-based iteration later). Update the interface to do the scaling internally instead. In qcow2-bitmap, the code was specifically checking for an error return of -1. To avoid a regression, we either have to make sure we continue to return -1 (rather than a scaled -512) on error, or we have to fix the caller to treat all negative values as error rather than just one magic value. It's easy enough to make both changes at the same time, even though either one in isolation would work. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	715a74d819	dirty-bitmap: Set iterator start by offset, not sector All callers to bdrv_dirty_iter_new() passed 0 for their initial starting point, drop that parameter. Most callers to bdrv_set_dirty_iter() were scaling a byte offset to a sector number; the exception qcow2-bitmap will be converted later to use byte rather than sector iteration. Move the scaling to occur internally to dirty bitmap code instead, so that callers now pass in bytes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	c7e7c87ac8	qcow2: Switch sectors_covered_by_bitmap_cluster() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Change the qcow2 bitmap helper function sectors_covered_by_bitmap_cluster(), renaming it to bytes_covered_by_bitmap_cluster() in the process. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	86f6ae67e1	dirty-bitmap: Change bdrv_dirty_bitmap_serialize() to take bytes Right now, the dirty-bitmap code exposes the fact that we use a scale of sector granularity in the underlying hbitmap to anything that wants to serialize a dirty bitmap. It's nicer to uniformly expose bytes as our dirty-bitmap interface, matching the previous change to bitmap size. The only caller to serialization is currently qcow2-cluster.c, which becomes a bit more verbose because it is still tracking sectors for other reasons, but a later patch will fix that to more uniformly use byte offsets everywhere. Likewise, within dirty-bitmap, we have to add more assertions that we are not truncating incorrectly, which can go away once the internal hbitmap is byte-based rather than sector-based. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	993e6525bf	dirty-bitmap: Track bitmap size by bytes We are still using an internal hbitmap that tracks a size in sectors, with the granularity scaled down accordingly, because it lets us use a shortcut for our iterators which are currently sector-based. But there's no reason we can't track the dirty bitmap size in bytes, since it is (mostly) an internal-only variable (remember, the size is how many bytes are covered by the bitmap, not how many bytes the bitmap occupies). A later cleanup will convert dirty bitmap internals to be entirely byte-based, eliminating the intermediate sector rounding added here; and technically, since bdrv_getlength() already rounds up to sectors, our use of DIV_ROUND_UP is more for theoretical completeness than for any actual rounding. Use is_power_of_2() while at it, instead of open-coding that. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	ebfcd2e75f	dirty-bitmap: Change bdrv_dirty_bitmap_size() to report bytes We're already reporting bytes for bdrv_dirty_bitmap_granularity(); mixing bytes and sectors in our return values is a recipe for confusion. A later cleanup will convert dirty bitmap internals to be entirely byte-based, but in the meantime, we should report the bitmap size in bytes. The only external caller in qcow2-bitmap.c is temporarily more verbose (because it is still using sector-based math), but will later be switched to track progress by bytes instead of sectors. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	1b6cc579de	dirty-bitmap: Avoid size query failure during truncate We've previously fixed several places where we failed to account for possible errors from bdrv_nb_sectors(). Fix another one by making bdrv_dirty_bitmap_truncate() take the new size from the caller instead of querying itself; then adjust the sole caller bdrv_truncate() to pass the size just determined by a successful resize, or to reuse the size given to the original truncate operation when refresh_total_sectors() was not able to confirm the actual size (the two sizes can potentially differ according to rounding constraints), thus avoiding sizing the bitmaps to -1. This also fixes a bug where not all failure paths in bdrv_truncate() would set errp. Note that bdrv_truncate() is still a bit awkward. We may want to revisit it later and clean up things to better guarantee that a resize attempt either fails cleanly up front, or cannot fail after guest-visible changes have been made (if temporary changes are made, then they need to be cleanly rolled back). But that is a task for another day; for now, the goal is the bare minimum fix to ensure that just bdrv_dirty_bitmap_truncate() cannot fail. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	dfe55c3577	dirty-bitmap: Drop unused functions We had several functions that no one is currently using, and which use sector-based interfaces. I'm trying to convert towards byte-based interfaces, so it's easier to just drop the unused functions: bdrv_dirty_bitmap_get_meta bdrv_dirty_bitmap_get_meta_locked bdrv_dirty_bitmap_reset_meta bdrv_dirty_bitmap_meta_granularity Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	113754f3a8	qcow2: Ensure bitmap serialization is aligned When subdividing a bitmap serialization, the code in hbitmap.c enforces that start/count parameters are aligned (except that count can end early at end-of-bitmap). We exposed this required alignment through bdrv_dirty_bitmap_serialization_align(), but forgot to actually check that we comply with it. Fortunately, qcow2 is never dividing bitmap serialization smaller than one cluster (which is a minimum of 512 bytes); so we are always compliant with the serialization alignment (which insists that we partition at least 64 bits per chunk) because we are doing at least 4k bits per chunk. Still, it's safer to add an assertion (for the unlikely case that we'd ever support a cluster smaller than 512 bytes, or if the hbitmap implementation changes what it considers to be aligned), rather than leaving bdrv_dirty_bitmap_serialization_align() without a caller. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	ecbfa2817d	hbitmap: Rename serialization_granularity to serialization_align The only client of hbitmap_serialization_granularity() is dirty-bitmap's bdrv_dirty_bitmap_serialization_align(). Keeping the two names consistent is worthwhile, and the shorter name is more representative of what the function returns (the required alignment to be used for start/count of other serialization functions, where violating the alignment causes assertion failures). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	765d9df962	block: Typo fix in copy_on_readv() Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Peter Maydell	cfe4cade05	Block layer patches -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZylugAAoJEH8JsnLIjy/Wp1QP/jtwre8WwtlX3SZ82cyiwonc fODcokF6iEsX7vs9vMr8lkbwF4mjxaU2Xf/knC98J/6rCxM8DRP/MW4BZttTxDI9 ++U926cIHHzjtSDalcjfTKnJD7PikHbLXz3SJ+u9hPnIeGs56hybCU6iHrrIcOTs YSMtD3eUQ8B8JaoegKgNyqkhvfhEdHlWFCq9/T2YEWwYMzGt1jvSTRqe3vr8vIu3 v7QKvh35gm965yaUTGpv7Ej7TOBTWOaYBo9D1TBKNEZOFTBjqyldciyBoDhCF2P9 +4EsNTkZ7u20Rko41dzsZPzhjG10gm/lNZNj3Cul9ta4kQ0hGeOjEujd9L9ANTGl gwnPPHKwgax5O+ctCPmrU7yHG+XIQD3gckC69qQeRXnPYQ4Jeo/LqhwjU+FcZfHs 97Lz6CHQHgtBP9JJwBMtUp77HJ58KPnnWIxGkb9u2vm4CpvRFkMrx5ekmj//9klX 5niRiqkNdrkEUnu/FIXOXxSmwnlhedAGQNq7ALSoW95El1QCy8Mm0eKEvmHyvZzd z2gSufLX6ynOaG4x5oY5eezKm6F4Hxwt+w8Svj9PHXIrmrEIop11LG5MVsDGDjyh XKiLddEIVKTYGwffX0CGTLYA34m2uHPkVVrMOIEvni3G6byXkb10+4pBpuTu/O/h wQPFraquH1I2B5YETsMa =7Bt2 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches # gpg: Signature made Tue 26 Sep 2017 14:52:32 BST # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (24 commits) block/qcow2-bitmap: fix use of uninitialized pointer qemu-iotests: add shrinking image test qcow2: add shrink image support qcow2: add qcow2_cache_discard qemu-img: add --shrink flag for resize iotests: fix 181: enable postcopy-ram capability on target qemu-iotests: Test change-backing-file command block: Fix permissions after bdrv_reopen() block: reopen: Queue children after their parents block: Base permissions on rw state after reopen block: Add reopen queue to bdrv_check_perm() block: Add reopen_queue to bdrv_child_perm() qemu-io: Drop write permissions before read-only reopen block: Clean up some bad code in the vvfat driver block/throttle-groups.c: allocate RestartData on the heap throttle: Assert that bkt->max is valid in throttle_compute_wait() iotests: Print full path of bad output if mismatch iotests: use virtio aliases for 067 iotests: use -ccw on s390x for 051 iotests: use -ccw on s390x for 040, 139, and 182 ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-09-27 16:48:39 +01:00
Vladimir Sementsov-Ogievskiy	5330f32b71	block/qcow2-bitmap: fix use of uninitialized pointer Without initialization to zero dirty_bitmap field may be not zero for a bitmap which should not be stored and qcow2_store_persistent_dirty_bitmaps will erroneously call store_bitmap for it which leads to SIGSEGV on bdrv_dirty_bitmap_name. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170922144353.4220-1-vsementsov@virtuozzo.com Cc: qemu-stable@nongnu.org Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-09-26 15:00:32 +02:00
Pavel Butsykin	46b732cdf3	qcow2: add shrink image support This patch add shrinking of the image file for qcow2. As a result, this allows us to reduce the virtual image size and free up space on the disk without copying the image. Image can be fragmented and shrink is done by punching holes in the image file. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170918124230.8152-4-pbutsykin@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-09-26 15:00:32 +02:00
Pavel Butsykin	f71c08ea8e	qcow2: add qcow2_cache_discard Whenever l2/refcount table clusters are discarded from the file we can automatically drop unnecessary content of the cache tables. This reduces the chance of eviction useful cache data and eliminates inconsistent data in the cache with the data in the file. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170918124230.8152-3-pbutsykin@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-09-26 15:00:32 +02:00
Kevin Wolf	e0995dc3da	block: Add reopen_queue to bdrv_child_perm() In the context of bdrv_reopen(), we'll have to look at the state of the graph as it will be after the reopen. This interface addition is in preparation for the change. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-09-26 14:46:23 +02:00
Thomas Huth	7a6ab45e19	block: Clean up some bad code in the vvfat driver Remove the unnecessary home-grown redefinition of the assert() macro here, and remove the unusable debug code at the end of the checkpoint() function. The code there uses assert() with side-effects (assignment to the "mapping" variable), which should be avoided. Looking more closely, it seems as it is apparently also only usable for one certain directory layout (with a file named USB.H in it) and thus is of no use for the rest of the world. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-26 14:46:23 +02:00
Manos Pitsidianakis	43a5dc02fd	block/throttle-groups.c: allocate RestartData on the heap RestartData is the opaque data of the throttle_group_restart_queue_entry coroutine. By being stack allocated, it isn't available anymore if aio_co_enter schedules the coroutine with a bottom half and runs after throttle_group_restart_queue returns. Cc: qemu-stable@nongnu.org Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-26 14:46:23 +02:00
Fam Zheng	97ec9117c3	file-posix: Clear out first sector in hdev_create People get surprised when, after "qemu-img create -f raw /dev/sdX", they still see qcow2 with "qemu-img info", if previously the bdev had a qcow2 header. While this is natural because raw doesn't need to write any magic bytes during creation, hdev_create is free to clear out the first sector to make sure the stale qcow2 header doesn't cause such confusion. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-26 14:46:23 +02:00
Vladimir Sementsov-Ogievskiy	a693437037	block/nbd-client: nbd_co_send_request: fix return code It's incorrect to return success rc >= 0 if we skip qio_channel_writev_all() call due to s->quit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20170920124507.18841-4-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-09-25 08:21:26 -05:00
Vladimir Sementsov-Ogievskiy	9397067221	block/nbd-client: simplify check in nbd_co_receive_reply If we are woken up from while() loop in nbd_read_reply_entry handles must be equal. If we are woken up from nbd_recv_coroutines_wake_all s->quit must be true, so we do not need checking handles equality. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20170920124507.18841-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-09-25 08:21:26 -05:00
Vladimir Sementsov-Ogievskiy	319a56cde7	block/nbd-client: refactor nbd_co_receive_reply "NBDReply *reply" parameter of nbd_co_receive_reply is used only to pass return value for nbd_co_request (reply.error). Remove it and use function return value instead. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20170920124507.18841-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-09-25 08:21:25 -05:00
Eric Blake	cfa3ad635c	nbd-client: Use correct macro parenthesization If 'bs' is a complex expression, we were only casting the front half rather than the full expression. Luckily, none of the callers were passing bad arguments, but it's better to be robust up front. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170918214649.17550-1-eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-09-25 08:21:25 -05:00
Paolo Bonzini	7c9e527659	scsi, file-posix: add support for persistent reservation management It is a common requirement for virtual machine to send persistent reservations, but this currently requires either running QEMU with CAP_SYS_RAWIO, or using out-of-tree patches that let an unprivileged QEMU bypass Linux's filter on SG_IO commands. As an alternative mechanism, the next patches will introduce a privileged helper to run persistent reservation commands without expanding QEMU's attack surface unnecessarily. The helper is invoked through a "pr-manager" QOM object, to which file-posix.c passes SG_IO requests for PERSISTENT RESERVE OUT and PERSISTENT RESERVE IN commands. For example: $ qemu-system-x86_64 -device virtio-scsi \ -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock -drive if=none,id=hd,driver=raw,file.filename=/dev/sdb,file.pr-manager=helper0 -device scsi-block,drive=hd or: $ qemu-system-x86_64 -device virtio-scsi \ -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock -blockdev node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0 -device scsi-block,drive=hd Multiple pr-manager implementations are conceivable and possible, though only one is implemented right now. For example, a pr-manager could: - talk directly to the multipath daemon from a privileged QEMU (i.e. QEMU links to libmpathpersist); this makes reservation work properly with multipath, but still requires CAP_SYS_RAWIO - use the Linux IOC_PR_* ioctls (they require CAP_SYS_ADMIN though) - more interestingly, implement reservations directly in QEMU through file system locks or a shared database (e.g. sqlite) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-22 01:06:51 +02:00
Alistair Francis	b62e39b469	General warn report fixups Tidy up some of the warn_report() messages after having converted them to use warn_report(). Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <9cb1d23551898c9c9a5f84da6773e99871285120.1505158760.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:34 +02:00
Alistair Francis	8297be80f7	Convert multi-line fprintf() to warn_report() Convert all the multi-line uses of fprintf(stderr, "warning:"..."\n"... to use warn_report() instead. This helps standardise on a single method of printing warnings to the user. All of the warnings were changed using these commands: find ./* -type f -exec sed -i \ 'N; {s\|fprintf(.".warning[,:] $.$\\n"$.$);\|warn_report("\1"\2);\|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N; {s\|fprintf(.".warning[,:] $.$\\n"$.$);\|warn_report("\1"\2);\|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N; {s\|fprintf(.".warning[,:] $.$\\n"$.$);\|warn_report("\1"\2);\|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N {s\|fprintf(.".warning[,:] $.$\\n"$.$);\|warn_report("\1"\2);\|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N;N {s\|fprintf(.".warning[,:] $.$\\n"$.$);\|warn_report("\1"\2);\|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N;N;N {s\|fprintf(.".warning[,:] $.$\\n"$.$);\|warn_report("\1"\2);\|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N;N;N;N; {s\|fprintf(.".warning[,:] $.$\\n"$.$);\|warn_report("\1"\2);\|Ig}' \ {} + Indentation fixed up manually afterwards. Some of the lines were manually edited to reduce the line length to below 80 charecters. Some of the lines with newlines in the middle of the string were also manually edit to avoid checkpatch errrors. The #include lines were manually updated to allow the code to compile. Several of the warning messages can be improved after this patch, to keep this patch mechanical this has been moved into a later patch. Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Anthony Perard <anthony.perard@citrix.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Yongbok Kim <yongbok.kim@imgtec.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Alexander Graf <agraf@suse.de> Cc: Jason Wang <jasowang@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <5def63849ca8f551630c6f2b45bcb1c482f765a6.1505158760.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:34 +02:00
Alistair Francis	2ab4b13563	Convert single line fprintf(.../n) to warn_report() Convert all the single line uses of fprintf(stderr, "warning:"..."\n"... to use warn_report() instead. This helps standardise on a single method of printing warnings to the user. All of the warnings were changed using this command: find ./* -type f -exec sed -i \ 's\|fprintf(.".warning[,:] $.$\\n"$.$);\|warn_report("\1"\2);\|Ig' \ {} + Some of the lines were manually edited to reduce the line length to below 80 charecters. The #include lines were manually updated to allow the code to compile. Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Michael Roth <mdroth@linux.vnet.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Yongbok Kim <yongbok.kim@imgtec.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> [mips] Message-Id: <ae8f8a7f0a88ded61743dff2adade21f8122a9e7.1505158760.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:34 +02:00
Alistair Francis	55d527a94d	Convert remaining error_report() to warn_report() In a previous patch (`3dc6f86936`) we converted uses of error_report("warning:"... to use warn_report() instead. This was to help standardise on a single method of printing warnings to the user. There appears to have been some cases that slipped through in patch sets applied around the same time, this patch catches the few remaining cases. All of the warnings were changed using this command: find ./* -type f -exec sed -i \ 's\|error_report(".*warning[,:] \|warn_report("\|Ig' {} + Indentation fixed up manually afterwards. Two messages were manually fixed up as well. Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Alexander Graf <agraf@suse.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <eec8cba0d5434bd828639e5e45f12182490ff47d.1505158760.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:34 +02:00
Paolo Bonzini	08e2c9f19c	scsi: move block/scsi.h to include/scsi/constants.h Complete the transition by renaming this header, which was shared by block/iscsi.c and the SCSI emulation code. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:31 +02:00
Paolo Bonzini	e5b5728cd3	scsi: move non-emulation specific code to scsi/ util/scsi.c includes some SCSI code that is shared by block/iscsi.c and hw/scsi, but the introduction of the persistent reservation helper will add many more instances of this. There is also include/block/scsi.h, which actually is not part of the core block layer. The persistent reservation manager will also need a home. A scsi/ directory provides one for both the aforementioned shared code and the PR manager code. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:11 +02:00
Fam Zheng	2875135807	scsi: Refactor scsi sense interpreting code So that it can be reused outside of iscsi.c. Also update MAINTAINERS to include the new files in SCSI section. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170821141008.19383-2-famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-09-19 14:09:11 +02:00
Peter Maydell	75be9a52b1	nbd patches for 2017-09-06 - Daniel P. Berrange: [0/2] Fix / skip recent iotests with LUKS driver - Eric Blake: [0/3] nbd: Use common read/write-all qio functions -----BEGIN PGP SIGNATURE----- Comment: Public key at http://people.redhat.com/eblake/eblake.gpg iQEcBAABCAAGBQJZsBGjAAoJEKeha0olJ0NqVRoH/iiNEB2SlZFFl5W++wf3Ekq/ lvtZjK3rxpvRXvy6LiRsYVs27Etc8E9aSw2UK6aaqgA3qR8g3zdmwUZb9w3slkeI OXedt0fS5IpQ4UP0ORUBb/LgyOgW3uA0UjHBTEAKl0SyvFPx+TrTZXxqQUqlAc9A lFaA0g71xvfqWWhXmt0PQjRr9bBEpe+4L4NgOypa+Z3xbBAektx390S8N/b/P8fC FNwAqBPTY5XAgJGnEhL9EUOdUWnVgoyG1MR63puJzULYi+2+TlpR2w030qRif75b h7TqYUvwKLnoqMyhBb5LmyhcqwNdphz/1DsEudk18XGuvC94WYkopC3rT7TPWLs= =vGUc -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2017-09-06' into staging nbd patches for 2017-09-06 - Daniel P. Berrange: [0/2] Fix / skip recent iotests with LUKS driver - Eric Blake: [0/3] nbd: Use common read/write-all qio functions # gpg: Signature made Wed 06 Sep 2017 16:17:55 BST # gpg: using RSA key 0xA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" # gpg: aka "[jpeg image of size 6874]" # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2017-09-06: nbd: Use new qio_channel_*_all() functions io: Add new qio_channel_read{, v}_all_eof functions io: Yield rather than wait when already in coroutine iotests: blacklist 194 with the luks driver iotests: rewrite 192 to use _launch_qemu to fix LUKS support Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-09-07 17:53:59 +01:00
Peter Maydell	8ee5f9b3ec	Block layer patches -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZr/vJAAoJEH8JsnLIjy/WrNMP/RMlpIfzjPTIKl1qwdxEbtEe kdsQulnSILVAWnXldB6xiQ8/epO2oTP+8sE9VCAoblQfJjD6RgffF1YCC7h1ZyBX 182ZnhapIwprH5RLKz/kgjfkx5/bCYjqpQ3JzznKJHNXJOAexznrYJMcbA2agfII 5qijA06dDoMIQTz49J2vvFAHrRUq/JqK85Ao8Zk41GDHDan5OfvQwsgt+Wa0V3vz mV6G1UsWCe4pmrv7v7/buhkVypy/BYz7vu6N20+2o3GDLwHmsgfKogUiSAC1N3iR olkeKtXdplY17iO6VgVrmFdkvaja0XCxYJjXnL54x/f1lQQQc01wUFNrh6WoIQLO Bl+XZ0oEQpFKJeBlu9mbDvgit0AGYE/yaLkCnfRFOU15lW5rjwqpF8husU0ntUcI TzGWt21kG0EXisejLMGEzEkMwkdhTwX6U+U7x5pF+x+pwSdcREDekeFcVhsb42Y/ brTgZCXdf32eJ8gOSzFoBJ5KfFaCqKgA6lWAv/kLsVs8DN+MnAv3SJGRBr22854W yJC5e3yLh36RVemjBqbqsU9VMD/P8fB3nJQwZRMyQh5A3RNxrK1y6e4XIqRwGqcC aj4cT2GbLWFH+EJUVdSRELmrLJPLyj5a1lU28Dq6b2Q34f9Hvg8GjSBOFf+4Vx6C N/z6+8O1mDtXdGHuCbmI =qJjo -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches # gpg: Signature made Wed 06 Sep 2017 14:44:41 BST # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: qcow2: move qcow2_store_persistent_dirty_bitmaps() before cache flushing qemu-iotests: add 184 for throttle filter driver block: add throttle block filter driver block: convert ThrottleGroup to object with QOM block: tidy ThrottleGroupMember initializations block: add aio_context field in ThrottleGroupMember block: move ThrottleGroup membership to ThrottleGroupMember block: document semantics of bdrv_co_preadv\|pwritev qcow: Check failure of bdrv_getlength() and bdrv_truncate() qcow: Change signature of get_cluster_offset() block: add default implementations for bdrv_co_get_block_status() block: remove bdrv_truncate callback in blkdebug block: remove unused bdrv_media_changed block: pass bdrv_* methods to bs->file by default in block filters Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-09-07 10:45:18 +01:00
Eric Blake	030fa7f6f9	nbd: Use new qio_channel_*_all() functions Rather than open-coding our own read/write-all functions, we can make use of the recently-added qio code. It slightly changes the error message in one of the iotests. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170905191114.5959-4-eblake@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com>	2017-09-06 10:11:54 -05:00
Pavel Butsykin	83a8c775a8	qcow2: move qcow2_store_persistent_dirty_bitmaps() before cache flushing After calling qcow2_inactivate(), all qcow2 caches must be flushed, but this may not happen, because the last call qcow2_store_persistent_dirty_bitmaps() can lead to marking l2/refcont cache as dirty. Let's move qcow2_store_persistent_dirty_bitmaps() before the caсhe flushing to fix it. Cc: qemu-stable@nongnu.org Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-06 14:40:18 +02:00
Manos Pitsidianakis	d8e7d87ec4	block: add throttle block filter driver block/throttle.c uses existing I/O throttle infrastructure inside a block filter driver. I/O operations are intercepted in the filter's read/write coroutines, and referred to block/throttle-groups.c The driver can be used with the syntax -drive driver=throttle,file.filename=foo.qcow2,throttle-group=bar which registers the throttle filter node with the ThrottleGroup 'bar'. The given group must be created beforehand with object-add or -object. Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-06 10:12:02 +02:00
Manos Pitsidianakis	432d889e55	block: convert ThrottleGroup to object with QOM ThrottleGroup is converted to an object. This will allow the future throttle block filter drive easy creation and configuration of throttle groups in QMP and cli. A new QAPI struct, ThrottleLimits, is introduced to provide a shared struct for all throttle configuration needs in QMP. ThrottleGroups can be created via CLI as -object throttle-group,id=foo,x-iops-total=100,x-.. where x-* are individual limit properties. Since we can't add non-scalar properties in -object this interface must be used instead. However, setting these properties must be disabled after initialization because certain combinations of limits are forbidden and thus configuration changes should be done in one transaction. The individual properties will go away when support for non-scalar values in CLI is implemented and thus are marked as experimental. ThrottleGroup also has a `limits` property that uses the ThrottleLimits struct. It can be used to create ThrottleGroups or set the configuration in existing groups as follows: { "execute": "object-add", "arguments": { "qom-type": "throttle-group", "id": "foo", "props" : { "limits": { "iops-total": 100 } } } } { "execute" : "qom-set", "arguments" : { "path" : "foo", "property" : "limits", "value" : { "iops-total" : 99 } } } This also means a group's configuration can be fetched with qom-get. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-05 18:12:21 +02:00
Manos Pitsidianakis	f738cfc843	block: tidy ThrottleGroupMember initializations Move the CoMutex and CoQueue inits inside throttle_group_register_tgm() which is called whenever a ThrottleGroupMember is initialized. There's no need for them to be separate. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-05 16:47:52 +02:00
Manos Pitsidianakis	c61791fc23	block: add aio_context field in ThrottleGroupMember timer_cb() needs to know about the current Aio context of the throttle request that is woken up. In order to make ThrottleGroupMember backend agnostic, this information is stored in an aio_context field instead of accessing it from BlockBackend. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-05 16:47:52 +02:00
Manos Pitsidianakis	022cdc9f40	block: move ThrottleGroup membership to ThrottleGroupMember This commit eliminates the 1:1 relationship between BlockBackend and throttle group state. Users will be able to create multiple throttle nodes, each with its own throttle group state, in the future. The throttle group state cannot be per-BlockBackend anymore, it must be per-throttle node. This is done by gathering ThrottleGroup membership details from BlockBackendPublic into ThrottleGroupMember and refactoring existing code to use the structure. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-05 16:47:51 +02:00
Peter Maydell	d3e3447d3d	Merge QEMU I/O 2017/09/05 v2 -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJZrpcMAAoJEL6G67QVEE/fgCIP/jFlfM7PNYc3UK4ePTG2gEnX zg1rXa8tg35Pvo2xF0E7pw+tevu6wghJPRegPip1qh7tEGHIm421ct0F2MyqByP2 NlgPxJMw7+klIg/Pmmt64gCybD1Sm//aEt1vaiJvG9unLWfpedQzhkc7L1MTpB6d r2k4PEdZPp+sQ9tXe1fRWbba548GYx4VnrSbe68+2pDMKSykcY4AEX+Mzr78aP/T yCABwz5tVlqOLjUTFVoV+zyDK0va0GxWXZW167olfIeZye4nuvW4oo3Q/ruFG6eo a1B/XVDHDlqC31pXttAP0izq4yRNcZXfgMSjZyfGUS8wjKAzP81uyE1H2gNGciTo pbcEKNhW+sjUV6ooTyHzD5pvRc/8lt/DG/FzMTmNZq6piMswPsrMRaAoQ9COOq3s Y28xQngCaw05zKYPfPU30y04OcDAA8x5iBuRR+iZzJcJO33gA7+kUO447ib2E7qL aDRR7FVhjbVRWkF5QTxjqq/9cuIqSu7vqS+CTgIoo+VEtdFsU4DlKZ07j1JfPHq0 1Tq6mMeAfBPUAjMp/UCMK2eF/BuUoeXV4jd4uOGvk+JDaROzA5VVhyOAXZLX0m90 auTh1L19j2d/8SLl+XMsTZViG5zD0X/Nukw98l0OMBAhfewYhNL/5PW16FldYnS9 SwqHcFxNRGABOoTsOKB/ =mnn5 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/berrange/tags/pull-qio-20170905-2' into staging Merge QEMU I/O 2017/09/05 v2 # gpg: Signature made Tue 05 Sep 2017 13:22:36 BST # gpg: using RSA key 0xBE86EBB415104FDF # gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>" # gpg: aka "Daniel P. Berrange <berrange@redhat.com>" # Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF * remotes/berrange/tags/pull-qio-20170905-2: io: fix check for handshake completion in TLS test io: add new qio_channel_{readv, writev, read, write}_all functions io: fix typo in docs comment for qio_channel_read util: remove the obsolete non-blocking connect io: fix temp directory used by test-io-channel-tls test Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-09-05 14:14:33 +01:00
Cao jin	b258793258	util: remove the obsolete non-blocking connect The non-blocking connect mechanism is obsolete, and it doesn't work well in inet connection, because it will call getaddrinfo first and getaddrinfo will blocks on DNS lookups. Since commit `e65c67e4` & `d984464e`, the non-blocking connect of migration goes through QIOChannel in a different manner(using a thread), and nobody use this old non-blocking connect anymore. Any newly written code which needs a non-blocking connect should use the QIOChannel code, so we can drop NonBlockingConnectHandler as a concept entirely. Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2017-09-05 13:21:58 +01:00
Eric Blake	d7a753a148	qcow: Check failure of bdrv_getlength() and bdrv_truncate() Omitting the check for whether bdrv_getlength() and bdrv_truncate() failed meant that it was theoretically possible to return an incorrect offset to the caller. More likely, conditions for either of these functions to fail would also cause one of our other calls (such as bdrv_pread() or bdrv_pwrite_sync()) to also fail, but auditing that we are safe is difficult compared to just patching things to always forward on the error rather than ignoring it. Use osdep.h macros instead of open-coded rounding while in the area. Reported-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-04 18:33:00 +02:00
Eric Blake	56439e9d55	qcow: Change signature of get_cluster_offset() The old signature has an ambiguous meaning for a return of 0: either no allocation was requested or necessary, or an error occurred (but any errno associated with the error is lost to the caller, which then has to assume EIO). Better is to follow the example of qcow2, by changing the signature to have a separate return value that cleanly distinguishes between failure and success, along with a parameter that cleanly holds a 64-bit value. Then update all callers. While auditing that all return paths return a negative errno (rather than -1), I also simplified places where we can pass NULL rather than a local Error that just gets thrown away. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-04 18:33:00 +02:00
Manos Pitsidianakis	f7cc69b326	block: add default implementations for bdrv_co_get_block_status() bdrv_co_get_block_status_from_file() and bdrv_co_get_block_status_from_backing() set *file to bs->file and bs->backing respectively, so that bdrv_co_get_block_status() can recurse to them. Future block drivers won't have to duplicate code to implement this. Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-04 18:31:13 +02:00
Manos Pitsidianakis	d8e12cd322	block: remove bdrv_truncate callback in blkdebug Now that bdrv_truncate is passed to bs->file by default, remove the callback from block/blkdebug.c and set is_filter to true. is_filter also gives access to other callbacks that are forwarded automatically to bs->file for filters. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-04 18:31:13 +02:00
Manos Pitsidianakis	f024aee867	block: remove unused bdrv_media_changed This function is not used anywhere, so remove it. Markus Armbruster adds: The i82078 floppy device model used to call bdrv_media_changed() to implement its media change bit when backed by a host floppy. This went away in `21fcf36` "fdc: simplify media change handling". Probably broke host floppy media change. Host floppy pass-through was dropped in commit `f709623`. bdrv_media_changed() has never been used for anything else. Remove it. (Source is Message-ID: <87y3ruaypm.fsf@dusky.pond.sub.org>) Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-04 18:31:13 +02:00
Marc-André Lureau	ebf677c849	qapi: drop the sentinel in enum array Now that all usages have been converted to user lookup helpers. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20170822132255.23945-14-marcandre.lureau@redhat.com> [Rebased, superfluous local variable dropped, missing check-qom-proplist.c update added] Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-17-git-send-email-armbru@redhat.com>	2017-09-04 13:09:13 +02:00
Marc-André Lureau	f7abe0ecd4	qapi: Change data type of the FOO_lookup generated for enum FOO Currently, a FOO_lookup is an array of strings terminated by a NULL sentinel. A future patch will generate enums with "holes". NULL-termination will cease to work then. To prepare for that, store the length in the FOO_lookup by wrapping it in a struct and adding a member for the length. The sentinel will be dropped next. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20170822132255.23945-13-marcandre.lureau@redhat.com> [Basically redone] Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-16-git-send-email-armbru@redhat.com> [Rebased]	2017-09-04 13:09:13 +02:00
Markus Armbruster	977c736f80	qapi: Mechanically convert FOO_lookup[...] to FOO_str(...) Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-14-git-send-email-armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>	2017-09-04 13:09:13 +02:00
Markus Armbruster	5b5f825d44	qapi: Generate FOO_str() macro for QAPI enum FOO The next commit will put it to use. May look pointless now, but we're going to change the FOO_lookup's type, and then it'll help. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-13-git-send-email-armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>	2017-09-04 13:09:13 +02:00
Marc-André Lureau	8d5fb199fb	quorum: Use qapi_enum_parse() in quorum_open() Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20170822132255.23945-12-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Rebased, qemu_opt_get() factored out, commit message tweaked] Cc: Alberto Garcia <berto@igalia.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-9-git-send-email-armbru@redhat.com>	2017-09-04 13:09:13 +02:00
Marc-André Lureau	f9509d1517	block: Use qemu_enum_parse() in blkdebug_debug_breakpoint() The error message on invalid blkdebug events changes from qemu-system-x86_64: LOCATION: Invalid event name "VALUE" to qemu-system-x86_64: LOCATION: invalid parameter value: VALUE Slight degradation, but the message is sub-par even before the patch. When complaining about a parameter value, both parameter name and value should be mentioned, as the value may well not be unique. Left for another day. Also left is the error message's unhelpful location: it points to the config=FILENAME rather than into that file. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20170822132255.23945-11-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Rebased, commit message rewritten] Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: qemu-block@nongnu.org Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-8-git-send-email-armbru@redhat.com>	2017-09-04 13:09:13 +02:00
Markus Armbruster	06c60b6c46	qapi: Drop superfluous qapi_enum_parse() parameter max The lookup tables have a sentinel, no need to make callers pass their size. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-3-git-send-email-armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [Rebased, commit message corrected]	2017-09-04 13:09:13 +02:00
Peter Maydell	223cd0e13f	-----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZp+UNAAoJENro4Ql1lpzlL3AP/3gyYuAt4vR9FzeDx64XfPzB x31p50TadRXRIrb5mmN69dXZbg0pmnk68m0HEeSXBl0wh+gQVVPL2xfaMow2UhIw jd0v9IxkR8PH9ruEso3fJH1RbNGy9aRUlgCYQdGo3Y4W3IZhOsSOKwdmrU46rohy Bq+RzEL0sWH5I6v+ylFJXktNrVY6n1P1epWY5BnldDm58+l727z/H1rnHPA3t6sL FHoCmDypimXE4bOEXUQ9y30z1KGYlSmVE9Jm9ABGakcnK3LK0nZl758/DEJDZg02 Ma+TJT3lnwqbLWPIanikeAiP6pf2NkYVhaJN42rqrYhFbOsl6ge2yzHxK83dzju+ 3b+Rk9yO932nQLwPTFGA1VGupAUqBtdDIMfZy8RpVD1anA83xgphBP2xPJh0Jsnj SAFinRdl1XFFVERoTLpMUqJWujp2mBsR14Ljw9dnF0HEfvr2jLkEyTwb6LwHyInx pAT06s9grsv0wlvaH+fZK5P1KviHr8TjX56qQM0YuGYr8LzvWAbd3mPor7c0EtR6 pr2GhbKQIhCq/foRD9nWMDlmUCWmJBjaCk++XUnmwFr61eegLku0jpRiClwFwPI3 I9dNfiJWrQFdtLFi2xi6A/ibtmCE9JS4lAZYw3ZVGnW8ulx0C2qev5HrgkcDtgq+ vmNfitmbOSG5ZvBn+3eC =jCiK -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/elmarco/tags/tidy-pull-request' into staging # gpg: Signature made Thu 31 Aug 2017 11:29:33 BST # gpg: using RSA key 0xDAE8E10975969CE5 # gpg: Good signature from "Marc-André Lureau <marcandre.lureau@redhat.com>" # gpg: aka "Marc-André Lureau <marcandre.lureau@gmail.com>" # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 87A9 BD93 3F87 C606 D276 F62D DAE8 E109 7596 9CE5 * remotes/elmarco/tags/tidy-pull-request: (29 commits) eepro100: replace g_malloc()+memcpy() with g_memdup() test-iov: replace g_malloc()+memcpy() with g_memdup() i386: replace g_malloc()+memcpy() with g_memdup() i386: introduce ELF_NOTE_SIZE macro decnumber: use DIV_ROUND_UP kvm: use DIV_ROUND_UP i386/dump: use DIV_ROUND_UP ppc: use DIV_ROUND_UP msix: use DIV_ROUND_UP usb-hub: use DIV_ROUND_UP q35: use DIV_ROUND_UP piix: use DIV_ROUND_UP virtio-serial: use DIV_ROUND_UP console: use DIV_ROUND_UP monitor: use DIV_ROUND_UP virtio-gpu: use DIV_ROUND_UP vga: use DIV_ROUND_UP ui: use DIV_ROUND_UP vnc: use DIV_ROUND_UP vvfat: use DIV_ROUND_UP ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-08-31 15:52:43 +01:00
Peter Maydell	1d2a8e0690	-----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJZp8cdAAoJEJykq7OBq3PIyeQIALXlHMTJM+I2dfUZfkIYFrEk Euf0z1URMJ9k5hKy1kIhAVlmGWs2fB1snTCm9tZjCtPqMjH5EDWb8z+zrqmorpcQ LyIccYdT/XrFeU1x+n4PlhaubQKXiAfZbUbgZpbkZwGgX0k51gx3V9z1smHme6AX CIODhgotqbJ0Hy2kuAP8TM2OPgx1tcyel34GuT5e3Rrb8nL0QfHfG4nxcpWBB0q8 iipoJfBvKWpRV0azSg+s51x1FFcB3iDKr81uBVABOyLtVW13nF6EMRIP76rqy5qp relNDo6kdmh0W19motNPjOa4BhnPQakEfF+bdARBOJPbXsFzd5X193yQBKW+nq4= =5ltA -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging # gpg: Signature made Thu 31 Aug 2017 09:21:49 BST # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: qcow2: allocate cluster_cache/cluster_data on demand qemu-doc: Add UUID support in initiator name tests: migration/guestperf Python 2.6 argparse compatibility docker.py: Python 2.6 argparse compatibility scripts: add argparse module for Python 2.6 compatibility misc: Remove unused Error variables oslib-posix: Print errors before aborting on qemu_alloc_stack() throttle: Test the valid range of config values throttle: Make burst_length 64bit and add range checks throttle: Make LeakyBucket.avg and LeakyBucket.max integer types throttle: Remove throttle_fix_bucket() / throttle_unfix_bucket() throttle: Make throttle_is_valid() a bit less verbose throttle: Update the throttle_fix_bucket() documentation throttle: Fix wrong variable name in the header documentation nvme: Fix get/set number of queues feature, again Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-08-31 14:33:54 +01:00
Marc-André Lureau	78ee96de64	vvfat: use DIV_ROUND_UP I used the clang-tidy qemu-round check to generate the fix: https://github.com/elmarco/clang-tools-extra Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>	2017-08-31 12:29:07 +02:00
Marc-André Lureau	13f1493f82	vpc: use DIV_ROUND_UP I used the clang-tidy qemu-round check to generate the fix: https://github.com/elmarco/clang-tools-extra Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>	2017-08-31 12:29:07 +02:00
Marc-André Lureau	21cf3e1201	qcow2: use DIV_ROUND_UP I used the clang-tidy qemu-round check to generate the fix: https://github.com/elmarco/clang-tools-extra Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>	2017-08-31 12:29:07 +02:00
Marc-André Lureau	6fb0022b48	dmg: use DIV_ROUND_UP I used the clang-tidy qemu-round check to generate the fix: https://github.com/elmarco/clang-tools-extra Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>	2017-08-31 12:29:07 +02:00
Marc-André Lureau	cf7a09c1e4	vhdx: use QEMU_ALIGN_DOWN I used the clang-tidy qemu-round check to generate the fix: https://github.com/elmarco/clang-tools-extra Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>	2017-08-31 12:29:07 +02:00
Vladimir Sementsov-Ogievskiy	f35dff7e13	block/nbd-client: refactor request send/receive Add nbd_co_request, to remove code duplications in nbd_client_co_{pwrite,pread,...} functions. Also this is needed for further refactoring. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20170804151440.320927-8-vsementsov@virtuozzo.com> [eblake: make nbd_co_request a wrapper, rather than merging two existing functions] Signed-off-by: Eric Blake <eblake@redhat.com>	2017-08-30 13:00:38 -05:00
Vladimir Sementsov-Ogievskiy	07b1b99c78	block/nbd-client: rename nbd_recv_coroutines_enter_all Rename nbd_recv_coroutines_enter_all to nbd_recv_coroutines_wake_all, as it most probably just adds all recv coroutines into co_queue_wakeup, rather than directly enter them. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20170804151440.320927-9-vsementsov@virtuozzo.com> [eblake: tweak commit message] Signed-off-by: Eric Blake <eblake@redhat.com>	2017-08-30 13:00:38 -05:00
Vladimir Sementsov-Ogievskiy	6faa077772	block/nbd-client: get rid of ssize_t Use int variable for nbd_co_send_request return value (as nbd_co_send_request returns int). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20170804151440.320927-6-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-08-30 13:00:38 -05:00
Stefan Hajnoczi	3c2d5183f9	nbd-client: avoid read_reply_co entry if send failed The following segfault is encountered if the NBD server closes the UNIX domain socket immediately after negotiation: Program terminated with signal SIGSEGV, Segmentation fault. #0 aio_co_schedule (ctx=0x0, co=0xd3c0ff2ef0) at util/async.c:441 441 QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines, (gdb) bt #0 0x000000d3c01a50f8 in aio_co_schedule (ctx=0x0, co=0xd3c0ff2ef0) at util/async.c:441 #1 0x000000d3c012fa90 in nbd_coroutine_end (bs=bs@entry=0xd3c0fec650, request=<optimized out>) at block/nbd-client.c:207 #2 0x000000d3c012fb58 in nbd_client_co_preadv (bs=0xd3c0fec650, offset=0, bytes=<optimized out>, qiov=0x7ffc10a91b20, flags=0) at block/nbd-client.c:237 #3 0x000000d3c0128e63 in bdrv_driver_preadv (bs=bs@entry=0xd3c0fec650, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7ffc10a91b20, flags=0) at block/io.c:836 #4 0x000000d3c012c3e0 in bdrv_aligned_preadv (child=child@entry=0xd3c0ff51d0, req=req@entry=0x7f31885d6e90, offset=offset@entry=0, bytes=bytes@entry=512, align=align@entry=1, qiov=qiov@entry=0x7ffc10a91b20, f +lags=0) at block/io.c:1086 #5 0x000000d3c012c6b8 in bdrv_co_preadv (child=0xd3c0ff51d0, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7ffc10a91b20, flags=flags@entry=0) at block/io.c:1182 #6 0x000000d3c011cc17 in blk_co_preadv (blk=0xd3c0ff4f80, offset=0, bytes=512, qiov=0x7ffc10a91b20, flags=0) at block/block-backend.c:1032 #7 0x000000d3c011ccec in blk_read_entry (opaque=0x7ffc10a91b40) at block/block-backend.c:1079 #8 0x000000d3c01bbb96 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:79 #9 0x00007f3196cb8600 in __start_context () at /lib64/libc.so.6 The problem is that nbd_client_init() uses nbd_client_attach_aio_context() -> aio_co_schedule(new_context, client->read_reply_co). Execution of read_reply_co is deferred to a BH which doesn't run until later. In the mean time blk_co_preadv() can be called and nbd_coroutine_end() calls aio_wake() on read_reply_co. At this point in time read_reply_co's ctx isn't set because it has never been entered yet. This patch simplifies the nbd_co_send_request() -> nbd_co_receive_reply() -> nbd_coroutine_end() lifecycle to just nbd_co_send_request() -> nbd_co_receive_reply(). The request is "ended" if an error occurs at any point. Callers no longer have to invoke nbd_coroutine_end(). This cleanup also eliminates the segfault because we don't call aio_co_schedule() to wake up s->read_reply_co if sending the request failed. It is only necessary to wake up s->read_reply_co if a reply was received. Note this only happens with UNIX domain sockets on Linux. It doesn't seem possible to reproduce this with TCP sockets. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20170829122745.14309-2-stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-08-30 13:00:37 -05:00
Stefan Hajnoczi	3e4c705212	qcow2: allocate cluster_cache/cluster_data on demand Most qcow2 files are uncompressed so it is wasteful to allocate (32 + 1) * cluster_size + 512 bytes upfront. Allocate s->cluster_cache and s->cluster_data when the first read operation is performance on a compressed cluster. The buffers are freed in .bdrv_close(). .bdrv_open() no longer has any code paths that can allocate these buffers, so remove the free functions in the error code path. This patch can result in significant memory savings when many qcow2 disks are attached or backing file chains are long: Before 12.81% (1,023,193,088B) After 5.36% (393,893,888B) Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170821135530.32344-1-stefanha@redhat.com Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-08-30 18:02:10 +01:00
Alberto Garcia	c3a8fe331e	misc: Remove unused Error variables There's a few cases which we're passing an Error pointer to a function only to discard it immediately afterwards without checking it. In these cases we can simply remove the variable and pass NULL instead. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20170829120836.16091-1-berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-08-30 11:58:26 +01:00
Stefan Hajnoczi	40f4a21895	nbd-client: avoid spurious qio_channel_yield() re-entry The following scenario leads to an assertion failure in qio_channel_yield(): 1. Request coroutine calls qio_channel_yield() successfully when sending would block on the socket. It is now yielded. 2. nbd_read_reply_entry() calls nbd_recv_coroutines_enter_all() because nbd_receive_reply() failed. 3. Request coroutine is entered and returns from qio_channel_yield(). Note that the socket fd handler has not fired yet so ioc->write_coroutine is still set. 4. Request coroutine attempts to send the request body with nbd_rwv() but the socket would still block. qio_channel_yield() is called again and assert(!ioc->write_coroutine) is hit. The problem is that nbd_read_reply_entry() does not distinguish between request coroutines that are waiting to receive a reply and those that are not. This patch adds a per-request bool receiving flag so nbd_read_reply_entry() can avoid spurious aio_wake() calls. Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20170822125113.5025-1-stefanha@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-08-23 11:22:15 -05:00
Fam Zheng	045a2f8254	mirror: Mark target BB as "force allow inactivate" Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170823134242.12080-4-famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-08-23 10:21:55 -05:00
Fam Zheng	ca2e214411	block-backend: Allow more "can inactivate" cases These two conditions corresponds to mirror job's source and target, which need to be allowed as they are part of the non-shared storage migration workflow: failing to inactivate either will result in a failure during migration completion. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170823134242.12080-3-famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [eblake: improve comment grammar] Signed-off-by: Eric Blake <eblake@redhat.com>	2017-08-23 10:21:55 -05:00
Fam Zheng	c16de8f59a	block-backend: Refactor inactivate check The logic will be fixed (extended), move it to a separate function. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170823134242.12080-2-famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-08-23 10:21:55 -05:00
Igor Mammedov	d0a180131c	fix build failure in nbd_read_reply_entry() travis builds fail at HEAD at rc3 master with block/nbd-client.c: In function ‘nbd_read_reply_entry’: block/nbd-client.c:110:8: error: ‘ret’ may be used uninitialized in this function [-Werror=uninitialized] fix it by initializing 'ret' to 0 Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-08-23 12:24:41 +01:00
Eric Blake	72b6ffc766	nbd-client: Fix regression when server sends garbage When we switched NBD to use coroutines for qemu 2.9 (in particular, commit `a12a712a`), we introduced a regression: if a server sends us garbage (such as a corrupted magic number), we quit the read loop but do not stop sending further queued commands, resulting in the client hanging when it never reads the response to those additional commands. In qemu 2.8, we properly detected that the server is no longer reliable, and cancelled all existing pending commands with EIO, then tore down the socket so that all further command attempts get EPIPE. Restore the proper behavior of quitting (almost) all communication with a broken server: Once we know we are out of sync or otherwise can't trust the server, we must assume that any further incoming data is unreliable and therefore end all pending commands with EIO, and quit trying to send any further commands. As an exception, we still (try to) send NBD_CMD_DISC to let the server know we are going away (in part, because it is easier to do that than to further refactor nbd_teardown_connection, and in part because it is the only command where we do not have to wait for a reply). Based on a patch by Vladimir Sementsov-Ogievskiy. A malicious server can be created with the following hack, followed by setting NBD_SERVER_DEBUG to a non-zero value in the environment when running qemu-nbd: \| --- a/nbd/server.c \| +++ b/nbd/server.c \| @@ -919,6 +919,17 @@ static int nbd_send_reply(QIOChannel ioc, NBDReply reply, Error *errp) \| stl_be_p(buf + 4, reply->error); \| stq_be_p(buf + 8, reply->handle); \| \| + static int debug; \| + static int count; \| + if (!count++) { \| + const char str = getenv("NBD_SERVER_DEBUG"); \| + if (str) { \| + debug = atoi(str); \| + } \| + } \| + if (debug && !(count % debug)) { \| + buf[0] = 0; \| + } \| return nbd_write(ioc, buf, sizeof(buf), errp); \| } Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170814213426.24681-1-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-08-15 10:03:28 -05:00
Fam Zheng	5f7772c4d0	block-backend: Defer shared_perm tightening migration completion As in the case of nbd_export_new(), bdrv_invalidate_cache() can be called when migration is still in progress. In this case we are not ready to tighten the shared permissions fenced by blk->disable_perm. Defer to a VM state change handler. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170815130740.31229-4-famz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-08-15 10:03:28 -05:00
Fam Zheng	2b218f5dbc	file-posix: Do runtime check for ofd lock API It is reported that on Windows Subsystem for Linux, ofd operations fail with -EINVAL. In other words, QEMU binary built with system headers that exports F_OFD_SETLK doesn't necessarily run in an environment that actually supports it: $ qemu-system-aarch64 ... -drive file=test.vhdx,if=none,id=hd0 \ -device virtio-blk-pci,drive=hd0 qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to unlock byte 100 qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to unlock byte 100 qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to lock byte 100 As a matter of fact this is not WSL specific. It can happen when running a QEMU compiled against a newer glibc on an older kernel, such as in a containerized environment. Let's do a runtime check to cope with that. Reported-by: Andrew Baumann <Andrew.Baumann@microsoft.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-08-11 14:12:44 +02:00
Eric Blake	d0d5d0e31a	qcow2: Check failure of bdrv_getlength() qcow2_co_pwritev_compressed() should not call bdrv_truncate() if determining the size failed. Reported-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-08-11 13:23:47 +02:00
Eric Blake	c40fe9c06c	qcow2: Drop debugging dump_refcounts() It's been #if 0'd since its introduction in 2006, commit `585f8587`. We can revive dead code if we need it, but in the meantime, it has bit-rotted (for example, not checking for failure in bdrv_getlength()). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-08-11 13:23:45 +02:00
Eric Blake	81caa3cc3b	vpc: Check failure of bdrv_getlength() vpc_open() was checking for bdrv_getlength() failure in one, but not the other, location. Reported-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-08-11 13:23:40 +02:00
Jeff Cody	113fe792fd	block/nfs: fix mutex assertion in nfs_file_close() Commit `c096358e74` introduced assertion checks for when qemu_mutex() functions are called without the corresponding qemu_mutex_init() having initialized the mutex. This uncovered a latent bug in qemu's nfs driver - in nfs_client_close(), the NFSClient structure is overwritten with zeros, prior to the mutex being destroyed. Go ahead and destroy the mutex in nfs_client_close(), and change where we call qemu_mutex_init() so that it is correctly balanced. There are also a couple of memory leaks obscured by the memset, so this fixes those as well. Finally, we should be able to get rid of the memset(), as it isn't necessary. Cc: qemu-stable@nongnu.org Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-08-08 15:19:16 +02:00
Denis V. Lunev	e5e6268348	parallels: drop check that bdrv_truncate() is working This would be actually strange and error prone. If truncate() nowadays will fail, there is something fatally wrong. Let's check for that during the actual work. The only fallback case is when the file is not zero initialized. In this case we should switch to preallocation via fallocate(). Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Markus Armbruster <armbru@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-08-08 15:19:16 +02:00
Denis V. Lunev	d8b83e37c3	parallels: respect error code of bdrv_getlength() in allocate_clusters() If we can not get the file length, the state of BDS is broken completely. Return error to the caller. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Markus Armbruster <armbru@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-08-08 15:19:16 +02:00
Denis V. Lunev	70d9110b44	block: respect error code from bdrv_getlength in handle_aiocb_write_zeroes Original idea beyond the code in question was the following: we have failed to write zeroes with fallocate(FALLOC_FL_ZERO_RANGE) as the simplest approach and via fallocate(FALLOC_FL_PUNCH_HOLE)/fallocate(0). We have the only chance now: if the request comes beyond end of the file. Thus we should calculate file length and respect the error code from that op. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Markus Armbruster <armbru@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-08-08 15:19:16 +02:00
Fam Zheng	0e51b9b7c7	vmdk: Fix error handling/reporting of vmdk_check Errors from the callees must be captured and propagated to our caller, ensure this for both find_extent() and bdrv_getlength(). Reported-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-08-08 15:19:16 +02:00
Kevin Wolf	809eb70ed6	block/null: Remove 'filename' option This option was only added to allow 'null-co://' and 'null-aio://' as filenames, its value never served any actual purpose and was ignored. Nevertheless it was accepted as '-drive driver=null,filename=foo'. The correct way to enable the protocol prefixes (and that without adding a useless -drive option) is implementing .bdrv_parse_filename. This is what this patch does. Technically, this is an incompatible change, but the null block driver is only used for benchmarking, testing and debugging, and an option without effect isn't likely to be used by anyone anyway, so no bad effects are to be expected. Reported-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-08-08 15:19:16 +02:00
Jeff Cody	95d729835f	block/vhdx: check error return of bdrv_truncate() Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-08-08 14:37:00 +02:00
Jeff Cody	c6572fa0d2	block/vhdx: check error return of bdrv_flush() Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-08-08 14:37:00 +02:00
Jeff Cody	27539ac531	block/vhdx: check for offset overflow to bdrv_truncate() VHDX uses uint64_t types for most offsets, following the VHDX spec. However, bdrv_truncate() takes an int64_t value for the truncating offset. Check for overflow before calling bdrv_truncate(). While we are here, replace the bit shifting with QEMU_ALIGN_UP as well. N.B.: For a compliant image this is not an issue, as the maximum VHDX image size is defined per the spec to be 64TB. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-08-08 14:37:00 +02:00
Jeff Cody	3f910692c2	block/vhdx: check error return of bdrv_getlength() Calls to bdrv_getlength() were not checking for error. In vhdx.c, this can lead to truncating an image file, so it is a definite bug. In vhdx-log.c, the path for improper behavior is less clear, but it is best to check in any case. Some minor code movement of the log_guid intialization, as well. Reported-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-08-08 14:37:00 +02:00
Alberto Garcia	795be0621a	quorum: Set sectors-count to 0 when reporting a flush error The QUORUM_REPORT_BAD event has fields to report the sector in which the error was detected and the number of affected sectors starting from that one. This is important for read and write errors, but not for flush errors. For flush errors the current code reports the total size of the disk image. That is however not useful information in this case. Moreover, the bdrv_getlength() call can fail, and there's no good way of handling that failure. Since we're reporting useless information and we cannot even guarantee to do it in a consistent way, this patch changes the code to report 0 instead in all cases. Reported-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-08-08 14:37:00 +02:00
Daniel P. Berrange	f42cf447e2	block: move trace probes into bdrv_co_preadv\|pwritev There are trace probes in bdrv_co_readv\|writev, however, the block drivers are being gradually moved over to using the bdrv_co_preadv\|pwritev functions instead. As a result some block drivers miss the current probes. Move the probes into bdrv_co_preadv\|pwritev instead, so that they are triggered by more (all?) I/O code paths. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170804105036.11879-1-berrange@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-08-07 09:39:35 +01:00
Peter Maydell	3b64f272d3	Block layer patches for 2.10.0-rc1 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZgKgMAAoJEH8JsnLIjy/W8FQP/i8f6lf3kpmTyqfyG/d9rU+V jizaa3uHBmmctQ9Ib0k1woFudSc1Rxt1kXtG3Cj45nYjHXYzWZOA2p3LF0bpRVtX keq3yXTLE+Il0ED6WiklKMS+BjpzxlnnHJpqnP3axl2TWZQtlQFIqO/f51RYtrVe SOaNdXxqEFQgV3uJIwtPdo38BBZvbIwA99+gCHM1YURkCESUnmy5h8plpnovJtMT Ta8sF+LXAjHtusYzfghNJ/p0Rpg3DkUmvHgo5QTA1F54AAPa3TePewqxssHaNK1T cDfDocvq9/gEMCMca2uyWNFxeqDZzaExoNUo2EVYoCPXKWr+vPaEgs9+RjMv6XUw d6lXZ6F0Rpm1zdtYQI8R1/ZpYcf29oo13q6fW0EEDx8y+9LMRKZP7pRtWA+MpNzT 9iRMVm3y0G4FOaoWD9W9cMVfD9aJknz8j3pggIY8nUhvh7BqkEmbgoaO230AmDYc dVDDmGL8544g0x/v0USqe2ed/XdBkZSScOeKVeRpuS/r2E4UCBhhJNSefxzvn2+p GYj+M6HLZ+biyKBkK3gwyk1fT74vOMpOBzysbBpIN8kg9ySDSkFhvj7qE24YJMKT 6yuWQE7WzPmXMiUz6hpn/m5TSjtVAXvL3BDc2lMz0HW6tHWJ5asv9zJjku+Ze9P4 FGBPYzwJ585Wxq/Z6y5b =DIex -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches for 2.10.0-rc1 # gpg: Signature made Tue 01 Aug 2017 17:10:52 BST # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: block/qapi: Remove redundant NULL check to silence Coverity qemu-iotests/059: Fix leaked image files qemu-iotests/063: Fix leaked image qemu-iotests/162: Fix leaked temporary files qemu-iotests/153: Fix leaked scratch images qemu-iotests/141: Fix image cleanup qemu-iotests: Remove blkdebug.conf after tests qemu-iotests/041: Fix leaked scratch images block: fix leaks in bdrv_open_driver() block: fix dangling bs->explicit_options in block.c iotests: Add test of recent fix to 'qemu-img measure' iotests: Check dirty bitmap statistics in 124 iotests: Redirect stderr to stdout in 186 iotests: Fix test 156 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-08-01 17:27:36 +01:00
Kevin Wolf	8e8eb0a903	block/qapi: Remove redundant NULL check to silence Coverity When skipping implicit nodes in bdrv_block_device_info(), we know that bs0 is always non-NULL; initially, because it's taken from a BdrvChild and a BdrvChild never has a NULL bs, and after the first iteration because implicit nodes always have a backing file. Remove the NULL check and add an assertion that the implicit node does indeed have a backing file. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2017-08-01 18:09:33 +02:00
Vladimir Sementsov-Ogievskiy	8908eb1a4a	trace-events: fix code style: print 0x before hex numbers The only exception are groups of numers separated by symbols '.', ' ', ':', '/', like 'ab.09.7d'. This patch is made by the following: > find . -name trace-events \| xargs python script.py where script.py is the following python script: ========================= #!/usr/bin/env python import sys import re import fileinput rhex = '%[-+ .0-9](?:[hljztL]\|ll\|hh)?(?:x\|X\|"\sPRI[xX][^"]"?)' rgroup = re.compile('((?:' + rhex + '[.:/ ])+' + rhex + ')') rbad = re.compile('(?<!0x)' + rhex) files = sys.argv[1:] for fname in files: for line in fileinput.input(fname, inplace=True): arr = re.split(rgroup, line) for i in range(0, len(arr), 2): arr[i] = re.sub(rbad, '0x\g<0>', arr[i]) sys.stdout.write(''.join(arr)) ========================= Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Message-id: 20170731160135.12101-5-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-08-01 12:13:07 +01:00
Vladimir Sementsov-Ogievskiy	db73ee4bc8	trace-events: fix code style: %# -> 0x% In trace format '#' flag of printf is forbidden. Fix it to '0x%'. This patch is created by the following: check that we have a problem > find . -name trace-events \| xargs grep '%#' \| wc -l 56 check that there are no cases with additional printf flags before '#' > find . -name trace-events \| xargs grep "%[-+ 0'I]+#" \| wc -l 0 check that there are no wrong usage of '#' and '0x' together > find . -name trace-events \| xargs grep '0x%#' \| wc -l 0 fix the problem > find . -name trace-events \| xargs sed -i 's/%#/0x%/g' [Eric Blake noted that xargs grep '%[-+ 0'I]+#' should be xargs grep "%[-+ 0'I]+#" instead so the shell quoting is correct. --Stefan] Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20170731160135.12101-3-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-08-01 12:13:07 +01:00
Philippe Mathieu-Daudé	87e0331c5a	docs: fix broken paths to docs/devel/tracing.txt With the move of some docs/ to docs/devel/ on `ac06724a71`, no references were updated. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-07-31 13:12:53 +03:00
Philippe Mathieu-Daudé	f80ac75d0e	qcow2: fix null pointer dereference It seems this assert() was somehow misplaced. block/qcow2-refcount.c:2193:42: warning: Array access (from variable 'on_disk_reftable') results in a null pointer dereference on_disk_reftable[refblock_index] = refblock_offset; ~~~~~~~~~~~~~~~~ ^ Reported-by: Clang Static Analyzer Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-07-31 13:06:38 +03:00
Vladimir Sementsov-Ogievskiy	b6b75a99da	qcow2-bitmap: fix bitmap_free Fix possible crash on error path in qcow2_remove_persistent_dirty_bitmap. Although bitmap_free was added in `88ddffae8f` the bug was introduced later in commit `469c71edc7` (when qcow2_remove_persistent_dirty_bitmap was added). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20170714123341.373857-1-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-25 16:33:31 +02:00
Daniel P. Berrange	0696ae2c92	qcow: fix memory leaks related to encryption Fix leak of the 'encryptopts' string, which was mistakenly declared const. Fix leak of QemuOpts entry which should not have been deleted from the opts array. Reported by: coverity Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170714103105.5781-1-berrange@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-25 16:33:31 +02:00
Kevin Wolf	d3c8c67469	block: Skip implicit nodes in query-block/blockstats Commits `0db832f` and `6cdbceb` introduced the automatic insertion of filter nodes above the top layer of mirror and commit block jobs. The assumption made there was that since libvirt doesn't do node-level management of the block layer yet, it shouldn't be affected by added nodes. This is true as far as commands issued by libvirt are concerned. It only uses BlockBackend names to address nodes, so any operations it performs still operate on the root of the tree as intended. However, the assumption breaks down when you consider query commands, which return data for the wrong node now. These commands also return information on some child nodes (bs->file and/or bs->backing), which libvirt does make use of, and which refer to the wrong nodes, too. One of the consequences is that oVirt gets wrong information about the image size and stops the VM in response as long as a mirror or commit job is running: https://bugzilla.redhat.com/show_bug.cgi?id=1470634 This patch fixes the problem by hiding the implicit nodes created automatically by the mirror and commit block jobs in the output of query-block and BlockBackend-based query-blockstats as long as the user doesn't indicate that they are aware of those nodes by providing a node name for them in the QMP command to start the block job. The node-based commands query-named-block-nodes and query-blockstats with query-nodes=true still show all nodes, including implicit ones. This ensures that users that are capable of node-level management can still access the full information; users that only know BlockBackends won't use these commands. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Tested-by: Eric Blake <eblake@redhat.com>	2017-07-24 15:06:04 +02:00
Eric Blake	24bae02b19	qcow2: Fix sector calculation in qcow2_measure() We used MAX() instead of the intended MIN() when computing how many sectors to view in the current loop iteration of qcow2_measure(), and passed in a value of INT_MAX sectors instead of our more usual limit of BDRV_REQUEST_MAX_SECTORS (the latter avoids 32-bit overflow on conversion to bytes). For small files, the bug is harmless: bdrv_get_block_status_above() clamps its *pnum answer to the BDS size, regardless of any insanely larger input request. However, for any file at least 2T in size, we can very easily end up going into an infinite loop (the maximum of 0x100000000 sectors and INT_MAX is a 64-bit quantity, which becomes 0 when assigned to int; once nb_sectors is 0, we never make progress). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-24 15:06:04 +02:00
Eric Blake	6c98c57af3	dirty-bitmap: Report BlockDirtyInfo.count in bytes, as documented We've been documenting the value in bytes since its introduction in commit `b9a9b3a4` (v1.3), where it was actually reported in bytes. Commit `e4654d2` (v2.0) then removed things from block/qapi.c, in preparation for a rewrite to a list of dirty sectors in the next commit `21b5683` in block.c, but the new code mistakenly started reporting in sectors. Fixes: https://bugzilla.redhat.com/1441460 CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-24 15:06:04 +02:00
Mark Cave-Ayland	c8115f8eb8	block/vpc: fix uninitialised variable compiler warning Since commit `cfc87e00` "block/vpc.c: Handle write failures in get_image_offset()" older versions of gcc (in this case 4.7) incorrectly warn that "ret" can be used uninitialised in vpc_co_pwritev(). Setting ret to 0 at the start of vpc_co_pwritev() prevents the warning in gcc 4.7 and enables compilation with -Werror to succeed. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1500625265-23844-1-git-send-email-mark.cave-ayland@ilande.co.uk Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-07-21 15:00:07 +01:00
Max Reitz	7c8730d45f	block/vvfat: Fix compiler warning with gcc 7 gcc 7 complains that the sprintf() might write a null byte beyond the end of the tail buffer. That is wrong, but we can silence it by making i unsigned (it can never be negative anyway, see the if condition right before). For some reason, this allows gcc to suddenly accurately calculate the range of i so we can give the tail[] array the exact size it needs to have (which is 8 bytes) without gcc complaining. In addition, let us convert the sprintf() to snprintf(), because that is always nicer, and add an assertion about the range of the return value afterwards so we can see that "8 - len" will never be negative and thus "entry->name + MIN(j, 8 - len)" will never be out of bounds. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-18 15:14:36 +02:00
Hervé Poussineau	f80256b7ee	vvfat: initialize memory after allocating it This prevents some host to guest memory content leaks. Fixes: https://bugs.launchpad.net/qemu/+bug/1599539 Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-18 15:14:36 +02:00
Hervé Poussineau	e03da26b71	vvfat: correctly parse non-ASCII short and long file names Write support works again when image contains non-ASCII names. It is either the case when user created a non-ASCII filename, or when initial directory contained a non-ASCII filename (since `0c36111f57`) Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-18 15:14:36 +02:00
Hervé Poussineau	63d261cb0d	vvfat: add a constant for bootsector name Also add links to related compatibility problems. Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-18 15:14:36 +02:00
Hervé Poussineau	8c4517fd6e	vvfat: add constants for special values of name[0] Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-18 15:14:36 +02:00
Kevin Wolf	ec18b0a93a	block: List anonymous device BBs in query-block Instead of listing only monitor-owned BlockBackends in query-block, also add those anonymous BlockBackends that are owned by a qdev device and as such under the control of the user. This allows using query-block to inspect BlockBackends for the modern configuration syntax with -blockdev and -device. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2017-07-18 15:14:36 +02:00
Kevin Wolf	d5b68844e6	block/qapi: Use blk_all_next() for query-block This patch replaces the blk_next() loop in query-block by a blk_all_next() one so that we also get access to BlockBackends that aren't owned by the monitor. For now, the next thing we do is check whether each BB has a name, so there is no semantic difference. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2017-07-18 15:14:36 +02:00
Kevin Wolf	a429b9b5f4	block: Make blk_all_next() public Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2017-07-18 15:14:36 +02:00
Kevin Wolf	46eade7be8	block/qapi: Add qdev device name to query-block With -blockdev/-device, users can indirectly create anonymous BlockBackends, while the state of such backends is still of interest. As a preparation for making such BBs visible in query-block, make sure that they can be identified even without a name by adding the ID/QOM path of their qdev device to BlockInfo. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2017-07-18 15:14:35 +02:00
Kevin Wolf	77beef8365	block: Make blk_get_attached_dev_id() public Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2017-07-18 15:14:35 +02:00
Peter Maydell	cfc87e00c2	block/vpc.c: Handle write failures in get_image_offset() Coverity (CID 1355236) points out that get_image_offset() doesn't check that it actually succeeded in writing the updated block bitmap to the file. Check the error return from bdrv_pwrite_sync() and propagate an error response back up to the function which calls get_image_offset() for a write so that it can return the error to its caller. get_sector_offset() is only used for reads, but we move it to the same API for consistency. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-18 15:14:35 +02:00
Peter Maydell	9877860e7b	block/vmdk: Report failures in vmdk_read_cid() The function vmdk_read_cid() can fail if the read on the underlying block device fails, or if there's a format error in the VMDK file. However its API doesn't provide a mechanism to report these errors, and in some cases we were returning a CID of 0 and in some cases a CID of 0xffffffff, either of which might potentially be valid values. Change the function to return 0 on success or a negative errno, and return the CID via a uint32_t* argument. Update the callsites to handle and propagate the error appropriately. This fixes in passing a Coverity-spotted issue (CID 1350038) where we weren't checking the return value from sscanf(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-18 15:14:35 +02:00
Manos Pitsidianakis	27e4cf1303	block: remove timer canceling in throttle_config() throttle_config() cancels the timers of the calling BlockBackend. This doesn't make sense because other BlockBackends in the group remain untouched. There's no need to cancel the timers in the one specific BlockBackend so let's not do that. Throttled requests will run as scheduled and future requests will follow the new configuration. This also allows a throttle group's configuration to be changed even when it has no members. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-18 15:14:35 +02:00
Manos Pitsidianakis	dbe824cc57	block: add clock_type field to ThrottleGroup Clock type in throttling is currently inferred by the ThrottleTimer's clock type even though it is a per-ThrottleGroup property; it doesn't make sense to have different clock types in the same group. Moving this to a field in ThrottleGroup can simplify some of the throttle functions. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-18 15:14:35 +02:00
Kevin Wolf	b1e1fa0c3a	commit: Add NULL check for overlay_bs I can't see how overlay_bs could become NULL with the current code, but other code in this function already checks it and we can make Coverity happy with this check, so let's add it. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-07-18 15:14:35 +02:00
Peter Maydell	718d7f4f9c	-----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJZbNpiAAoJEJykq7OBq3PIKcIIAMEJpEiWQonSJZVV4fxqcbOF dXJSVxHqrrVUrcM8NY2zGXIwcGS8RBNZG+Yx/SEZgIljoYH4NFmbvKXWS2zgHSyr LUH0M6gYlQ/1vQTXwQrkJdtmgfc3xNVrQbanbynK3+aB1S5Y6pRGauDo8SqCBWu0 uLWkhcSQbG+OHD8Go5X1kZUSdpP8yOqKrxcNLe980ghi4HPMUydL3lbs4SwNlnRt NJIpMTGzJrL+CqyakIL+/PT9RBGCo4hllPD0CgX6HETEkuojxxXaqJIG+Tzj2FeU fXkoK1YQHEHLdVXnPkpModoPylhRqIQcPXoXt+aMvoLSM+bLooYXMYryMPoPDGI= =VgCF -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging # gpg: Signature made Mon 17 Jul 2017 16:40:18 BST # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: block: fix shadowed variable in bdrv_co_pdiscard util/aio-win32: Only select on what we are actually waiting for Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-07-18 13:09:51 +01:00
Denis V. Lunev	593ed6f0a3	block: fix shadowed variable in bdrv_co_pdiscard We've had a shadowed 'ret' variable, which risks returning the wrong value, introduced in commit `b9c64947`. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20170710150559.30163-1-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-07-17 15:58:37 +01:00
Paolo Bonzini	5aca18a4ff	ssh: support I/O from any AioContext The coroutine may run in a different AioContext, causing the fd handler to busy wait. Fix this by resetting the handler in restart_coroutine, before the coroutine is restarted. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170629132749.997-12-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-07-17 11:34:20 +08:00
Paolo Bonzini	f1af3251f8	sheepdog: add queue_lock Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170629132749.997-11-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-07-17 11:34:20 +08:00
Paolo Bonzini	1f01e50b83	qed: protect table cache with CoMutex This makes the driver thread-safe. The CoMutex is dropped temporarily while accessing the data clusters or the backing file. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170629132749.997-10-pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-07-17 11:34:11 +08:00
Paolo Bonzini	61c7887e0f	qed: introduce bdrv_qed_init_state This will be used in the next patch, which will call bdrv_qed_do_open with a CoMutex taken. bdrv_qed_init_state provides a nice place to initialize it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170629132749.997-9-pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-07-17 11:33:11 +08:00
Paolo Bonzini	61124f03ab	block: invoke .bdrv_drain callback in coroutine context and from AioContext This will let the callback take a CoMutex in the next patch. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170629132749.997-8-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-07-17 11:28:15 +08:00
Paolo Bonzini	e7569c1829	qed: move tail of qed_aio_write_main to qed_aio_write_{cow, alloc} This part is never called for in-place writes, move it away to avoid the "backwards" coding style typical of callback-based code. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170629132749.997-7-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-07-17 11:28:15 +08:00
Paolo Bonzini	254aee4dbb	vvfat: make it thread-safe Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170629132749.997-6-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-07-17 11:28:15 +08:00
Paolo Bonzini	778b087e51	vpc: make it thread-safe Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170629132749.997-5-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-07-17 11:28:15 +08:00
Paolo Bonzini	1e88663979	vdi: make it thread-safe The VirtualBox driver is using a mutex to order all allocating writes, but it is not protecting accesses to the bitmap because they implicitly happen under the AioContext mutex. Change this to use a CoRwlock explicitly. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170629132749.997-4-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-07-17 11:28:15 +08:00
Paolo Bonzini	a8c57408cd	qcow2: call CoQueue APIs under CoMutex Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170629132749.997-2-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-07-17 11:28:15 +08:00
Peter Maydell	6c6076662d	* gdbstub fixes (Alex) * IOMMU MemoryRegion subclass (Alexey) * Chardev hotswap (Anton) * NBD_OPT_GO support (Eric) * Misc bugfixes * DEFINE_PROP_LINK (minus the ARM patches - Fam) * MAINTAINERS updates (Philippe) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJZaJejAAoJEL/70l94x66DwQ4H/0NUvh/Zfs64wE1iuZJACc24 1za02fFaB50vFDwQKWbM0GkHzDxoXBHk4Rvn92p+VSxpKtaAX4GRwCvxRA5GeUtm GAYbdIJUe0UELepKExrlUVzQcK9VfljoJpK3dZkP5Zzx83L2PAI/SexrZRibN2Uf yRI60uvlsMWU12nenzdVnYORd+TWDNKele7BhMrX/FX9wxaS1PlnsnKZggy6CU7G 8dwZJAZJ/s5tRGXyXyAQzLm5JZQCLnA6jxya540TbPeciFgbvvS2ydIitZ54vSPO VtmZ1rSWfTEbNF5xGD1Ztu8aAENr5/I05l6IjxZd45BdUCW3HxeJkc+7lE0K4uk= =wnVs -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * gdbstub fixes (Alex) * IOMMU MemoryRegion subclass (Alexey) * Chardev hotswap (Anton) * NBD_OPT_GO support (Eric) * Misc bugfixes * DEFINE_PROP_LINK (minus the ARM patches - Fam) * MAINTAINERS updates (Philippe) # gpg: Signature made Fri 14 Jul 2017 11:06:27 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (55 commits) spapr_rng: Convert to DEFINE_PROP_LINK cpu: Convert to DEFINE_PROP_LINK mips_cmgcr: Convert to DEFINE_PROP_LINK ivshmem: Convert to DEFINE_PROP_LINK dimm: Convert to DEFINE_PROP_LINK virtio-crypto: Convert to DEFINE_PROP_LINK virtio-rng: Convert to DEFINE_PROP_LINK virtio-scsi: Convert to DEFINE_PROP_LINK virtio-blk: Convert to DEFINE_PROP_LINK qdev: Add const qualifier to PropertyInfo definitions qmp: Use ObjectProperty.type if present qdev: Introduce DEFINE_PROP_LINK qdev: Introduce PropertyInfo.create qom: enforce readonly nature of link's check callback translate-all: remove redundant !tcg_enabled check in dump_exec_info vl: fix breakage of -tb-size nbd: Implement NBD_INFO_BLOCK_SIZE on client nbd: Implement NBD_INFO_BLOCK_SIZE on server nbd: Implement NBD_OPT_GO on client nbd: Implement NBD_OPT_GO on server ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-07-14 12:16:09 +01:00
Eric Blake	081dd1fe36	nbd: Implement NBD_INFO_BLOCK_SIZE on client The upstream NBD Protocol has defined a new extension to allow the server to advertise block sizes to the client, as well as a way for the client to inform the server whether it intends to obey block sizes. When using the block layer as the client, we will obey block sizes; but when used as 'qemu-nbd -c' to hand off to the kernel nbd module as the client, we are still waiting for the kernel to implement a way for us to learn if it will honor block sizes (perhaps by an addition to sysfs, rather than an ioctl), as well as any way to tell the kernel what additional block sizes to obey (NBD_SET_BLKSIZE appears to be accurate for the minimum size, but preferred and maximum sizes would probably be new ioctl()s), so until then, we need to make our request for block sizes conditional. When using ioctl(NBD_SET_BLKSIZE) to hand off to the kernel, use the minimum block size as the sector size if it is larger than 512, which also has the nice effect of cooperating with (non-qemu) servers that don't do read-modify-write when exposing a block device with 4k sectors; it might also allow us to visit a file larger than 2T on a 32-bit kernel. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170707203049.534-10-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 12:04:42 +02:00
Eric Blake	004a89fce9	nbd: Create struct for tracking export info The NBD Protocol is introducing some additional information about exports, such as minimum request size and alignment, as well as an advertised maximum request size. It will be easier to feed this information back to the block layer if we gather all the information into a struct, rather than adding yet more pointer parameters during negotiation. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170707203049.534-2-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 12:04:41 +02:00
Peter Maydell	a309b290aa	Error reporting patches for 2017-07-13 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZZ1/BAAoJEDhwtADrkYZTo7oP+gLj4B4kkp/DJnkzfuMMD1Ce ZPddZ8Z9RyXE4fS66sq1ODBQo5U+aQQZO7K234+jf8V4cKWW98lpVzLc3YdAHm2U ZF6Z9Rji5K4414ZsUcg92Zlovvdaji+mY0ooINav+4mqlONYrz29ntApWc0e0tGc e3tj4XDLhJrOM+mIx8vzixFlgSYj+6HgEiybYwolEK5svQbIQao3Y2omyb+zy0w0 RDT3XQnAAaZSOQAXcJGkhekkyMe0jMHOF0tULLx1uDQYctg9mUGlAGTZ5oTLgSve TCpSJwWCAx8XAJMkXyDRrdRFDLeUh6yGY7NTqAL3OuPSoAw9ygKrHyhTavxBJL+W rX7Qit3dmVrlZLviwNFQplAKYb10d08vBoKXmrnW5oVCmPEDvJIQfncbucpA/CNS ucdJ3RMLuDbbWdl+5tsL7jfiZAG7oSgAePTjN1rm0bDe5JN7NAU8WzHnKfE83iZq R+I3hofqGoiXSByYRLamZb+6nsURAxWPhcqcw7hdMsk7UI6dyZwWl9Fnm72w0BZK M5LHLkX0LYc+kZjiLKXlNK7Z50bXY0zKQpPCLH3nHA69iMiwVoozrjwa9iCKIxE+ 7ZlOfsu4ztExuicEyTr8b27CBrHjJjYDuFP0hroEOzqCKXUzegoq3oYMGP0doXxe o3xcwXVKT/1PudddyR4z =tByN -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2017-07-13' into staging Error reporting patches for 2017-07-13 # gpg: Signature made Thu 13 Jul 2017 12:55:45 BST # gpg: using RSA key 0x3870B400EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-error-2017-07-13: Convert error_report_err() to warn_report_err() error: Implement the warn and free Error functions char-socket: Report TCP socket waiting as information Convert error_report() to warn_report() error: Functions to report warnings and informational messages util/qemu-error: Rename error_print_loc() to be more generic websock: Don't try to set errp directly block: Don't try to set errp directly xilinx: Fix latent error handling bug Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-07-14 09:36:40 +01:00
Alistair Francis	3dc6f86936	Convert error_report() to warn_report() Convert all uses of error_report("warning:"... to use warn_report() instead. This helps standardise on a single method of printing warnings to the user. All of the warnings were changed using these two commands: find ./* -type f -exec sed -i \ 's\|error_report(".*warning[,:] \|warn_report("\|Ig' {} + Indentation fixed up manually afterwards. The test-qdev-global-props test case was manually updated to ensure that this patch passes make check (as the test cases are case sensitive). Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Suggested-by: Thomas Huth <thuth@redhat.com> Cc: Jeff Cody <jcody@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Lieven <pl@kamp.de> Cc: Josh Durgin <jdurgin@redhat.com> Cc: "Richard W.M. Jones" <rjones@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Peter Crosthwaite <crosthwaite.peter@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Greg Kurz <groug@kaod.org> Cc: Rob Herring <robh@kernel.org> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Peter Chubb <peter.chubb@nicta.com.au> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Alexander Graf <agraf@suse.de> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Greg Kurz <groug@kaod.org> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed by: Peter Chubb <peter.chubb@data61.csiro.au> Acked-by: Max Reitz <mreitz@redhat.com> Acked-by: Marcel Apfelbaum <marcel@redhat.com> Message-Id: <e1cfa2cd47087c248dd24caca9c33d9af0c499b0.1499866456.git.alistair.francis@xilinx.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-07-13 13:49:58 +02:00
Max Reitz	772d1f973f	block/qcow2: falloc/full preallocating growth Implement the preallocation modes falloc and full for growing qcow2 images. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170613202107.10125-15-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:02 +02:00
Max Reitz	60c48a29b7	block/qcow2: Rename "fail_block" to just "fail" Now alloc_refcount_block() only contains a single fail label, so it makes more sense to just name it "fail" instead of "fail_block". Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-14-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:02 +02:00
Max Reitz	12cc30a8cb	block/qcow2: Add qcow2_refcount_area() This function creates a collection of self-describing refcount structures (including a new refcount table) at the end of a qcow2 image file. Optionally, these structures can also describe a number of additional clusters beyond themselves; this will be important for preallocated truncation, which will place the data clusters and L2 tables there. For now, we can use this function to replace the part of alloc_refcount_block() that grows the refcount table (from which it is actually derived). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-13-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:02 +02:00
Max Reitz	95b98f343b	block/qcow2: Metadata preallocation for truncate We can support PREALLOC_MODE_METADATA by invoking preallocate() in qcow2_truncate(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170613202107.10125-12-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:02 +02:00
Max Reitz	652fecd005	block/qcow2: Lock s->lock in preallocate() preallocate() is and will be called only from places that do not otherwise need to lock s->lock: Currently that is qcow2_create2(), as of a future patch it will be called from qcow2_truncate(), too. It therefore makes sense to move locking that mutex into preallocate() itself. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-11-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:02 +02:00
Max Reitz	7bc45dc172	block/qcow2: Generalize preallocate() This patch adds two new parameters to the preallocate() function so we will be able to use it not just for preallocating a new image but also for preallocated image growth. The offset parameter allows the caller to specify a virtual offset from which to start preallocating. For newly created images this is always 0, but for preallocating growth this will be the old image length. The new_length parameter specifies the supposed new length of the image (basically the "end offset" for preallocation). During image truncation, bdrv_getlength() will return the old image length so we cannot rely on its return value then. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-10-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:02 +02:00
Max Reitz	35d72602ec	block/file-posix: Preallocation for truncate By using raw_regular_truncate() in raw_truncate(), we can now easily support preallocation. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-9-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:01 +02:00
Max Reitz	d0bc9e5d5e	block/file-posix: Generalize raw_regular_truncate Currently, raw_regular_truncate() is intended for setting the size of a newly created file. However, we also want to use it for truncating an existing file in which case only the newly added space (when growing) should be preallocated. This also means that if resizing failed, we should try to restore the original file size. This is important when using preallocation. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-8-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:01 +02:00
Max Reitz	9f63b07ee7	block/file-posix: Extract raw_regular_truncate() This functionality is part of raw_create() which we will be able to reuse nicely in raw_truncate(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20170613202107.10125-7-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:01 +02:00
Max Reitz	7dacd8bd3d	block/file-posix: Small fixes in raw_create() Variables should be declared at the start of a block, and if a certain parameter value is not supported it may be better to return -ENOTSUP instead of -EINVAL. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20170613202107.10125-6-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:01 +02:00
Max Reitz	3a691c50f1	block: Add PreallocMode to blk_truncate() blk_truncate() itself will pass that value to bdrv_truncate(), and all callers of blk_truncate() just set the parameter to PREALLOC_MODE_OFF for now. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:01 +02:00
Max Reitz	7ea37c3066	block: Add PreallocMode to bdrv_truncate() For block drivers that just pass a truncate request to the underlying protocol, we can now pass the preallocation mode instead of aborting if it is not PREALLOC_MODE_OFF. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:01 +02:00
Max Reitz	8243ccb743	block: Add PreallocMode to BD.bdrv_truncate() Add a PreallocMode parameter to the bdrv_truncate() function implemented by each block driver. Currently, we always pass PREALLOC_MODE_OFF and no driver accepts anything else. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:01 +02:00
Stefan Hajnoczi	c501c35220	qcow2: add bdrv_measure() support Use qcow2_calc_prealloc_size() to get the required file size. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20170705125738.8777-7-stefanha@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:00 +02:00
Stefan Hajnoczi	0eb4a8c1df	qcow2: extract image creation option parsing The image creation options parsed by qcow2_create() are also needed to implement .bdrv_measure(). Extract the parsing code, including input validation. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20170705125738.8777-6-stefanha@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:00 +02:00
Stefan Hajnoczi	7c5bcc4212	qcow2: make refcount size calculation conservative The refcount metadata size calculation is inaccurate and can produce numbers that are too small. This is bad because we should calculate a conservative number - one that is guaranteed to be large enough. This patch switches the approach to a fixed point calculation because the existing equation is hard to solve when inaccuracies are taken care of. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20170705125738.8777-5-stefanha@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:00 +02:00
Stefan Hajnoczi	95c67e3bd7	qcow2: extract preallocation calculation function Calculating the preallocated image size will be needed to implement .bdrv_measure(). Extract the code out into a separate function. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20170705125738.8777-4-stefanha@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:00 +02:00
Stefan Hajnoczi	a843a22a82	raw-format: add bdrv_measure() support Maximum size calculation is trivial for the raw format: it's just the requested image size (because there is no metadata). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20170705125738.8777-3-stefanha@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:00 +02:00
Vladimir Sementsov-Ogievskiy	615b5dcf2d	block: release persistent bitmaps on inactivate We should release them here to reload on invalidate cache. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170628120530.31251-31-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:59 +02:00
Vladimir Sementsov-Ogievskiy	469c71edc7	qcow2: add .bdrv_remove_persistent_dirty_bitmap Realize .bdrv_remove_persistent_dirty_bitmap interface. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170628120530.31251-29-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:59 +02:00
Vladimir Sementsov-Ogievskiy	56f364e6d7	block/dirty-bitmap: add bdrv_remove_persistent_dirty_bitmap Interface for removing persistent bitmap from its storage. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170628120530.31251-28-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:59 +02:00
Vladimir Sementsov-Ogievskiy	a3b52535e8	qmp: add x-debug-block-dirty-bitmap-sha256 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170628120530.31251-26-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:59 +02:00
Vladimir Sementsov-Ogievskiy	da0eb242ad	qcow2: add .bdrv_can_store_new_dirty_bitmap Realize .bdrv_can_store_new_dirty_bitmap interface. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170628120530.31251-23-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:59 +02:00
Vladimir Sementsov-Ogievskiy	169b879359	qcow2: store bitmaps on reopening image as read-only Store bitmaps and mark them read-only on reopening image as read-only. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170628120530.31251-21-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:58 +02:00
Vladimir Sementsov-Ogievskiy	5f72826e7f	qcow2: add persistent dirty bitmaps support Store persistent dirty bitmaps in qcow2 image. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170628120530.31251-20-vsementsov@virtuozzo.com [mreitz: Always assign ret in store_bitmap() in case of an error] Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:58 +02:00
Vladimir Sementsov-Ogievskiy	3dd10a06d1	block/dirty-bitmap: add bdrv_dirty_bitmap_next() Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170628120530.31251-19-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:58 +02:00
Vladimir Sementsov-Ogievskiy	a88b179fdb	block: introduce persistent dirty bitmaps New field BdrvDirtyBitmap.persistent means, that bitmap should be saved by format driver in .bdrv_close and .bdrv_inactivate. No format driver supports it for now. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170628120530.31251-18-vsementsov@virtuozzo.com [mreitz: Fixed indentation] Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:58 +02:00
Vladimir Sementsov-Ogievskiy	a0319aacd4	block/dirty-bitmap: add autoload field to BdrvDirtyBitmap Mirror AUTO flag from Qcow2 bitmap in BdrvDirtyBitmap. This will be needed in future, to save this flag back to Qcow2 for persistent bitmaps. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170628120530.31251-16-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:58 +02:00
Vladimir Sementsov-Ogievskiy	1b6b0562db	qcow2: support .bdrv_reopen_bitmaps_rw Realize bdrv_reopen_bitmaps_rw interface. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170628120530.31251-15-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:58 +02:00
Vladimir Sementsov-Ogievskiy	d1258dd0c8	qcow2: autoloading dirty bitmaps Auto loading bitmaps are bitmaps in Qcow2, with the AUTO flag set. They are loaded when the image is opened and become BdrvDirtyBitmaps for the corresponding drive. Extra data in bitmaps is not supported for now. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170628120530.31251-12-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:58 +02:00
Vladimir Sementsov-Ogievskiy	d6883bc968	block/dirty-bitmap: add readonly field to BdrvDirtyBitmap It will be needed in following commits for persistent bitmaps. If bitmap is loaded from read-only storage (and we can't mark it "in use" in this storage) corresponding BdrvDirtyBitmap should be read-only. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170628120530.31251-11-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:58 +02:00
Vladimir Sementsov-Ogievskiy	8bfc932e1e	block/dirty-bitmap: fix comment for BlockDirtyBitmap.disabled field Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170628120530.31251-10-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:58 +02:00
Vladimir Sementsov-Ogievskiy	88ddffae8f	qcow2: add bitmaps extension Add bitmap extension as specified in docs/specs/qcow2.txt. For now, just mirror extension header into Qcow2 state and check constraints. Also, calculate refcounts for qcow2 bitmaps, to not break qemu-img check. For now, disable image resize if it has bitmaps. It will be fixed later. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170628120530.31251-9-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:57 +02:00
Vladimir Sementsov-Ogievskiy	8a5bb1f114	qcow2-refcount: rename inc_refcounts() and make it public This is needed for the following patch, which will introduce refcounts checking for qcow2 bitmaps. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170628120530.31251-8-vsementsov@virtuozzo.com [mreitz: s/inc_refcounts/qcow2_inc_refcounts_imrt/ in one more (new) place] Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:57 +02:00
Vladimir Sementsov-Ogievskiy	6bdc8b719a	block/dirty-bitmap: add deserialize_ones func Add bdrv_dirty_bitmap_deserialize_ones() function, which is needed for qcow2 bitmap loading, to handle unallocated bitmap parts, marked as all-ones. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170628120530.31251-7-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:57 +02:00
Vladimir Sementsov-Ogievskiy	ba06ff1a5c	block: fix bdrv_dirty_bitmap_granularity signature Make getter signature const-correct. This allows other functions with const dirty bitmap parameter use bdrv_dirty_bitmap_granularity(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20170628120530.31251-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:57 +02:00
Daniel P. Berrange	0a12f6f80e	qcow2: report encryption specific image information Currently 'qemu-img info' reports a simple "encrypted: yes" field. This is not very useful now that qcow2 can support multiple encryption formats. Users want to know which format is in use and some data related to it. Wire up usage of the qcrypto_block_get_info() method so that 'qemu-img info' can report about the encryption format and parameters in use $ qemu-img create \ --object secret,id=sec0,data=123456 \ -o encrypt.format=luks,encrypt.key-secret=sec0 \ -f qcow2 demo.qcow2 1G Formatting 'demo.qcow2', fmt=qcow2 size=1073741824 \ encryption=off encrypt.format=luks encrypt.key-secret=sec0 \ cluster_size=65536 lazy_refcounts=off refcount_bits=16 $ qemu-img info demo.qcow2 image: demo.qcow2 file format: qcow2 virtual size: 1.0G (1073741824 bytes) disk size: 480K encrypted: yes cluster_size: 65536 Format specific information: compat: 1.1 lazy refcounts: false refcount bits: 16 encrypt: ivgen alg: plain64 hash alg: sha256 cipher alg: aes-256 uuid: 3fa930c4-58c8-4ef7-b3c5-314bb5af21f3 format: luks cipher mode: xts slots: [0]: active: true iters: 1839058 key offset: 4096 stripes: 4000 [1]: active: false key offset: 262144 [2]: active: false key offset: 520192 [3]: active: false key offset: 778240 [4]: active: false key offset: 1036288 [5]: active: false key offset: 1294336 [6]: active: false key offset: 1552384 [7]: active: false key offset: 1810432 payload offset: 2068480 master key iters: 438487 corrupt: false With the legacy "AES" encryption we just report the format name $ qemu-img create \ --object secret,id=sec0,data=123456 \ -o encrypt.format=aes,encrypt.key-secret=sec0 \ -f qcow2 demo.qcow2 1G Formatting 'demo.qcow2', fmt=qcow2 size=1073741824 \ encryption=off encrypt.format=aes encrypt.key-secret=sec0 \ cluster_size=65536 lazy_refcounts=off refcount_bits=16 $ ./qemu-img info demo.qcow2 image: demo.qcow2 file format: qcow2 virtual size: 1.0G (1073741824 bytes) disk size: 196K encrypted: yes cluster_size: 65536 Format specific information: compat: 1.1 lazy refcounts: false refcount bits: 16 encrypt: format: aes corrupt: false Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-20-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:57 +02:00
Daniel P. Berrange	1cd9a787a2	block: pass option prefix down to crypto layer While the crypto layer uses a fixed option name "key-secret", the upper block layer may have a prefix on the options. e.g. "encrypt.key-secret", in order to avoid clashes between crypto option names & other block option names. To ensure the crypto layer can report accurate error messages, we must tell it what option name prefix was used. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-19-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:56 +02:00
Daniel P. Berrange	c01c214b69	block: remove all encryption handling APIs Now that all encryption keys must be provided upfront via the QCryptoSecret API and associated block driver properties there is no need for any explicit encryption handling APIs in the block layer. Encryption can be handled transparently within the block driver. We only retain an API for querying whether an image is encrypted or not, since that is a potentially useful piece of metadata to report to the user. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-18-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:56 +02:00
Daniel P. Berrange	4652b8f3e1	qcow2: add support for LUKS encryption format This adds support for using LUKS as an encryption format with the qcow2 file, using the new encrypt.format parameter to request "luks" format. e.g. # qemu-img create --object secret,data=123456,id=sec0 \ -f qcow2 -o encrypt.format=luks,encrypt.key-secret=sec0 \ test.qcow2 10G The legacy "encryption=on" parameter still results in creation of the old qcow2 AES format (and is equivalent to the new 'encryption-format=aes'). e.g. the following are equivalent: # qemu-img create --object secret,data=123456,id=sec0 \ -f qcow2 -o encryption=on,encrypt.key-secret=sec0 \ test.qcow2 10G # qemu-img create --object secret,data=123456,id=sec0 \ -f qcow2 -o encryption-format=aes,encrypt.key-secret=sec0 \ test.qcow2 10G With the LUKS format it is necessary to store the LUKS partition header and key material in the QCow2 file. This data can be many MB in size, so cannot go into the QCow2 header region directly. Thus the spec defines a FDE (Full Disk Encryption) header extension that specifies the offset of a set of clusters to hold the FDE headers, as well as the length of that region. The LUKS header is thus stored in these extra allocated clusters before the main image payload. Aside from all the cryptographic differences implied by use of the LUKS format, there is one further key difference between the use of legacy AES and LUKS encryption in qcow2. For LUKS, the initialiazation vectors are generated using the host physical sector as the input, rather than the guest virtual sector. This guarantees unique initialization vectors for all sectors when qcow2 internal snapshots are used, thus giving stronger protection against watermarking attacks. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-14-berrange@redhat.com Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:56 +02:00
Daniel P. Berrange	b25b387fa5	qcow2: convert QCow2 to use QCryptoBlock for encryption This converts the qcow2 driver to make use of the QCryptoBlock APIs for encrypting image content, using the legacy QCow2 AES scheme. With this change it is now required to use the QCryptoSecret object for providing passwords, instead of the current block password APIs / interactive prompting. $QEMU \ -object secret,id=sec0,file=/home/berrange/encrypted.pw \ -drive file=/home/berrange/encrypted.qcow2,encrypt.key-secret=sec0 The test 087 could be simplified since there is no longer a difference in behaviour when using blockdev_add with encrypted images for the running vs stopped CPU state. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-12-berrange@redhat.com Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:56 +02:00
Daniel P. Berrange	446d306d23	qcow2: make qcow2_encrypt_sectors encrypt in place Instead of requiring separate input/output buffers for encrypting data, change qcow2_encrypt_sectors() to assume use of a single buffer, encrypting in place. The current callers all used the same buffer for input/output already. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-11-berrange@redhat.com Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:56 +02:00
Daniel P. Berrange	d85f4222b4	qcow: convert QCow to use QCryptoBlock for encryption This converts the qcow driver to make use of the QCryptoBlock APIs for encrypting image content. This is only wired up to permit use of the legacy QCow encryption format. Users who wish to have the strong LUKS format should switch to qcow2 instead. With this change it is now required to use the QCryptoSecret object for providing passwords, instead of the current block password APIs / interactive prompting. $QEMU \ -object secret,id=sec0,file=/home/berrange/encrypted.pw \ -drive file=/home/berrange/encrypted.qcow,encrypt.format=aes,\ encrypt.key-secret=sec0 Though note that running QEMU system emulators with the AES encryption is no longer supported, so while the above syntax is valid, QEMU will refuse to actually run the VM in this particular example. Likewise when creating images with the legacy AES-CBC format qemu-img create -f qcow \ --object secret,id=sec0,file=/home/berrange/encrypted.pw \ -o encrypt.format=aes,encrypt.key-secret=sec0 \ /home/berrange/encrypted.qcow 64M Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-10-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:56 +02:00
Daniel P. Berrange	1fad1f9400	qcow: make encrypt_sectors encrypt in place Instead of requiring separate input/output buffers for encrypting data, change encrypt_sectors() to assume use of a single buffer, encrypting in place. One current caller uses the same buffer for input/output already and the other two callers are easily converted to do so. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-9-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:56 +02:00
Daniel P. Berrange	0cb8d47ba9	block: deprecate "encryption=on" in favor of "encrypt.format=aes" Historically the qcow & qcow2 image formats supported a property "encryption=on" to enable their built-in AES encryption. We'll soon be supporting LUKS for qcow2, so need a more general purpose way to enable encryption, with a choice of formats. This introduces an "encrypt.format" option, which will later be joined by a number of other "encrypt.XXX" options. The use of a "encrypt." prefix instead of "encrypt-" is done to facilitate mapping to a nested QAPI schema at later date. e.g. the preferred syntax is now qemu-img create -f qcow2 -o encrypt.format=aes demo.qcow2 Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-8-berrange@redhat.com Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:55 +02:00
Daniel P. Berrange	6aa837f7bd	qcow: require image size to be > 1 for new images The qcow driver refuses to open images which are less than 2 bytes in size, but will happily create such images. Add a check in the create path to avoid this discrepancy. Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-5-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:55 +02:00
Daniel P. Berrange	4a47f85431	block: add ability to set a prefix for opt names When integrating the crypto support with qcow/qcow2, we don't want to use the bare LUKS option names "hash-alg", "key-secret", etc. We need to namespace them to match the nested QAPI schema. e.g. "encrypt.hash-alg", "encrypt.key-secret" so that they don't clash with any general qcow options at a later date. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-3-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:55 +02:00
Daniel P. Berrange	306a06e5f7	block: expose crypto option names / defs to other drivers The block/crypto.c defines a set of QemuOpts that provide parameters for encryption. This will also be needed by the qcow/qcow2 integration, so expose the relevant pieces in a new block/crypto.h header. Some helper methods taking QemuOpts are changed to take QDict to simplify usage in other places. Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-2-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:44:55 +02:00
Eric Blake	51b0a48888	block: Make bdrv_is_allocated_above() byte-based We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the signature of the function to use int64_t *pnum ensures that the compiler enforces that all callers are updated. For now, the io.c layer still assert()s that all callers are sector-aligned, but that can be relaxed when a later patch implements byte-based block status. Therefore, for the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_is_allocated(). But some code, particularly stream_run(), gets a lot simpler because it no longer has to mess with sectors. Leave comments where we can further simplify by switching to byte-based iterations, once later patches eliminate the need for sector-aligned operations. For ease of review, bdrv_is_allocated() was tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:07 +02:00
Eric Blake	c00716beb3	block: Minimize raw use of bds->total_sectors bdrv_is_allocated_above() was relying on intermediate->total_sectors, which is a field that can have stale contents depending on the value of intermediate->has_variable_length. An audit shows that we are safe (we were first calling through bdrv_co_get_block_status() which in turn calls bdrv_nb_sectors() and therefore just refreshed the current length), but it's nicer to favor our accessor functions to avoid having to repeat such an audit, even if it means refresh_total_sectors() is called more frequently. Suggested-by: John Snow <jsnow@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:07 +02:00
Eric Blake	d6a644bbfe	block: Make bdrv_is_allocated() byte-based We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the signature of the function to use int64_t pnum ensures that the compiler enforces that all callers are updated. For now, the io.c layer still assert()s that all callers are sector-aligned on input and that pnum is sector-aligned on return to the caller, but that can be relaxed when a later patch implements byte-based block status. Therefore, this code adds usages like DIV_ROUND_UP(,BDRV_SECTOR_SIZE) to callers that still want aligned values, where the call might reasonbly give non-aligned results in the future; on the other hand, no rounding is needed for callers that should just continue to work with byte alignment. For the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_is_allocated(). But some code, particularly bdrv_commit(), gets a lot simpler because it no longer has to mess with sectors; also, it is now possible to pass NULL if the caller does not care how much of the image is allocated beyond the initial offset. Leave comments where we can further simplify once a later patch eliminates the need for sector-aligned requests through bdrv_is_allocated(). For ease of review, bdrv_is_allocated_above() will be tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:07 +02:00
Eric Blake	6f8e35e241	backup: Switch backup_run() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Change the internal loop iteration of backups to track by bytes instead of sectors (although we are still guaranteed that we iterate by steps that are cluster-aligned). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	03f5d60bbf	backup: Switch backup_do_cow() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Convert another internal function (no semantic change). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	f6ac207893	backup: Switch block_backup.h to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Continue by converting the public interface to backup jobs (no semantic change), including a change to CowRequest to track by bytes instead of cluster indices. Note that this does not change the difference between the public interface (starting point, and size of the subsequent range) and the internal interface (starting and end points). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Xie Changlong <xiechanglong@cmss.chinamobile.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	cf79cdf662	backup: Switch BackupBlockJob to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Continue by converting an internal structure (no semantic change), and all references to tracking progress. Drop a redundant local variable bytes_per_cluster. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	e8a81e9cad	block: Drop unused bdrv_round_sectors_to_clusters() Now that the last user [mirror_iteration()] has converted to using bytes, we no longer need a function to round sectors to clusters. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	fb2ef7919b	mirror: Switch mirror_iteration() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Change the internal loop iteration of mirroring to track by bytes instead of sectors (although we are still guaranteed that we iterate by steps that are both sector-aligned and multiples of the granularity). Drop the now-unused mirror_clip_sectors(). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	ae4cc8777b	mirror: Switch mirror_do_read() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Convert another internal function, preserving all existing semantics, and adding one more assertion that things are still sector-aligned (so that conversions to sectors in mirror_read_complete don't need to round). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	782d97efec	mirror: Switch mirror_cow_align() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Convert another internal function (no semantic change), and add mirror_clip_bytes() as a counterpart to mirror_clip_sectors(). Some of the conversion is a bit tricky, requiring temporaries to convert between units; it will be cleared up in a following patch. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	931e52607f	mirror: Update signature of mirror_clip_sectors() Rather than having a void function that modifies its input in-place as the output, change the signature to reduce a layer of indirection and return the result. Suggested-by: John Snow <jsnow@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	e6f2419389	mirror: Switch mirror_do_zero_or_discard() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Convert another internal function (no semantic change). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	b436982f04	mirror: Switch MirrorBlockJob to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Continue by converting an internal structure (no semantic change), and all references to the buffer size. Add an assertion that our use of s->granularity >> BDRV_SECTOR_BITS (necessary for interaction with sector-based dirty bitmaps, until a later patch converts those to be byte-based) does not suffer from truncation problems. [checkpatch has a false positive on use of MIN() in this patch] Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	317a6676a2	commit: Switch commit_run() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Change the internal loop iteration of committing to track by bytes instead of sectors (although we are still guaranteed that we iterate by steps that are sector-aligned). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	d8a9858408	commit: Switch commit_populate() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Start by converting an internal function (no semantic change). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	d535435f4a	stream: Switch stream_run() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Change the internal loop iteration of streaming to track by bytes instead of sectors (although we are still guaranteed that we iterate by steps that are sector-aligned). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	158c649257	stream: Drop reached_end for stream_complete() stream_complete() skips the work of rewriting the backing file if the job was cancelled, if data->reached_end is false, or if there was an error detected (non-zero data->ret) during the streaming. But note that in stream_run(), data->reached_end is only set if the loop ran to completion, and data->ret is only 0 in two cases: either the loop ran to completion (possibly by cancellation, but stream_complete checks for that), or we took an early goto out because there is no bs->backing. Thus, we can preserve the same semantics without the use of reached_end, by merely checking for bs->backing (and logically, if there was no backing file, streaming is a no-op, so there is no backing file to rewrite). Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	8493211c02	stream: Switch stream_populate() to byte-based We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Start by converting an internal function (no semantic change). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	5cb1a49e01	trace: Show blockjob actions via bytes, not sectors Upcoming patches are going to switch to byte-based interfaces instead of sector-based. Even worse, trace_backup_do_cow_enter() had a weird mix of cluster and sector indices. The trace interface is low enough that there are no stability guarantees, and therefore nothing wrong with changing our units, even in cases like trace_backup_do_cow_skip() where we are not changing the trace output. So make the tracing uniformly use bytes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Eric Blake	f3e4ce4af3	blockjob: Track job ratelimits via bytes, not sectors The user interface specifies job rate limits in bytes/second. It's pointless to have our internal representation track things in sectors/second, particularly since we want to move away from sector-based interfaces. Fix up a doc typo found while verifying that the ratelimit code handles the scaling difference. Repetition of expressions like 'n * BDRV_SECTOR_SIZE' will be cleaned up later when functions are converted to iterate over images by bytes rather than by sectors. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Hervé Poussineau	8b544293ef	vvfat: change OEM name to 'MSWIN4.1' According to specification: "'MSWIN4.1' is the recommanded setting, because it is the setting least likely to cause compatibility problems. If you want to put something else in here, that is your option, but the result may be that some FAT drivers might not recognize the volume." Specification: "FAT: General overview of on-disk format" v1.03, page 9 Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:06 +02:00
Hervé Poussineau	78f002c901	vvfat: handle KANJI lead byte 0xe5 Specification: "FAT: General overview of on-disk format" v1.03, page 23 Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:05 +02:00
Hervé Poussineau	6817efea3a	vvfat: limit number of entries in root directory in FAT12/FAT16 FAT12/FAT16 root directory is two sectors in size, which allows only 512 directory entries. Prevent QEMU startup if too much files exist, instead of overflowing root directory. Also introduce variable root_entries, which will be required for FAT32. Fixes: https://bugs.launchpad.net/qemu/+bug/1599539/comments/4 Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:05 +02:00
Hervé Poussineau	339cebcc01	vvfat: correctly generate numeric-tail of short file names More specifically: - try without numeric-tail only if LFN didn't have invalid short chars - start at ~1 (instead of ~0) - handle case if numeric tail is more than one char (ie > 10) Windows 9x Scandisk doesn't see anymore mismatches between short file names and long file names for non-ASCII filenames. Specification: "FAT: General overview of on-disk format" v1.03, page 31 Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:05 +02:00
Hervé Poussineau	0c36111f57	vvfat: correctly create base short names for non-ASCII filenames More specifically, create short name from filename and change blacklist of invalid chars to whitelist of valid chars. Windows 9x also now correctly see long file names of filenames containing a space, but Scandisk still complains about mismatch between SFN and LFN. [kwolf: Build fix for this intermediate patch (it included declarations for variables that are only used in the next patch) ] Specification: "FAT: General overview of on-disk format" v1.03, pages 30-31 Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:05 +02:00
Hervé Poussineau	09ec4119fb	vvfat: correctly create long names for non-ASCII filenames Assume that input filename is encoded as UTF-8, so correctly create UTF-16 encoding. Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:05 +02:00
Hervé Poussineau	f82d92bb02	vvfat: always create . and .. entries at first and in that order readdir() doesn't always return . and .. entries at first and in that order. This leads to not creating them at first in the directory, which raises some errors on file system checking utilities like MS-DOS Scandisk. Specification: "FAT: General overview of on-disk format" v1.03, page 25 Fixes: https://bugs.launchpad.net/qemu/+bug/1599539 Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:05 +02:00
Hervé Poussineau	92e28d8220	vvfat: fix field names in FAT12/FAT16 and FAT32 boot sectors Specification: "FAT: General overview of on-disk format" v1.03, pages 11-13 Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:05 +02:00
Hervé Poussineau	4dc705dc7e	vvfat: introduce offset_to_bootsector, offset_to_fat and offset_to_root_dir - offset_to_bootsector is the number of sectors up to FAT bootsector - offset_to_fat is the number of sectors up to first File Allocation Table - offset_to_root_dir is the number of sectors up to root directory sector Replace first_sectors_number - 1 by offset_to_bootsector. Replace first_sectors_number by offset_to_fat. Replace faked_sectors by offset_to_rootdir. Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:05 +02:00
Hervé Poussineau	ad05b31857	vvfat: rename useless enumeration values MODE_FAKED and MODE_RENAMED are not and were never used. Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:05 +02:00
Hervé Poussineau	5f5b29dfce	vvfat: fix typos Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:05 +02:00
Hervé Poussineau	d6a7e54ed3	vvfat: replace tabs by 8 spaces This was a complete mess. On 2299 indented lines: - 1329 were with spaces only - 617 with tabulations only - 353 with spaces and tabulations Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:05 +02:00
Hervé Poussineau	139921aaa7	vvfat: fix qemu-img map and qemu-img convert - bs->total_sectors is the number of sectors of the whole disk - s->sector_count is the number of sectors of the FAT partition This fixes the following assert in qemu-img map: qemu-img.c:2641: get_block_status: Assertion `nb_sectors' failed. This also fixes an infinite loop in qemu-img convert. Fixes: `4480e0f924` Fixes: https://bugs.launchpad.net/qemu/+bug/1599539 Cc: qemu-stable@nongnu.org Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:05 +02:00
Eric Blake	544daf6679	blkdebug: Support .bdrv_co_get_block_status Without a passthrough status of BDRV_BLOCK_RAW, anything wrapped by blkdebug appears 100% allocated as data. Better is treating it the same as the underlying file being wrapped. Update iotest 177 for the new expected output. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:05 +02:00
Eric Blake	d5254033da	block: Simplify use of BDRV_BLOCK_RAW The lone caller that cares about a return of BDRV_BLOCK_RAW (namely, io.c:bdrv_co_get_block_status) completely replaces the return value, so there is no point in passing BDRV_BLOCK_DATA. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:05 +02:00
Eric Blake	81c219ac6c	block: Guarantee that file is set on bdrv_get_block_status() We document that file is valid if the return is not an error and includes BDRV_BLOCK_OFFSET_VALID, but forgot to obey this contract when a driver (such as blkdebug) lacks a callback. Messed up in commit `67a0fd2` (v2.6), when we added the file parameter. Enhance qemu-iotest 177 to cover this, using a sequence that would print garbage or even SEGV, because it was dererefencing through uninitialized memory. [The resulting test output shows that we have less-than-ideal block status from the blkdebug driver, but that's a separate fix coming up soon.] Setting file on all paths that return BDRV_BLOCK_OFFSET_VALID is enough to fix the crash, but we can go one step further: always setting file, even on error, means that a broken caller that blindly dereferences file without checking for error is now more likely to get a reliable SEGV instead of randomly acting on garbage, making it easier to diagnose such buggy callers. Adding an assertion that file is set where expected doesn't hurt either. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:05 +02:00
Paolo Bonzini	96d06835dc	nbd: fix NBD over TLS When attaching the NBD QIOChannel to an AioContext, the TLS channel should be used, not the underlying socket channel. This is because, trivially, the TLS channel will be the one that we read/write to and thus the one that will get the qio_channel_yield() call. Fixes: `ff82911cd3` Cc: qemu-stable@nongnu.org Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Tested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-04 14:30:03 +02:00
Eric Blake	c61e684e44	block: Exploit BDRV_BLOCK_EOF for larger zero blocks When we have a BDS with unallocated clusters, but asking the status of its underlying bs->file or backing layer encounters an end-of-file condition, we know that the rest of the unallocated area will read as zeroes. However, pre-patch, this required two separate calls to bdrv_get_block_status(), as the first call stops at the point where the underlying file ends. Thanks to BDRV_BLOCK_EOF, we can now widen the results of the primary status if the secondary status already includes BDRV_BLOCK_ZERO. In turn, this fixes a TODO mentioned in iotest 154, where we can now see that all sectors in a partial cluster at the end of a file read as zero when coupling the shorter backing file's status along with our knowledge that the remaining sectors came from an unallocated cluster. Also, note that the loop in bdrv_co_get_block_status_above() had an inefficent exit: in cases where the active layer sets BDRV_BLOCK_ZERO but does NOT set BDRV_BLOCK_ALLOCATED (namely, where we know we read zeroes merely because our unallocated clusters lie beyond the backing file's shorter length), we still ended up probing the backing layer even though we already had a good answer. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170505021500.19315-3-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-30 21:48:06 +08:00
Eric Blake	fb0d8654ff	block: Add BDRV_BLOCK_EOF to bdrv_get_block_status() Just as the block layer already sets BDRV_BLOCK_ALLOCATED as a shortcut for subsequent operations, there are also some optimizations that are made easier if we can quickly tell that *pnum will advance us to the end of a file, via a new BDRV_BLOCK_EOF which gets set by the block layer. This just plumbs up the new bit; subsequent patches will make use of it. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170505021500.19315-2-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-30 21:48:06 +08:00
Max Reitz	f69165a8fe	block: Do not strcmp() with NULL uri->scheme uri_parse(...)->scheme may be NULL. In fact, probably every field may be NULL, and the callers do test this for all of the other fields but not for scheme (except for block/gluster.c; block/vxhs.c does not access that field at all). We can easily fix this by using g_strcmp0() instead of strcmp(). Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170613205726.13544-1-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-06-26 14:54:46 +02:00
Max Reitz	05cc758a3d	blkverify: Catch bs->exact_filename overflow The bs->exact_filename field may not be sufficient to store the full blkverify node filename. In this case, we should not generate a filename at all instead of an unusable one. Cc: qemu-stable@nongnu.org Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170613172006.19685-3-mreitz@redhat.com Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-06-26 14:54:46 +02:00
Max Reitz	de81d72d3d	blkdebug: Catch bs->exact_filename overflow The bs->exact_filename field may not be sufficient to store the full blkdebug node filename. In this case, we should not generate a filename at all instead of an unusable one. Cc: qemu-stable@nongnu.org Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170613172006.19685-2-mreitz@redhat.com Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-06-26 14:54:46 +02:00
Manos Pitsidianakis	f5a5ca7969	block: change variable names in BlockDriverState Change the 'int count' parameter in pwrite_zeros, pdiscard related functions (and some others) to 'int bytes', as they both refer to bytes. This helps with code legibility. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Message-id: 20170609101808.13506-1-el13635@mail.ntua.gr Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-06-26 14:54:46 +02:00
Kevin Wolf	c5f1ad429c	block: Remove bdrv_aio_readv/writev/flush() These functions are unused now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:15 +02:00
Kevin Wolf	0f714ec706	qed: Use bdrv_co_* for coroutine_fns All functions that are marked coroutine_fn can directly call the bdrv_co_* version of functions instead of going through the wrapper. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:15 +02:00
Kevin Wolf	87f0d88261	qed: Add coroutine_fn to I/O path functions Now that we stay in coroutine context for the whole request when doing reads or writes, we can add coroutine_fn annotations to many functions that can do I/O or yield directly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:15 +02:00
Kevin Wolf	c0e8f98927	qed: Use a coroutine for need_check_timer This fixes the last place where we degraded from AIO to actual blocking synchronous I/O requests. Putting it into a coroutine means that instead of blocking, the coroutine simply yields while doing I/O. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:15 +02:00
Kevin Wolf	48cc565e76	qed: Simplify request handling Now that we process a request in the same coroutine from beginning to end and don't drop out of it any more, we can look like a proper coroutine-based driver and simply call qed_aio_next_io() and get a return value from it instead of spawning an additional coroutine that reenters the parent when it's done. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:15 +02:00
Kevin Wolf	0806c3b5dd	qed: Use CoQueue for serialising allocations Now that we're running in coroutine context, the ad-hoc serialisation code (which drops a request that has to wait out of coroutine context) can be replaced by a CoQueue. This means that when we resume a serialised request, it is running in coroutine context again and its I/O isn't blocking any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:15 +02:00
Kevin Wolf	89f89709c7	qed: Implement .bdrv_co_readv/writev Most of the qed code is now synchronous and matches the coroutine model. One notable exception is the serialisation between requests which can still schedule a callback. Before we can replace this with coroutine locks, let's convert the driver's external interfaces to the coroutine versions. We need to be careful to handle both requests that call the completion callback directly from the calling coroutine (i.e. fully synchronous code) and requests that involve some callback, so that we need to yield and wait for the completion callback coming from outside the coroutine. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:15 +02:00
Kevin Wolf	018598747c	qed: Remove recursion in qed_aio_next_io() Instead of calling itself recursively as the last thing, just convert qed_aio_next_io() into a loop. This patch is best reviewed with 'git show -w' because most of it is just whitespace changes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:15 +02:00
Kevin Wolf	dddf8db10b	qed: Remove ret argument from qed_aio_next_io() All callers pass ret = 0, so we can just remove it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	0596be7e6a	qed: Add return value to qed_aio_read/write_data() Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but just return an error code and let the caller handle it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	d6daddcdeb	qed: Add return value to qed_aio_write_inplace/alloc() Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but just return an error code and let the caller handle it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	a101341aa0	qed: Add return value to qed_aio_write_cow() Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but just return an error code and let the caller handle it. While refactoring qed_aio_write_alloc() to accomodate the change, qed_aio_write_zero_cluster() ended up with a single line, so I chose to inline that line and remove the function completely. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	eaf0bc56f5	qed: Add return value to qed_aio_write_main() Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but just return an error code and let the caller handle it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	88d2dd72bc	qed: Add return value to qed_aio_write_l2_update() Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but just return an error code and let the caller handle it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	fb18de21e0	qed: Add return value to qed_aio_write_l1_update() Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but just return an error code and let the caller handle it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	fae25ac7bd	qed: Inline qed_commit_l2_update() qed_commit_l2_update() is unconditionally called at the end of qed_aio_write_l1_update(). Inline it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	a4d8f1aee1	qed: Make qed_aio_write_main() synchronous Note that this code is generally not running in coroutine context, so this is an actual blocking synchronous operation. We'll fix this in a moment. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	3e248cdcd9	qed: Make qed_aio_read_data() synchronous Note that this code is generally not running in coroutine context, so this is an actual blocking synchronous operation. We'll fix this in a moment. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	453e53e2a1	qed: Remove callback from qed_write_table() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	29470d11bf	qed: Remove GenericCB The GenericCB infrastructure isn't used any more. Remove it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	602b57fba4	qed: Make qed_write_table() synchronous Note that this code is generally not running in coroutine context, so this is an actual blocking synchronous operation. We'll fix this in a moment. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	f13d712bb2	qed: Remove callback from qed_write_header() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	7076309aef	qed: Make qed_write_header() synchronous Note that this code is generally not running in coroutine context, so this is an actual blocking synchronous operation. We'll fix this in a moment. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	b4ac32f34f	qed: Remove callback from qed_copy_from_backing_file() With this change, qed_aio_write_prefill() and qed_aio_write_postfill() collapse into a single function. This is reflected by a rename of the combined function to qed_aio_write_cow(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	0f7aa24d2c	qed: Make qed_copy_from_backing_file() synchronous Note that this code is generally not running in coroutine context, so this is an actual blocking synchronous operation. We'll fix this in a moment. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	e85c528142	qed: Make qed_read_backing_file() synchronous Note that this code is generally not running in coroutine context, so this is an actual blocking synchronous operation. We'll fix this in a moment. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	0f21b7a1b7	qed: Remove callback from qed_find_cluster() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	a8165d2d66	qed: Remove callback from qed_read_l2_table() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	f6513529c6	qed: Remove callback from qed_read_table() Instead of passing the return value to a callback, return it to the caller so that the callback can be inlined there. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	11273076e9	qed: Make qed_read_table() synchronous Note that this code is generally not running in coroutine context, so this is an actual blocking synchronous operation. We'll fix this in a moment. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Kevin Wolf	3b7cd9fd8f	qed: Use bottom half to resume waiting requests The qed driver serialises allocating write requests. When the active allocation is finished, the AIO callback is called, but after this, the next allocating request is immediately processed instead of leaving the coroutine. Resuming another allocation request in the same request coroutine means that the request now runs in the wrong coroutine. The following is one of the possible effects of this: The completed request will generally reenter its request coroutine in a bottom half, expecting that it completes the request in bdrv_driver_pwritev(). However, if the second request actually yielded before leaving the coroutine, the reused request coroutine is in an entirely different place and is reentered prematurely. Not a good idea. Let's make sure that we exit the coroutine after completing the first request by resuming the next allocating request only with a bottom half. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-06-26 14:51:14 +02:00
Alberto Garcia	24990c5b95	qcow2: Use offset_into_cluster() and offset_to_l2_index() We already have functions for doing these calculations, so let's use them instead of doing everything by hand. This makes the code a bit more readable. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-06-26 14:51:13 +02:00
Alberto Garcia	ee22a9d869	qcow2: Merge the writing of the COW regions with the guest data If the guest tries to write data that results on the allocation of a new cluster, instead of writing the guest data first and then the data from the COW regions, write everything together using one single I/O operation. This can improve the write performance by 25% or more, depending on several factors such as the media type, the cluster size and the I/O request size. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-06-26 14:51:13 +02:00
Alberto Garcia	86b862c431	qcow2: Pass a QEMUIOVector to do_perform_cow_{read,write}() Instead of passing a single buffer pointer to do_perform_cow_write(), pass a QEMUIOVector. This will allow us to merge the write requests for the COW regions and the actual data into a single one. Although do_perform_cow_read() does not strictly need to change its API, we're doing it here as well for consistency. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-06-26 14:51:13 +02:00
Alberto Garcia	b3cf1c7cf8	qcow2: Allow reading both COW regions with only one request Reading both COW regions requires two separate requests, but it's perfectly possible to merge them and perform only one. This generally improves performance, particularly on rotating disk drives. The downside is that the data in the middle region is read but discarded. This patch takes a conservative approach and only merges reads when the size of the middle region is <= 16KB. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-06-26 14:51:13 +02:00
Alberto Garcia	672f0f2c4b	qcow2: Split do_perform_cow() into _read(), _encrypt() and _write() This patch splits do_perform_cow() into three separate functions to read, encrypt and write the COW regions. perform_cow() can now read both regions first, then encrypt them and finally write them to disk. The memory allocation is also done in this function now, using one single buffer large enough to hold both regions. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-06-26 14:51:13 +02:00
Alberto Garcia	99450c6fb9	qcow2: Make perform_cow() call do_perform_cow() twice Instead of calling perform_cow() twice with a different COW region each time, call it just once and make perform_cow() handle both regions. This patch simply moves code around. The next one will do the actual reordering of the COW operations. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-06-26 14:51:13 +02:00
Alberto Garcia	e034f5bcbc	qcow2: Use unsigned int for both members of Qcow2COWRegion Qcow2COWRegion has two attributes: - The offset of the COW region from the start of the first cluster touched by the I/O request. Since it's always going to be positive and the maximum request size is at most INT_MAX, we can use a regular unsigned int to store this offset. - The size of the COW region in bytes. This is guaranteed to be >= 0, so we should use an unsigned type instead. In x86_64 this reduces the size of Qcow2COWRegion from 16 to 8 bytes. It will also help keep some assertions simpler now that we know that there are no negative numbers. The prototype of do_perform_cow() is also updated to reflect these changes. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-06-26 14:51:13 +02:00
Alberto Garcia	026ac1586b	qcow2: Remove unused Error variable in do_perform_cow() We are using the return value of qcow2_encrypt_sectors() to detect problems but we are throwing away the returned Error since we have no way to report it to the user. Therefore we can simply get rid of the local Error variable and pass NULL instead. Alternatively we could try to figure out a way to pass the original error instead of simply returning -EIO, but that would be more invasive, so let's keep the current approach. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-06-26 14:51:13 +02:00
Alberto Garcia	0d2fac8ede	throttle: Update throttle-groups.c documentation There used to be throttle_timers_{detach,attach}_aio_context() calls in bdrv_set_aio_context(), but since `7ca7f0f6db` they are now in blk_set_aio_context(). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-06-26 14:51:13 +02:00
Stefan Hajnoczi	ea17c9d20d	block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate() Calling aio_poll() directly may have been fine previously, but this is the future, man! The difference between an aio_poll() loop and BDRV_POLL_WHILE() is that BDRV_POLL_WHILE() releases the AioContext around aio_poll(). This allows the IOThread to run fd handlers or BHs to complete the request. Failure to release the AioContext causes deadlocks. Using BDRV_POLL_WHILE() partially fixes a 'savevm' hang with -object iothread. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-06-26 14:51:13 +02:00
Stefan Hajnoczi	dc88a467ec	block: count bdrv_co_rw_vmstate() requests Call bdrv_inc/dec_in_flight() for vmstate reads/writes. This seems unnecessary at first glance because vmstate reads/writes are done synchronously while the guest is stopped. But we need the bdrv_wakeup() in bdrv_dec_in_flight() so the main loop sees request completion. Besides, it's cleaner to count vmstate reads/writes like ordinary read/write requests. The bdrv_wakeup() partially fixes a 'savevm' hang with -object iothread. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-06-26 14:51:13 +02:00
Kevin Wolf	4f78a16fee	commit: Fix completion with extra reference commit_complete() can't assume that after its block_job_completed() the job is actually immediately freed; someone else may still be holding references. In this case, the op blockers on the intermediate nodes make the graph reconfiguration in the completion code fail. Call block_job_remove_all_bdrv() manually so that we know for sure that any blockers on intermediate nodes are given up. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-06-26 14:51:12 +02:00
Peter Maydell	84e3d0725b	QAPI patches for 2017-06-09 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZSRWrAAoJEDhwtADrkYZTVOQP/RK8br2A1Cn7LeVG6jnKz5hJ OqyII77x8I2RachWvnwxQeHPMEVDuz3y2WjL+80s6peXlpR6y/13w7oX0f6aiJuo +T9khTqMv2I7HsM5UCsXAJpFPHT7r90b4x8nstY80YLGe7lA7L6yk6PGyCxHThwA mOiTKDw6/Xb/yZGrS2Favrun7juNpAs0Ec1IAkaA8xsEgVkd6tDv281rmHqvibl/ //90VfJp3nHFZ12FCQ1HzA42Eigtmo/fIk9LnAzBoYG0zw0cnzjuv0BNzs/JwuUZ /VskeD1cViQ4yzFnPpjOavjYjTN854/JTJzm7gZ7dTQ6/l3ykoY6NDE8p1BLuHlC p2RKkg20EeZlpOEtMQ4g6iyG6EUxaKcEiXmQ31LqN/LJwxTYbo5B5nCHMjrt4gxe MqFBJQSNsJ7QjZ7Qa7pADMCi/G0m7/0dN8vBqSr4vcbLVvdbw/yb/9s33wXGrUj1 PyXM2ymi+vvSqcXtNXKshsJLxJSJxO1tm2tRIANDTabQ00yxs8dOYnQnbQFR94fp 6nrE2PnjZqgqk69aNDJEbngj6Tgx44nyTr1+Q17juZf9nTCE5QmBE1J0IRoykCJn E8+T63ZxtIxVV2yLi5xBjmZaZtPyJRGGeUXunA10SuWrHzupEcBuhFhFYd2MFM5L fsojALN2K3Gdx2+CmAo2 =O9Vv -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2017-06-09-v2' into staging QAPI patches for 2017-06-09 # gpg: Signature made Tue 20 Jun 2017 13:31:39 BST # gpg: using RSA key 0x3870B400EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-qapi-2017-06-09-v2: (41 commits) tests/qdict: check more get_try_int() cases console: use get_uint() for "head" property i386/cpu: use get_uint() for "min-level"/"min-xlevel" properties numa: use get_uint() for "size" property pnv-core: use get_uint() for "core-pir" property pvpanic: use get_uint() for "ioport" property auxbus: use get_uint() for "addr" property arm: use get_uint() for "mp-affinity" property xen: use get_uint() for "max-ram-below-4g" property pc: use get_uint() for "hpet-intcap" property pc: use get_uint() for "apic-id" property pc: use get_uint() for "iobase" property acpi: use get_uint() for "pci-hole" properties acpi: use get_uint() for various acpi properties acpi: use get_uint() for "acpi-pcihp-io" properties platform-bus: use get_uint() for "addr" property bcm2835_fb: use {get, set}_uint() for "vcram-size" and "vcram-base" aspeed: use {set, get}_uint() for "ram-size" property pcihp: use get_uint() for "bsel" property pc-dimm: make "size" property uint64 ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-06-22 11:34:39 +01:00
Peter Maydell	65a0e3e842	-----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEtBAABCAAXBQJZQyPmEBxmYW16QHJlZGhhdC5jb20ACgkQyjViTGqRccaWrgf/ SCAHpi4gzWbr7AN03jP16Qy/kqNik6F7LTNSqrRbvBPb3TNchDd4z44SAghK5m/r +IlYQc20sBZ60tRHIHAUSF2WNcea2pj1v3ZVgjrI7hiJ3DXPiqqt/dAR/W/BLIDO tAHAVF6Pnrjm9DC4d2zATLDHvcHMzWOsnePh7XcOm44REbwUr3GDg6bf2+j+5yfS 9ewmXfh8z4w1IvSn+f5B+IeCvGvJNA1D55dqcGo8Ivlg9PnElziXFaXO2s7UiLIM mF3eTSIbJQNNN+E+0lpRpnqQiq+Txxggu61Q4f8bOTBhEOPa3etj1ydnXMVbvX25 6SUuBfGh51tyOIZOJz3GtA== =9b+J -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/famz/tags/docker-and-block-pull-request' into staging # gpg: Signature made Fri 16 Jun 2017 01:18:46 BST # gpg: using RSA key 0xCA35624C6A9171C6 # gpg: Good signature from "Fam Zheng <famz@redhat.com>" # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6 * remotes/famz/tags/docker-and-block-pull-request: (23 commits) block: make accounting thread-safe block: split BlockAcctStats creation and setup block: introduce block_account_one_io block: protect modification of dirty bitmaps with a mutex migration/block: reset dirty bitmap before reading block: introduce dirty_bitmap_mutex block: protect tracked_requests and flush_queue with reqs_lock block: access write_gen with atomics block: use Stat64 for wr_highest_offset util: add stats64 module throttle-groups: protect throttled requests with a CoMutex throttle-groups: do not use qemu_co_enter_next throttle-groups: only start one coroutine from drained_begin block: access io_plugged with atomic ops block: access wakeup with atomic ops block: access serialising_in_flight with atomic ops block: access io_limits_disabled with atomic ops block: access quiesce_counter with atomic ops block: access copy_on_read with atomic ops docker: Add flex and bison to centos6 image ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-06-20 16:01:15 +01:00
Peter Maydell	7e56accdaf	* nbd and qemu-nbd fixes (Eric, Max) * nbd refactoring (Vladimir) * vhost-user-scsi, take N+1 (Felipe) * replace memory_region_set_fd with memory_region_init_ram_from_fd (Marc-André) * docs/ movement (Paolo) * megasas TOCTOU fixes (Paolo) * make async_safe_run_on_cpu work on kvm/hax accelerators (Paolo) * Build system and poison.h improvements (Thomas) * -accel thread=xxx fix (Thomas) * move files to accel/ (Yang Zhong) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJZQli7AAoJEL/70l94x66DYAUH/icBoGSyY70G033BxbWW+jPW pjqobQw3YnFmiHSkXy1Xd4LKa0dyrVdmVECOAvB0embiN6pIk2UpeT4aQNHffG3f Y9wMRKnp6RMLpzwRYbde4rAlitn52ecGIANtFUC6RTnL7wKCuh3beWruKmkudG8V ZAsmX30pPFJxtQYJ0/q3c2jB6V87h2GocRLAmy1lF5/X1pXQ2fiKOYOJvV3HDY1U UQ+Ixuf4tTbyBwTm4Lx2tD3+olkVTUvcATqxhI+1JKu6lE0Dgb58urYDOfvVPFDl 98s6bU53f0gpOcb1/wKOAXKdZ2tvqRtwsP8IoPDnUpk8HGxbNWdofoxsO37vVIU= =zeKT -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * nbd and qemu-nbd fixes (Eric, Max) * nbd refactoring (Vladimir) * vhost-user-scsi, take N+1 (Felipe) * replace memory_region_set_fd with memory_region_init_ram_from_fd (Marc-André) * docs/ movement (Paolo) * megasas TOCTOU fixes (Paolo) * make async_safe_run_on_cpu work on kvm/hax accelerators (Paolo) * Build system and poison.h improvements (Thomas) * -accel thread=xxx fix (Thomas) * move files to accel/ (Yang Zhong) # gpg: Signature made Thu 15 Jun 2017 10:51:55 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (41 commits) vhost-user-scsi: Introduce a vhost-user-scsi sample application vhost-user-scsi: Introduce vhost-user-scsi host device qemu-doc: include version number docs: create interop/ subdirectory include/exec/poison: Mark some CONFIG defines as poisoned, too include/exec/poison: Add missing TARGET defines nbd/server: refactor nbd_trip nbd/server: rename rc to ret nbd/server: get rid of fail: return rc nbd/server: nbd_negotiate: fix error path nbd/server: remove NBDClientNewData nbd/server: refactor nbd_co_receive_request nbd/server: get rid of EAGAIN dead code nbd/server: refactor nbd_co_send_reply nbd/server: get rid of ssize_t nbd/server: get rid of nbd_negotiate_read and friends nbd: make nbd_drop public nbd: rename read_sync and friends accel: move kvm related accelerator files into accel/ tcg: move tcg backend files into accel/tcg/ ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-06-20 14:20:34 +01:00
Marc-André Lureau	01b2ffcedd	qapi: merge QInt and QFloat in QNum We would like to use a same QObject type to represent numbers, whether they are int, uint, or floats. Getters will allow some compatibility between the various types if the number fits other representations. Add a few more tests while at it. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20170607163635.17635-7-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [parse_stats_intervals() simplified a bit, comment in test_visitor_in_int_overflow() tidied up, suppress bogus warnings] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-06-20 14:31:31 +02:00
Paolo Bonzini	5b50bf77ce	block: make accounting thread-safe I'm not trying too hard yet. Later, with multiqueue support, this may cause mutex contention or cacheline bouncing. Cc: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-20-pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	9caa6f3dbe	block: split BlockAcctStats creation and setup block_acct_destroy is called unconditionally in blk_delete, but there is no BlockAcctStats function that is called unconditionally in blk_new. Split block_acct_init in two, so that it will be possible to create a QemuMutex in block_acct_init and destroy it in block_acct_cleanup. Cc: Alberto Garcia <berto@igalia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-19-pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	39c1b4254e	block: introduce block_account_one_io This is the common code to account operations that produced actual I/O. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-18-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	b64bd51efa	block: protect modification of dirty bitmaps with a mutex Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-17-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	2119882c7e	block: introduce dirty_bitmap_mutex It protects only the list of dirty bitmaps; in the next patch we will also protect their content. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-15-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	3783fa3dd3	block: protect tracked_requests and flush_queue with reqs_lock Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-14-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	47fec59941	block: access write_gen with atomics Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-13-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	f7946da274	block: use Stat64 for wr_highest_offset Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-12-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	93001e9d87	throttle-groups: protect throttled requests with a CoMutex Another possibility is to use tg->lock, which we're holding anyway in both schedule_next_request and throttle_group_co_io_limits_intercept. This would require open-coding the CoQueue however, so I've chosen this alternative. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-10-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	3b170dc867	throttle-groups: do not use qemu_co_enter_next Prepare for removing this function; always restart throttled requests from coroutine context. This will matter when restarting throttled requests will have to acquire a CoMutex. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-9-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	7258ed930c	throttle-groups: only start one coroutine from drained_begin Starting all waiting coroutines from bdrv_drain_all is unnecessary; throttle_group_co_io_limits_intercept calls schedule_next_request as soon as the coroutine restarts, which in turn will restart the next request if possible. If we only start the first request and let the coroutines dance from there the code is simpler and there is more reuse between throttle_group_config, throttle_group_restart_blk and timer_cb. The next patch will benefit from this. We also stop accessing from throttle_group_restart_blk the blkp->throttled_reqs CoQueues even when there was no attached throttling group. This worked but is not pretty. The only thing that can interrupt the dance is the QEMU_CLOCK_VIRTUAL timer when switching from one block device to the next, because the timer is set to "now + 1" but QEMU_CLOCK_VIRTUAL might not be running. Set that timer to point in the present ("now") rather than the future and things work. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-8-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	850d54a2a9	block: access io_plugged with atomic ops Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-7-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	e2a6ae7fe5	block: access wakeup with atomic ops Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-6-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	20fc71b25c	block: access serialising_in_flight with atomic ops Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-5-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	d993b85804	block: access io_limits_disabled with atomic ops Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-4-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	414c2ec358	block: access quiesce_counter with atomic ops Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-3-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	d3faa13e5f	block: access copy_on_read with atomic ops Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-2-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Vladimir Sementsov-Ogievskiy	d1fdf257d5	nbd: rename read_sync and friends Rename nbd_wr_syncv -> nbd_rwv read_sync -> nbd_read read_sync_eof -> nbd_read_eof write_sync -> nbd_write drop_sync -> nbd_drop 1. nbd_ prefix read_sync and write_sync are already shared, so it is good to have a namespace prefix. drop_sync will be shared, and read_sync_eof is related to read_sync, so let's rename them all. 2. _sync suffix _sync is related to the fact that nbd_wr_syncv doesn't return if a write to socket returns EAGAIN. The first implementation of nbd_wr_syncv (was wr_sync in `7a5ca8648b`) just loops while getting EAGAIN, the current implementation yields in this case. Why we want to get rid of it: - it is normal for r/w functions to be synchronous, so having an additional suffix for it looks redundant (contrariwise, we have _aio suffix for async functions) - _sync suffix in block layer is used when function does flush (so using it for other thing is confusing a bit) - keep function names short after adding nbd_ prefix 3. for nbd_wr_syncv let's use more common notation 'rw' Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20170602150150.258222-2-vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-06-15 11:04:06 +02:00
Jeff Cody	5c3ad1a6a8	block/iscsi: enable filename option and parsing When enabling option parsing and blockdev-add for iscsi, we removed the 'filename' option. Unfortunately, this was a bit optimistic, as previous versions of QEMU allowed the use of the option in backing filenames via json. This means that without parsing this option, we cannot open existing images that used to work fine. See bug: https://bugzilla.redhat.com/show_bug.cgi?id=1457088 Tested-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Message-id: 0789ab6c32814ab4b6896707d378804bd4424c65.1497444637.git.jcody@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-06-14 17:39:46 -04:00
Jeff Cody	91589d9e5c	block/rbd: enable filename option and parsing When enabling option parsing and blockdev-add for rbd, we removed the 'filename' option. Unfortunately, this was a bit optimistic, as previous versions of QEMU allowed the use of the option in backing filenames via json. This means that without parsing this option, we cannot open existing images that used to work fine. See bug: https://bugzilla.redhat.com/show_bug.cgi?id=1457088 Tested-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Message-id: 937dc9fde348d13311eb8e23444df3bc3190b612.1497444637.git.jcody@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-06-14 17:39:46 -04:00
Peter Maydell	e0b4891ae6	-----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZOpeXAAoJEL2+eyfA3jBXKqYP/jXv/ehVADW3X30CXSPoZsGo rdsdd8pqNwuEYHpFrGclMNbsDWxUsiv8t3MleyVEtl+N0pBVnlZisR3PCONShxxj w0S0/UniAib+j+Y3D7xSHKKrin6CjNVDltrJdX7VOAZsOTj93pVDhd8SlPc3YDSt DrUQoJI96DOwtQofXLTzZkzytbd/9ZN7VzF3j4wJ/72do7thCBybizRLUBEiKM+w RQgcb3Xvob1H9pKSjaNnmH8CKqrISJRfzpXCAzowQy+X8RTTBedn/ZHtiHFYb2XZ 6J/OYM1p4W2X69KMUVJ8wAZsNVNlVWaXsD+lKcBcLtlYIUac7CB/Hcg4mrJDqhzq 752KXvencRKP52HfVLaySXBE2JguTvUrkBDw3/u9W2qUZw3tUexPhH/CDH6wf8XU JhbZk7fm722pfCs/Gmt+9Muz7YkGbT91EWPWZYxbZ1avebmTlq6mCarqTTz9DmXZ psJe7a6D0M0bcQL0//rOH3USbOSUcFascJvuRoll28H+z3oCfC9ILbBZE8xQ9uOM W5mmCL8SoljY+cSIb9NSy0/vG4gqvy4TigaIRu8VT8EJG3Yel8y9YXb4kX5NFV6Z ZuEcgykUnBNzBmiXC+M57r9Qa1J/9GciSOVfH8PdOzhuwWimBANg5WD+B6XcrknD OCOVaV58DMh3ApwfVCkj =XJ/8 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging # gpg: Signature made Fri 09 Jun 2017 13:41:59 BST # gpg: using RSA key 0xBDBE7B27C0DE3057 # gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>" # gpg: aka "Jeffrey Cody <jeff@codyprime.org>" # gpg: aka "Jeffrey Cody <codyprime@gmail.com>" # Primary key fingerprint: 9957 4B4D 3474 90E7 9D98 D624 BDBE 7B27 C0DE 3057 * remotes/cody/tags/block-pull-request: block/gluster.c: Handle qdict_array_entries() failure Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-06-13 12:55:47 +01:00
Peter Maydell	475df9d809	Block layer patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJZOorTAAoJEH8JsnLIjy/W+MAP/iOH2GtgtUfH8CLxqWqZUJ5p rmlLLTrooBHUF09BUSCkwD5E0s1El9phvldt9t2l6RaxxSlMwZl1nB0ww9q+z2Aq NY5VBYUov28JnqFqRkOPiJHMo7oRkzgyGVn3RCbevwWudEK3DcSRT5UftDK5UPH3 HwxYY09uxpDcMkXajC1pRG/RJFrfsmbM3pXFYMsOgj8QT5nNdqebaqcYnuTKX9F4 5Hn5O0tRdZZWWvP1QXRAPszmihdh4GN4I1EEfAgjV8Y3TqBApZS4vdVOJXm07Vlx FB6FKEWvK8zMU/0kS8jH5v4A03lRtokESL9Io5K4mwuxiTfNjvb4ZlftGVgUzuFi X5Yn6pPYU8t+rvRNSHF/8UYxX6lcuGL2DlVn/a+47phmQrW46fd8srp1A1c+dtzS jjMvw1tJNLYGP29QWzhaROPVn3Uk4wPbZIU472HB6ZTSC76g+jvGzfLVspnZeUjY 96l3721vv8oa7ZtFvRNT9zTYxAoGXEcgFR7f+E9yVWdISZ0AKdONNIUV0w9AHKNc rDM4XdnG7ouiVaGeT3cdDNjjbb/v3+sR9yUzf9DUd41purmkrRPtGcetd8esf8Jr plIiBvf+mCRDACSYbIF3JKEHGYFasgs+X/OkhBVeeIBJUkl89mFpca5Qs4AF1d6v l7IW3y56umBa00qDgl/R =ey0C -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches # gpg: Signature made Fri 09 Jun 2017 12:47:31 BST # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: block: fix external snapshot abort permission error block/qcow.c: Fix memory leak in qcow_create() qemu-iotests: Test automatic commit job cancel on hot unplug commit: Fix use after free in completion qemu-iotests: Block migration test migration/block: Clean up BBs in block_save_complete() migration: Inactivate images after .save_live_complete_precopy() block: Fix anonymous BBs in blk_root_inactivate() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-06-12 10:43:32 +01:00
Peter Maydell	56faeb9bb6	block/gluster.c: Handle qdict_array_entries() failure In qemu_gluster_parse_json(), the call to qdict_array_entries() could return a negative error code, which we were ignoring because we assigned the result to an unsigned variable. Fix this by using the 'int' type instead, which matches the return type of qdict_array_entries() and also the type we use for the loop enumeration variable 'i'. (Spotted by Coverity, CID 1360960.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1496682098-1540-1-git-send-email-peter.maydell@linaro.org Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-06-09 08:41:29 -04:00
Peter Maydell	272545cf21	block/qcow.c: Fix memory leak in qcow_create() Coverity points out that the code path in qcow_create() for the magic "fat:" backing file name leaks the memory used to store the filename (CID 1307771). Free the memory before we overwrite the pointer. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-06-09 13:46:20 +02:00
Kevin Wolf	19ebd13ed4	commit: Fix use after free in completion The final bdrv_set_backing_hd() could be working on already freed nodes because the commit job drops its references (through BlockBackends) to both overlay_bs and top already a bit earlier. One way to trigger the bug is hot unplugging a disk for which blockdev_mark_auto_del() cancels the block job. Fix this by taking BDS-level references while we're still using the nodes. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2017-06-09 13:46:13 +02:00
Kevin Wolf	93c26503e0	block: Fix anonymous BBs in blk_root_inactivate() blk->name isn't an array, but a pointer that can be NULL. Checking for an anonymous BB must involve a NULL check first, otherwise we get crashes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2017-06-09 11:45:03 +02:00
Paolo Bonzini	6bdcc018a6	nbd: make it thread-safe, fix qcow2 over nbd NBD is not thread safe, because it accesses s->in_flight without a CoMutex. Fixing this will be required for multiqueue. CoQueue doesn't have spurious wakeups but, when another coroutine can run between qemu_co_queue_next's wakeup and qemu_co_queue_wait's re-locking of the mutex, the wait condition can become false and a loop is necessary. In fact, it turns out that the loop is necessary even without this multi-threaded scenario. A particular sequence of coroutine wakeups is happening ~80% of the time when starting a guest with qcow2 image served over NBD (i.e. qemu-nbd --format=raw, and QEMU's -drive option has -format=qcow2). This patch fixes that issue too. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-06-07 18:22:02 +02:00
Vladimir Sementsov-Ogievskiy	be41c100c0	nbd/client.c: use errp instead of LOG Move to modern errp scheme from just LOGging errors. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20170526110913.89098-1-vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-06-06 20:18:36 +02:00
Vladimir Sementsov-Ogievskiy	f260956536	nbd: add errp parameter to nbd_wr_syncv() Will be used in following patch to provide actual error message in some cases. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20170516094533.6160-4-vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-06-06 20:18:36 +02:00
Niels de Vos	df3a429ae8	gluster: add support for PREALLOC_MODE_FALLOC Add missing support for "preallocation=falloc" to the Gluster block driver. This change bases its logic on that of block/file-posix.c and removed the gluster_supports_zerofill() and qemu_gluster_zerofill() functions in favour of #ifdef checks in an easy to read switch-statement. Both glfs_zerofill() and glfs_fallocate() have been introduced with GlusterFS 3.5.0 (pkg-config glusterfs-api = 6). A #define for the availability of glfs_fallocate() has been added to ./configure. Reported-by: Satheesaran Sundaramoorthi <sasundar@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Message-id: 20170528063114.28691-1-ndevos@redhat.com URL: https://bugzilla.redhat.com/1450759 Signed-off-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-06-02 10:51:47 -04:00
Stefan Hajnoczi	0748b3526e	Block layer patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJZLDGTAAoJEH8JsnLIjy/WjY0P/jtXF+SQKs9H695Uy+U+EUr5 9w+lYTfjHXTUx9F5AKT4OAMww/uWSG5ywG8FZD8AJZyt1B8piDrSA7sW5YYgWCV4 XD1HMzN6pasVGrchzgfBSW/K9BgUlvBEjqFrLCNVii0osa68J7vX3uJsLyRA+yPo ijFseJaB/b0KnAPG9JqHXc4DVQgU2IhZH3ZsgE6Utb+KUrm8/veiORhWQboBbfCW eyTF+YWee/rI+up4nKn9+Le57EUXWr3Y50BBxFZw1/4wQbv0WEhOIbl/Z4ggxNRR VoyXGMqOIZFlGQ7bGxDawuEwGKfcOFFEHj3rhPrs7RPnod/6X8Vxm0rAr2GNZoOW 9tLMQKe4HLo3kVwX65AhzWd/WMukAd0MC/0oaULOjiAEdlWjflyEhx9H9uYefl5x kn77ZqQqIEL4aCYRBLJn9oJV3CIZbSd6cuKC9UPuXAwD8XDWTXUzT/aDMJUnfij7 vhS1qks1Wyk37wIjTuuERwrr0K6y8e3De30NeU6LqQuzXvJ1MYSp49BA1FCgHfmT x5RWbTSjLdjZiIo7biE2dSwdOtsxsnuxX19utqfGqOmegNU7LykRtu6mFh9kJhNe 5ladt1Mu//puZLQ/q4RH/J0pwkuS/OVrS3LBobcggPGHhUIiHhdNUI73bb7BK0+9 ME5DGLm4/c/REB3Lidr9 =kkU0 -----END PGP SIGNATURE----- Merge remote-tracking branch 'kwolf/tags/for-upstream' into staging Block layer patches # gpg: Signature made Mon 29 May 2017 03:34:59 PM BST # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * kwolf/tags/for-upstream: block/file-: _parse_filename() and colons block: Fix backing paths for filenames with colons block: Tweak error message related to qemu-img amend qemu-img: Fix leakage of options on error qemu-img: copy *key-secret opts when opening newly created files qemu-img: introduce --target-image-opts for 'convert' command qemu-img: fix --image-opts usage with dd command qemu-img: add support for --object with 'dd' command qemu-img: Fix documentation of convert qcow2: remove extra local_error variable mirror: Drop permissions on s->target on completion nvme: Add support for Controller Memory Buffers iotests: 147: Don't test inet6 if not available qemu-iotests: Test streaming with missing job ID stream: fix crash in stream_start() when block_job_create() fails Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-30 14:15:15 +01:00
Max Reitz	03c320d803	block/file-: _parse_filename() and colons The file drivers' *_parse_filename() implementations just strip the optional protocol prefix off the filename. However, for e.g. "file:foo:bar", this would lead to "foo:bar" being stored as the BDS's filename which looks like it should be managed using the "foo" protocol. This is especially troublesome if you then try to resolve a backing filename based on "foo:bar". This issue can only occur if the stripped part is a relative filename ("file:/foo:bar" will be shortened to "/foo:bar" and having a slash before the first colon means that "/foo" is not recognized as a protocol part). Therefore, we can easily fix it by prepending "./" to such filenames. Before this patch: $ ./qemu-img create -f qcow2 backing.qcow2 64M Formatting 'backing.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16 $ ./qemu-img create -f qcow2 -b backing.qcow2 file🔝image.qcow2 Formatting 'file🔝image.qcow2', fmt=qcow2 size=67108864 backing_file=backing.qcow2 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16 $ ./qemu-io file🔝image.qcow2 can't open device file🔝image.qcow2: Could not open backing file: Unknown protocol 'top' After this patch: $ ./qemu-io file🔝image.qcow2 [no error] Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170522195217.12991-3-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-29 15:39:54 +02:00
Eric Blake	bcb07dba92	block: Tweak error message related to qemu-img amend When converting a 1.1 image down to 0.10, qemu-iotests 060 forces a contrived failure where allocating a cluster used to replace a zero cluster reads unaligned data. Since it is a zero cluster rather than a data cluster being converted, changing the error message to match our earlier change in 'qcow2: Make distinction between zero cluster types obvious' is worthwhile. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170508171302.17805-1-eblake@redhat.com [mreitz: Commit message fixes] Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-29 15:39:54 +02:00
Alberto Garcia	a7a6a2bffc	qcow2: remove extra local_error variable Commit `d7086422b1` added a local_err variable global to the qcow2_amend_options() function, so there's no need to have this other one. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 20170511150337.21470-1-berto@igalia.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-29 15:39:53 +02:00
Kevin Wolf	63c8ef2890	mirror: Drop permissions on s->target on completion This fixes an assertion failure that was triggered by qemu-iotests 129 on some CI host, while the same test case didn't seem to fail on other hosts. Essentially the problem is that the blk_unref(s->target) in mirror_exit() doesn't necessarily mean that the BlockBackend goes away immediately. It is possible that the job completion was triggered nested in mirror_drain(), which looks like this: BlockBackend *target = s->target; blk_ref(target); blk_drain(target); blk_unref(target); In this case, the write permissions for s->target are retained until after blk_drain(), which makes removing mirror_top_bs fail for the active commit case (can't have a writable backing file in the chain without the filter driver). Explicitly dropping the permissions first means that the additional reference doesn't hurt and the job can complete successfully even if called from the nested blk_drain(). Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-05-29 15:37:26 +02:00
Alberto Garcia	525989a50a	stream: fix crash in stream_start() when block_job_create() fails The code that tries to reopen a BlockDriverState in stream_start() when the creation of a new block job fails crashes because it attempts to dereference a pointer that is known to be NULL. This is a regression introduced in `a170a91fd3`, likely because the code was copied from stream_complete(). Cc: qemu-stable@nongnu.org Reported-by: Kashyap Chamarthy <kchamart@redhat.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Tested-by: Kashyap Chamarthy <kchamart@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-26 16:48:21 +02:00
Jeff Cody	223a23c198	block/gluster: glfs_lseek() workaround On current released versions of glusterfs, glfs_lseek() will sometimes return invalid values for SEEK_DATA or SEEK_HOLE. For SEEK_DATA and SEEK_HOLE, the returned value should be >= the passed offset, or < 0 in the case of error: LSEEK(2): off_t lseek(int fd, off_t offset, int whence); [...] SEEK_HOLE Adjust the file offset to the next hole in the file greater than or equal to offset. If offset points into the middle of a hole, then the file offset is set to offset. If there is no hole past offset, then the file offset is adjusted to the end of the file (i.e., there is an implicit hole at the end of any file). [...] RETURN VALUE Upon successful completion, lseek() returns the resulting offset location as measured in bytes from the beginning of the file. On error, the value (off_t) -1 is returned and errno is set to indicate the error However, occasionally glfs_lseek() for SEEK_HOLE/DATA will return a value less than the passed offset, yet greater than zero. For instance, here are example values observed from this call: offs = glfs_lseek(s->fd, start, SEEK_HOLE); if (offs < 0) { return -errno; /* D1 and (H3 or H4) */ } start == 7608336384 offs == 7607877632 This causes QEMU to abort on the assert test. When this value is returned, errno is also 0. This is a reported and known bug to glusterfs: https://bugzilla.redhat.com/show_bug.cgi?id=1425293 Although this is being fixed in gluster, we still should work around it in QEMU, given that multiple released versions of gluster behave this way. This patch treats the return case of (offs < start) the same as if an error value other than ENXIO is returned; we will assume we learned nothing, and there are no holes in the file. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Message-id: 87c0140e9407c08f6e74b04131b610f2e27c014c.1495560397.git.jcody@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:44:46 -04:00
Paolo Bonzini	f321dcb57f	blockjob: introduce block_job_pause/resume_all Remove use of block_job_pause/resume from outside blockjob.c, thus making them static. The new functions are used by the block layer, so place them in blockjob_int.h. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170508141310.8674-5-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Paolo Bonzini	05b0d8e3b8	blockjob: introduce block_job_early_fail Outside blockjob.c, block_job_unref is only used when a block job fails to start, and block_job_ref is not used at all. The reference counting thus is pretty well hidden. Introduce a separate function to be used by block jobs; because block_job_ref and block_job_unref now become static, move them earlier in blockjob.c. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170508141310.8674-4-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-24 16:38:51 -04:00
Juan Quintela	68ba3b0743	migration: migration.h was not needed This files don't use any function from migration.h, so drop it. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-05-18 19:20:59 +02:00
Stefan Hajnoczi	2ccbd47c1d	migration/next for 20170517 -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJZHCoMAAoJEPSH7xhYctcjyOcQAN82GDYgXj93k40rU/SmZTP7 blelisGsY5UNo33bLZq07fVwwdk1vIR0OIZvjMyGVWptAX49QJ6BVwX2E5zmb9LW AT3rVeyqz8nnC6OwWBxN9bu+sPJ13ibGs1l2j5Kn9jZ6a9rJCC7LOKdo4Dxbs3Uk Obw4f7swsozTQPxeHfrsBgFIvcB8qXLjdxsVhj+IWkmp1KDKVg+TWfNFJx30dK0G ktVsV0Xu6exEzcnzpTf93Bcv8vt49JRrCka9N5YryPTZmFuGgW291lqviPWiZg/W 39F3cga5QfDzcs4Z6Lrz3Qeo/q+2n5G5O23UmrJccZ//UQMdeW9sd5udj211aMeq I7UdrarIHWRCCVTVdVL7AGJ8xmMIKHsvKRWstw7FEMHQ+lD/sFSfpWBtYdGhAotF mf/yncMKb52QbNyIuanoKi8UjU+RCvuslCac87U3fPqz/qYGvhnmO145S/wai1mR +FQQXORJOhdsWDqRRz9q8/uXqPwm173+rHHzMgFa3P1X9u1jfLhjJk0g9sDFtyAb If4IzOwfuCLJyelcuzzy9SSOzDsGu1LcrBoRgqTugX+MSJXFjWOKKfA1wxnAKkPf T2fQIqny2N7VCfpDB1iaCfxnkizIwrYEI3YRkMuJpYU3489x/BJQIILoLo1yEj4G vNhq+qJ9V/Uj8X+X5/cL =A5DU -----END PGP SIGNATURE----- Merge remote-tracking branch 'quintela/tags/migration/20170517' into staging migration/next for 20170517 # gpg: Signature made Wed 17 May 2017 11:46:36 AM BST # gpg: using RSA key 0xF487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * quintela/tags/migration/20170517: migration: Move check_migratable() into qdev.c migration: Move postcopy stuff to postcopy-ram.c migration: Move page_cache.c to migration/ migration: Create migration/blocker.h ram: Rename RAM_SAVE_FLAG_COMPRESS to RAM_SAVE_FLAG_ZERO migration: Pass Error ** argument to {save,load}_vmstate migration: Fix regression with compression threads Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-18 10:05:52 +01:00
Stefan Hajnoczi	fefb28a471	-----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZGx79AAoJEL2+eyfA3jBXYmwP/0hyCtpaXU8eLFNarnedv6M+ iDEFULUuYtiY2Jd3brxoVifIkP5Z4P6XEtCwoPKUZhu7yq+OySs/f/ewgskQOr0F YRYX2CSiVFIuT816mnYnpkrUwOzHPzewOF+99CgQD6IAEg1cXVgzscOpHnWW7Eaz vkrvT0C0maxm3ClCLcRFJWPwUsFAZR+2e3lf/1DWjxgkPx5KJW6tMatdQeuBJCMe 5y/A4+bMOZyRHDppp1YMh2ijeJ230mvRnjNadUhfXtIKKvHDpulFVZFaw/Bh/3+C FSZYKNrgQw4lzBc2Bao2q5nP69fDlfZiKABjumbyib4yfkgOcaCRnseB1xHdVYsz WHKeLufFvwsnXgDVHz/DoYB5wyJG9I7oHNj4VE6BHWsszR6UhlN+XsbtK6Sl5CKG wv8z3D8U3e0ki/jufta2+Yh4bvMwrEofjzPh+g1T1Gme+BVubDtAV16w0UcNbAAl O3quUnQA1PWA2jPIdqE28Z2gdAywun7nLEV8fxOL5NNnWrHDEH4EZvHb1l/Lcjgn W1Dx/vRDpd1cJFwfh3NP+n+zIhaT36ZJ2rnpJfWvsJTJn346gW5XPcgYChqqEMle dOnjPqXceOZHFBsvWSBPFX0qfNPOsTBtwtTqOggZsF5W9aCkvUPi3SRLe8fSlUkH 8p06ejEr4VTZkeE2s5WM =joxq -----END PGP SIGNATURE----- Merge remote-tracking branch 'jtc/tags/block-pull-request' into staging # gpg: Signature made Tue 16 May 2017 04:47:09 PM BST # gpg: using RSA key 0xBDBE7B27C0DE3057 # gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>" # gpg: aka "Jeffrey Cody <jeff@codyprime.org>" # gpg: aka "Jeffrey Cody <codyprime@gmail.com>" # Primary key fingerprint: 9957 4B4D 3474 90E7 9D98 D624 BDBE 7B27 C0DE 3057 * jtc/tags/block-pull-request: curl: do not do aio_poll when waiting for a free CURLState curl: convert readv to coroutines curl: convert CURLAIOCB to byte values curl: split curl_find_state/curl_init_state curl: avoid recursive locking of BDRVCURLState mutex curl: never invoke callbacks with s->mutex held curl: strengthen assertion in curl_clean_state block: curl: Allow passing cookies via QCryptoSecret Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-17 13:52:07 +01:00
Juan Quintela	795c40b8bd	migration: Create migration/blocker.h This allows us to remove lots of includes of migration/migration.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-05-17 12:04:59 +02:00
Paolo Bonzini	2bb5c936c5	curl: do not do aio_poll when waiting for a free CURLState Instead, put the CURLAIOCB on a wait list and yield; curl_clean_state will wake the corresponding coroutine. Because of CURL's callback-based structure, we cannot easily convert everything to CoMutex/CoQueue; keeping the QemuMutex is simpler. However, CoQueue is a simple wrapper around a linked list, so we can easily use QSIMPLEQ and open-code a CoQueue, protected by the BDRVCURLState QemuMutex instead of a CoMutex. Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170515100059.15795-8-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-16 10:34:50 -04:00
Paolo Bonzini	28256d8246	curl: convert readv to coroutines This is pretty simple. The bottom half goes away because, unlike bdrv_aio_readv, coroutine-based read can return immediately without yielding. However, for simplicity I kept the former bottom half handler in a separate function. Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170515100059.15795-7-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-16 10:34:50 -04:00
Paolo Bonzini	2125e5ea6e	curl: convert CURLAIOCB to byte values This is in preparation for the conversion from bdrv_aio_readv to bdrv_co_preadv, and it also requires changing some of the size_t values to uint64_t. This was broken before for disks > 2TB, but now it would break at 4GB. Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170515100059.15795-6-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-16 10:34:50 -04:00
Paolo Bonzini	3ce6a729b5	curl: split curl_find_state/curl_init_state If curl_easy_init fails, a CURLState is left with s->in_use = 1. Split curl_init_state in two, so that we can distinguish the two failures and call curl_clean_state if needed. While at it, simplify curl_find_state, removing a dummy loop. The aio_poll loop is moved to the sole caller that needs it. Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170515100059.15795-5-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-16 10:34:45 -04:00
Gerd Hoffmann	cdece0467c	block/win32: fix 'ret not initialized' warning Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170516074256.24731-1-kraxel@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-16 15:34:18 +01:00
Paolo Bonzini	456af34629	curl: avoid recursive locking of BDRVCURLState mutex The curl driver has a ugly hack where, if it cannot find an empty CURLState, it just uses aio_poll to wait for one to be empty. This is probably buggy when used together with dataplane, and the simplest way to fix it is to use coroutines instead. A more immediate effect of the bug however is that it can cause a recursive call to curl_readv_bh_cb and recursively taking the BDRVCURLState mutex. This causes a deadlock. The fix is to unlock the mutex around aio_poll, but for cleanliness we should also take the mutex around all calls to curl_init_state, even if reaching the unlock/lock pair is impossible. The same is true for curl_clean_state. Reported-by: Kun Wei <kuwei@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170515100059.15795-4-pbonzini@redhat.com Cc: qemu-stable@nongnu.org Cc: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-16 10:34:17 -04:00
Paolo Bonzini	34db05e7ff	curl: never invoke callbacks with s->mutex held All curl callbacks go through curl_multi_do, and hence are called with s->mutex held. Note that with comments, and make curl_read_cb drop the lock before invoking the callback. Likewise for curl_find_buf, where the callback can be invoked by the caller. Cc: qemu-stable@nongnu.org Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170515100059.15795-3-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-16 10:34:17 -04:00
Paolo Bonzini	675a775633	curl: strengthen assertion in curl_clean_state curl_clean_state should only be called after all AIOCBs have been completed. This is not so obvious for the call from curl_detach_aio_context, so assert that. Cc: qemu-stable@nongnu.org Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170515100059.15795-2-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-16 10:34:03 -04:00
Peter Krempa	327c8ebd70	block: curl: Allow passing cookies via QCryptoSecret Since cookies can contain sensitive data (session ID, etc ...) it is desired to hide them from the prying eyes of users. Add a possibility to pass them via the secret infrastructure. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1447413 Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: f4a22cdebdd0bca6a13a43a2a6deead7f2ec4bb3.1493906281.git.pkrempa@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-05-16 10:31:08 -04:00
Stefan Hajnoczi	b54933eed5	-----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJZFciYAAoJEJykq7OBq3PINCgH/1IXhfLR6AtPKAa0b2oABWEW lD90HiHIxeqYIDiVnppD6CEnIZ6VBmImSyRr6V9Fk7OTkCucVRLOXFZF63d7ETDo PBF8+FJ8YgdClJ80f3HkmPu6vzvnRzcs5mUBrrKhZPAy2adxNmIpfir/78p+Aomd HxCYl3jF6aVUtFEtzTwAmDjXU2NUNPQABAAKiGT5jwSUcksxogg7LSIM/ZlFzNhc ZMI9h6doRP0v3kLyh429qVEgXpCh7VaD9Swv6H3/sSHJdCBP5MILrTADrK+45tmL nIxHruOyt63Jgd8acGDdvYz1oHjPXb2Cn/87SiONsqwoyFJmkqbVXvOg4gqPuJ8= =yXgr -----END PGP SIGNATURE----- Merge tag 'block-pull-request' into staging # gpg: Signature made Fri 12 May 2017 10:37:12 AM EDT # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * tag 'block-pull-request': aio: add missing aio_notify() to aio_enable_external() block: Simplify BDRV_BLOCK_RAW recursion coroutine: remove GThread implementation Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-12 10:39:23 -04:00
Stefan Hajnoczi	3753e255da	Block layer patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJZFHXJAAoJEH8JsnLIjy/WvdsP/3mTGDZJO4M0HXWBWYQBwdGE EuMyGxn8Y0IUIBNx8sogAY9yQ/aQUh1b3bLWg7MAvZvQpi2S448Ak/2agaRFTWXJ vVv1nP0uigfgElSZZKTyj2SyhghrGkBou98hyYOd05YAApXmxoBKvTgC1di0Ec5K 86Mc91Cr3udh4Q5w5Ss2IxiN+fNoHhHdOOm8gkWM4iQl6fmDJ8lBBxoiU+K2uXX+ /flAVRgogFglKb0EAq8eSdy4qFRyp1G43yzOJKNiM3BuYeXpipnyO5mY3ynrEv5Y 3213hJ/rriY15xHcKtkPxEqd32iX2qCRGFXWMMiq4r527JG/B40SEJQaxfHu33U2 Et3+0MxoDHFseXE2sSjs8c0DKQiU5/hf7RkzqOMPMNluIX4ykrSyOdKzj4EFEowr GGLobm4Hr/rv8YDhRhllyKZ6Xzrjihy5WH5T9+HVfOlf35AxjJGZYqfCouhCiP15 0Nxz3qWfDH/U4zvhY+PIfQ51kvGl02oMZF6uu48kvzwdbIKgnImKg5HbG5R85mYQ BFqCjaEWjSUWxrqbZ/uRBUqavCLJm9C3mEWLCzGBmXNNs4Th8kXftuo1br+QHtr5 aT7xoxUJs/nmBoCKYhlCSrxjqCa0clenvNeiO4Q44UxWmOFQi9SM5myeSuCIaa+S qnYHZLZ74gVZpQaoPkjE =SA0O -----END PGP SIGNATURE----- Merge remote-tracking branch 'kwolf/tags/for-upstream' into staging Block layer patches # gpg: Signature made Thu 11 May 2017 10:31:37 AM EDT # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * kwolf/tags/for-upstream: (58 commits) MAINTAINERS: Add qemu-progress to the block layer qcow2: Discard/zero clusters by byte count qcow2: Assert that cluster operations are aligned qcow2: Optimize write zero of unaligned tail cluster iotests: Add test 179 to cover write zeroes with unmap iotests: Improve _filter_qemu_img_map qcow2: Optimize zero_single_l2() to minimize L2 churn qcow2: Make distinction between zero cluster types obvious qcow2: Name typedef for cluster type qcow2: Correctly report status of preallocated zero clusters block: Update comments on BDRV_BLOCK_* meanings qcow2: Use consistent switch indentation qcow2: Nicer variable names in qcow2_update_snapshot_refcount() tests: Add coverage for recent block geometry fixes blkdebug: Add ability to override unmap geometries blkdebug: Simplify override logic blkdebug: Add pass-through write_zero and discard support blkdebug: Refactor error injection blkdebug: Sanity check block layer guarantees qemu-io: Switch 'map' output to byte-based reporting ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-12 10:39:08 -04:00
Eric Blake	ee29d6adef	block: Simplify BDRV_BLOCK_RAW recursion Since we are already in coroutine context during the body of bdrv_co_get_block_status(), we can shave off a few layers of wrappers when recursing to query the protocol when a format driver returned BDRV_BLOCK_RAW. Note that we are already using the correct recursion later on in the same function, when probing whether the protocol layer is sparse in order to find out if we can add BDRV_BLOCK_ZERO to an existing BDRV_BLOCK_DATA\|BDRV_BLOCK_OFFSET_VALID. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170504173745.27414-1-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-12 10:36:46 -04:00
Eric Blake	d2cb36af2b	qcow2: Discard/zero clusters by byte count Passing a byte offset, but sector count, when we ultimately want to operate on cluster granularity, is madness. Clean up the external interfaces to take both offset and count as bytes, while still keeping the assertion added previously that the caller must align the values to a cluster. Then rename things to make sure backports don't get confused by changed units: instead of qcow2_discard_clusters() and qcow2_zero_clusters(), we now have qcow2_cluster_discard() and qcow2_cluster_zeroize(). The internal functions still operate on clusters at a time, and return an int for number of cleared clusters; but on an image with 2M clusters, a single L2 table holds 256k entries that each represent a 2M cluster, totalling well over INT_MAX bytes if we ever had a request for that many bytes at once. All our callers currently limit themselves to 32-bit bytes (and therefore fewer clusters), but by making this function 64-bit clean, we have one less place to clean up if we later improve the block layer to support 64-bit bytes through all operations (with the block layer auto-fragmenting on behalf of more-limited drivers), rather than the current state where some interfaces are artificially limited to INT_MAX at a time. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-13-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	f10ee139ad	qcow2: Assert that cluster operations are aligned We already audited (in commit `0c1bd469`) that qcow2_discard_clusters() is only passed cluster-aligned start values; but we can further tighten the assertion that the only unaligned end value is at EOF. Recent commits have taken advantage of an unaligned tail cluster, for both discard and write zeroes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-12-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	fbaa6bb3d3	qcow2: Optimize write zero of unaligned tail cluster We've already improved discards to operate efficiently on the tail of an unaligned qcow2 image; it's time to make a similar improvement to write zeroes. The special case is only valid at the tail cluster of a file, where we must recognize that any sectors beyond the image end would implicitly read as zero, and therefore should not penalize our logic for widening a partial cluster into writing the whole cluster as zero. However, note that for now, the special case of end-of-file is only recognized if there is no backing file, or if the backing file has the same length; that's because when the backing file is shorter than the active layer, we don't have code in place to recognize that reads of a sector unallocated at the top and beyond the backing end-of-file are implicitly zero. It's not much of a real loss, because most people don't use images that aren't cluster-aligned, or where the active layer is a different size than the backing layer (especially where the difference falls within a single cluster). Update test 154 to cover the new scenarios, using two images of intentionally differing length. While at it, fix the test to gracefully skip when run as ./check -qcow2 -o compat=0.10 154 since the older format lacks zero clusters already required earlier in the test. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-11-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	06cc5e2b2d	qcow2: Optimize zero_single_l2() to minimize L2 churn Similar to discard_single_l2(), we should try to avoid dirtying the L2 cache when the cluster we are changing already has the right characteristics. Note that by the time we get to zero_single_l2(), BDRV_REQ_MAY_UNMAP is a requirement to unallocate a cluster (this is because the block layer clears that flag if discard.* flags during open requested that we never punch holes - see the conversation around commit `170f4b2e`, https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07306.html). Therefore, this patch can only reuse a zero cluster as-is if either unmapping is not requested, or if the zero cluster was not associated with an allocation. Technically, there are some cases where an unallocated cluster already reads as all zeroes (namely, when there is no backing file [easy: check bs->backing], or when the backing file also reads as zeroes [harder: we can't check bdrv_get_block_status since we are already holding the lock]), where the guest would not immediately see a difference if we left that cluster unallocated. But if the user did not request unmapping, leaving an unallocated cluster is wrong; and even if the user DID request unmapping, keeping a cluster unallocated risks a subtle semantic change of guest-visible contents if a backing file is later added, and it is not worth auditing whether all internal uses such as mirror properly avoid an unmap request. Thus, this patch is intentionally limited to just clusters that are already marked as zero. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-8-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	fdfab37dfe	qcow2: Make distinction between zero cluster types obvious Treat plain zero clusters differently from allocated ones, so that we can simplify the logic of checking whether an offset is present. Do this by splitting QCOW2_CLUSTER_ZERO into two new enums, QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC. I tried to arrange the enum so that we could use 'ret <= QCOW2_CLUSTER_ZERO_PLAIN' for all unallocated types, and 'ret >= QCOW2_CLUSTER_ZERO_ALLOC' for allocated types, although I didn't actually end up taking advantage of the layout. In many cases, this leads to simpler code, by properly combining cases (sometimes, both zero types pair together, other times, plain zero is more like unallocated while allocated zero is more like normal). Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170507000552.20847-7-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:07 +02:00
Eric Blake	3ef9521893	qcow2: Name typedef for cluster type Although it doesn't add all that much type safety (this is C, after all), it does add a bit of legibility to use the name QCow2ClusterType instead of a plain int. In particular, qcow2_get_cluster_offset() has an overloaded return type; a QCow2ClusterType on success, and -errno on failure; keeping the cluster type in a separate variable makes it slightly easier for the next patch to make further computations based on the type. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170507000552.20847-6-eblake@redhat.com [mreitz: Use the new type in two more places (one of them pulled from the next patch)] Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	4341df8a83	qcow2: Correctly report status of preallocated zero clusters We were throwing away the preallocation information associated with zero clusters. But we should be matching the well-defined semantics in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO \| BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved, while still reminding the user that reading from that offset is likely to read garbage. count_contiguous_clusters_by_type() is now used only for unallocated cluster runs, hence it gets renamed and tightened. Making this change lets us see which portions of an image are zero but preallocated, when using qemu-img map --output=json. The --output=human side intentionally ignores all zero clusters, whether or not they are preallocated. The fact that there is no change to qemu-iotests './check -qcow2' merely means that we aren't yet testing this aspect of qemu-img; a later patch will add a test. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-5-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	bbd995d830	qcow2: Use consistent switch indentation Fix a couple of inconsistent indentations, before an upcoming patch further tweaks the switch statements. (best viewed with 'git diff -b'). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-3-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	b32cbae111	qcow2: Nicer variable names in qcow2_update_snapshot_refcount() In order to keep checkpatch happy when the next patch changes indentation, we first have to shorten some long lines. The easiest approach is to use a new variable in place of 'offset & L2E_OFFSET_MASK', except that 'offset' is the best name for that variable. Change '[old_]offset' to '[old_]entry' to make room. While touching things, also fix checkpatch warnings about unusual 'for' statements. Suggested by Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170507000552.20847-2-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	430b26a82d	blkdebug: Add ability to override unmap geometries Make it easier to simulate various unusual hardware setups (for example, recent commits `3482b9b` and `b8d0a98` affect the Dell Equallogic iSCSI with its 15M preferred and maximum unmap and write zero sizing, or `b2f95fe` deals with the Linux loopback block device having a max_transfer of 64k), by allowing blkdebug to wrap any other device with further restrictions on various alignments. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-9-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	3dc834f879	blkdebug: Simplify override logic Rather than store into a local variable, then copy to the struct if the value is valid, then reporting errors otherwise, it is simpler to just store into the struct and report errors if the value is invalid. This however requires that the struct store a 64-bit number, rather than a narrower type. Likewise, setting a sane errno value in ret prior to the sequence of parsing and jumping to out: on error makes it easier for the next patch to add a chain of similar checks. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170429191419.30051-8-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	63188c2450	blkdebug: Add pass-through write_zero and discard support In order to test the effects of artificial geometry constraints on operations like write zero or discard, we first need blkdebug to manage these actions. It also allows us to inject errors on those operations, just like we can for read/write/flush. We can also test the contract promised by the block layer; namely, if a device has specified limits on alignment or maximum size, then those limits must be obeyed (for now, the blkdebug driver merely inherits limits from whatever it is wrapping, but the next patch will further enhance it to allow specific limit overrides). This patch intentionally refuses to service requests smaller than the requested alignments; this is because an upcoming patch adds a qemu-iotest to prove that the block layer is correctly handling fragmentation, but the test only works if there is a way to tell the difference at artificial alignment boundaries when blkdebug is using a larger-than-default alignment. If we let the blkdebug layer always defer to the underlying layer, which potentially has a smaller granularity, the iotest will be thwarted. Tested by setting up an NBD server with export 'foo', then invoking: $ ./qemu-io qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo qemu-io> d 0 15M qemu-io> w -z 0 15M Pre-patch, the server never sees the discard (it was silently eaten by the block layer); post-patch it is passed across the wire. Likewise, pre-patch the write is always passed with NBD_WRITE (with 15M of zeroes on the wire), while post-patch it can utilize NBD_WRITE_ZEROES (for less traffic). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-7-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	d157ed5f72	blkdebug: Refactor error injection Rather than repeat the logic at each caller of checking if a Rule exists that warrants an error injection, fold that logic into inject_error(); and rename it to rule_check() for legibility. This will help the next patch, which adds two more callers that need to check rules for the potential of injecting errors. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-6-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Eric Blake	e0ef439588	blkdebug: Sanity check block layer guarantees Commits `04ed95f4` and `1a62d0ac` updated the block layer to auto-fragment any I/O to fit within device boundaries. Additionally, when using a minimum alignment of 4k, we want to ensure the block layer does proper read-modify-write rather than requesting I/O on a slice of a sector. Let's enforce that the contract is obeyed when using blkdebug. For now, blkdebug only allows alignment overrides, and just inherits other limits from whatever device it is wrapping, but a future patch will further enhance things. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-5-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-05-11 14:28:06 +02:00
Kevin Wolf	22d5cd82e9	file-posix: Remove .bdrv_inactivate/invalidate_cache Now that the block layer takes care to request a lot less permissions for inactive nodes, the special-casing in file-posix isn't necessary any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Kevin Wolf	cfa1a5723f	block: Drop permissions when migration completes With image locking, permissions affect other qemu processes as well. We want to be sure that the destination can run, so let's drop permissions on the source when migration completes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Kevin Wolf	4417ab7adf	block: New BdrvChildRole.activate() for blk_resume_after_migration() Instead of manually calling blk_resume_after_migration() in migration code after doing bdrv_invalidate_cache_all(), integrate the BlockBackend activation with cache invalidation into a single function. This is achieved with a new callback in BdrvChildRole that is called by bdrv_invalidate_cache_all(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Max Reitz	293073a56c	qcow2: Discard preallocated zero clusters In discard_single_l2(), we completely discard normal clusters instead of simply turning them into preallocated zero clusters. That means we should probably do the same with such preallocated zero clusters: Discard them instead of keeping them allocated. Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:24 +02:00
Max Reitz	564a6b6938	qcow2: Reuse preallocated zero clusters Instead of just freeing preallocated zero clusters and completely allocating them from scratch, reuse them. We cannot do this in handle_copied(), however, since this is a COW operation. Therefore, we have to add the new logic to handle_alloc() and simply return the existing offset if it exists. The only catch is that we have to convince qcow2_alloc_cluster_link_l2() not to free the old clusters (because we have reused them). Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:24 +02:00
Max Reitz	92413c16be	qcow2: Fix preallocation size formula When calculating the number of reftable entries, we should actually use the number of refblocks and not (wrongly[1]) re-calculate it. [1] "Wrongly" means: Dividing the number of clusters by the number of entries per refblock and rounding down instead of up. Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 12:08:24 +02:00
Fam Zheng	244a566810	file-posix: Add image locking to perm operations This extends the permission bits of op blocker API to external using Linux OFD locks. Each permission in @perm and @shared_perm is represented by a locked byte in the image file. Requesting a permission in @perm is translated to a shared lock of the corresponding byte; rejecting to share the same permission is translated to a shared lock of a separate byte. With that, we use 2x number of bytes of distinct permission types. virtlockd in libvirt locks the first byte, so we do locking from a higher offset. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 11:15:32 +02:00
Fam Zheng	1c3a555c35	file-win32: Error out if locking=on We share the same set of QAPI options with file-posix, but locking is not supported here. So error out if it is specified as 'on' for now. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 11:08:41 +02:00
Fam Zheng	16b48d5d66	file-posix: Add 'locking' option Making this option available even before implementing it will let converting tests easier: in coming patches they can specify the option already when necessary, before we actually write code to lock the images. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-05-11 11:08:40 +02:00
Stefan Hajnoczi	f465706e59	trivial patches for 2017-05-10 -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEe3O61ovnosKJMUsicBtPaxppPlkFAlkSvwIPHG1qdEB0bHMu bXNrLnJ1AAoJEHAbT2saaT5Zys4IAMZLWicv1c7O3m1ajmmg7iGfRbsajcx9FSBi NxdrqG3zgV10gz8/R7goMYGkeFs8MAoDfagbBkXgwFgA31M+ecOe93XyoOQLpe9/ 43fx2u8exVdruIb60F5yDEd51RLwK2C4Iz7SVNRoVWMqDcMOCuC+WBog+AbTB0V+ 19RjhKStMyXMXPYVO0bLhQIcH+ixFLUljbpwDvz5FKor5NqGG+FzHjmwYciiTbr3 o7Z3OIMWT7rDr9V5/553miiNP9ufG3fJreMyXDrTkFRVmDZaqRBp+tvdrYcb77ed /DDxC5vafgCRzwsrmCIsIQXV0janFGDQiqbR+hzBMBG1RTRoBiM= =AAfU -----END PGP SIGNATURE----- Merge remote-tracking branch 'mjt/tags/trivial-patches-fetch' into staging trivial patches for 2017-05-10 # gpg: Signature made Wed 10 May 2017 03:19:30 AM EDT # gpg: using RSA key 0x701B4F6B1A693E59 # gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>" # gpg: aka "Michael Tokarev <mjt@corpit.ru>" # gpg: aka "Michael Tokarev <mjt@debian.org>" # Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D 4324 457C E0A0 8044 65C5 # Subkey fingerprint: 7B73 BAD6 8BE7 A2C2 8931 4B22 701B 4F6B 1A69 3E59 * mjt/tags/trivial-patches-fetch: (23 commits) tests: Remove redundant assignment MAINTAINERS: Update paths for AioContext implementation MAINTAINERS: Update paths for main loop jazz_led: fix bad snprintf tests: Ignore another built executable (test-hmp) scripts: Switch to more portable Perl shebang scripts/qemu-binfmt-conf.sh: Fix shell portability issue virtfs: allow a device id to be specified in the -virtfs option hw/core/generic-loader: Fix crash when running without CPU virtio-blk: Remove useless condition around g_free() qemu-doc: Fix broken URLs of amnhltm.zip and dosidle210.zip use _Static_assert in QEMU_BUILD_BUG_ON channel-file: fix wrong parameter comments block: Make 'replication_state' an enum util: Use g_malloc/g_free in envlist.c qga: fix compiler warnings (clang 5) device_tree: fix compiler warnings (clang 5) usb-ccid: make ccid_write_data_block() cope with null buffers tests: Ignore more test executables Add 'none' as type for drive's if option ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-10 12:31:19 -04:00
Stefan Hajnoczi	1effe6ad5e	Merge qcrypto 2017/05/09 v1 -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJZEceTAAoJEL6G67QVEE/fNHUQAJ4iSzs2SsHSk/4TXensnC8s ySRNRrn13wx7NmUdus8zYeGL2nSEs4D4z5uu7xLJ+5TlBHKhWoekO+twnIj3y82P VbjDWiB26JnY/nkhZE1UzqE5Ix3iDBKuTJybGeql0059RGu34Itof76rNRq9Tyhz xeoNXd/6v5plk093B9ZJjXTcCUpDHxdhvpE3C7Dc1D6Gihx4yPn6VbM68ROCqFc/ /1exUFrhGdQvo3GpeGztU6nLruJqT5AtHNohxtNMBrWOQDinSwJl75BSh8UmhzuG pBFqWMMHaEGq2DhHuFqmTJQxvfz/NKH7NaWRwv5aCZxXB1qd7HweGHgc+XRSqA88 /2O3+jwNb4UC1BZjvAOWa4t87FhRxnqLs2byZtGt3hYLUJ7vDuZ+OTGwSezY9xy9 B9ruwY0RF090wnLgx52xOE9QoLNKX6KPZic2gK74QKkMlpiUZMFuAU6KQhhMfQ4H IsxhsTsPeQJ672FtprnvNq6yWQYDFsO8vSl00GzJ9kIcwv5Kak/bTSSN1yLRkFzr Dc+ymeAKqqBoTWl5qN9Rn/yagjInxPsdnGjVP3TktIaBzW7D2mhnP6vckoAw1gJF 2mOh4K5CiWombX1e4sxBzzrUb9x6FmkK0GqPpExlapm2rZfGfbTWUha698eNRsoY eXHSXmLNuZ6R7nyo5wBH =R5f2 -----END PGP SIGNATURE----- Merge remote-tracking branch 'danpb/tags/pull-qcrypto-2017-05-09-1' into staging Merge qcrypto 2017/05/09 v1 # gpg: Signature made Tue 09 May 2017 09:43:47 AM EDT # gpg: using RSA key 0xBE86EBB415104FDF # gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>" # gpg: aka "Daniel P. Berrange <berrange@redhat.com>" # Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF * danpb/tags/pull-qcrypto-2017-05-09-1: crypto: qcrypto_random_bytes() now works on windows w/o any other crypto libs crypto: move 'opaque' parameter to (nearly) the end of parameter list List SASL config file under the cryptography maintainer's realm Default to GSSAPI (Kerberos) instead of DIGEST-MD5 for SASL Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-10 11:22:13 -04:00
Daniel P. Berrange	e4a3507e86	crypto: move 'opaque' parameter to (nearly) the end of parameter list Previous commit moved 'opaque' to be the 2nd parameter in the list: commit `375092332e` Author: Fam Zheng <famz@redhat.com> Date: Fri Apr 21 20:27:02 2017 +0800 crypto: Make errp the last parameter of functions Move opaque to 2nd instead of the 2nd to last, so that compilers help check with the conversion. this puts it back to the 2nd to last position. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2017-05-09 14:41:47 +01:00
Markus Armbruster	bd269ebc82	sockets: Limit SocketAddressLegacy to external interfaces SocketAddressLegacy is a simple union, and simple unions are awkward: they have their variant members wrapped in a "data" object on the wire, and require additional indirections in C. SocketAddress is the equivalent flat union. Convert all users of SocketAddressLegacy to SocketAddress, except for existing external interfaces. See also commit fce5d53..9445673 and 85a82e8..c5f1ae3. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1493192202-3184-7-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [Minor editing accident fixed, commit message and a comment tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-05-09 09:14:40 +02:00
Markus Armbruster	62cf396b5d	sockets: Rename SocketAddressFlat to SocketAddress Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1493192202-3184-6-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2017-05-09 09:14:40 +02:00
Markus Armbruster	dfd100f242	sockets: Rename SocketAddress to SocketAddressLegacy The next commit will rename SocketAddressFlat to SocketAddress, and the commit after that will replace most uses of SocketAddressLegacy by SocketAddress, replacing most of this commit's renames right back. Note that checkpatch emits a few "line over 80 characters" warnings. The long lines are all temporary; the SocketAddressLegacy replacement will shorten them again. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1493192202-3184-5-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-05-09 09:14:40 +02:00
Markus Armbruster	0785bd7a7c	sockets: Prepare inet_parse() for flattened SocketAddress I'm going to flatten SocketAddress: rename SocketAddress to SocketAddressLegacy, SocketAddressFlat to SocketAddress, eliminate SocketAddressLegacy except in external interfaces. inet_parse() returns a newly allocated InetSocketAddress. Lift the allocation from inet_parse() into its caller socket_parse() to prepare for flattening SocketAddress. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1493192202-3184-3-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [Straightforward rebase]	2017-05-09 09:14:40 +02:00
Eric Blake	46f5ac205a	qobject: Use simpler QDict/QList scalar insertion macros We now have macros in place to make it less verbose to add a scalar to QDict and QList, so use them. Patch created mechanically via: spatch --sp-file scripts/coccinelle/qobject.cocci \ --macro-file scripts/cocci-macro-file.h --dir . --in-place then touched up manually to fix a couple of '?:' back to original spacing, as well as avoiding a long line in monitor.c. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20170427215821.19397-7-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-05-09 09:13:51 +02:00
Eric Blake	de6e7951fe	qobject: Drop useless QObject casts We have macros in place to make it less verbose to add a subtype of QObject to both QDict and QList. While we have made cleanups like this in the past (see commit `fcfcd8ffc`, for example), having it be automated by Coccinelle makes it easier to maintain. Patch created mechanically via: spatch --sp-file scripts/coccinelle/qobject.cocci \ --macro-file scripts/cocci-macro-file.h --dir . --in-place then I verified that no manual touchups were required. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20170427215821.19397-5-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-05-08 20:32:14 +02:00
Fam Zheng	3c76c606da	block: Make 'replication_state' an enum BDRVReplicationState.replication_state is a name with a bit of duplication, plus it could be an enum like BDRVReplicationState.mode, which is more readable and also more straightforward in a debugger. Rename it, and improve the type while at it. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-05-07 09:57:51 +03:00
Eric Blake	048c5fd1bf	qcow2: Allow discard of final unaligned cluster As mentioned in commit `0c1bd46`, we ignored requests to discard the trailing cluster of an unaligned image. While discard is an advisory operation from the guest standpoint, (and we are therefore free to ignore any request), our qcow2 implementation exploits the fact that a discarded cluster reads back as 0. As long as we discard on cluster boundaries, we are fine; but that means we could observe non-zero data leaked at the tail of an unaligned image. Enhance iotest 66 to cover this case, and fix the implementation to honor a discard request on the final partial cluster. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170407013709.18440-1-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-28 16:02:03 +02:00
Max Reitz	f59adb3256	block: Add .bdrv_truncate() error messages Add missing error messages for the block driver implementations of .bdrv_truncate(); drop the generic one from block.c's bdrv_truncate(). Since one of these changes touches a mis-indented block in block/file-posix.c, this patch fixes that coding style issue along the way. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170328205129.15138-5-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-28 16:02:03 +02:00
Max Reitz	4bff28b81a	block: Add errp to BD.bdrv_truncate() Add an Error parameter to the block drivers' bdrv_truncate() interface. If a block driver does not set this in case of an error, the generic bdrv_truncate() implementation will do so. Where it is obvious, this patch also makes some block drivers set this value. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170328205129.15138-4-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-28 16:02:03 +02:00
Max Reitz	ed3d2ec98a	block: Add errp to b{lk,drv}_truncate() For one thing, this allows us to drop the error message generation from qemu-img.c and blockdev.c and instead have it unified in bdrv_truncate(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170328205129.15138-3-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-28 16:02:02 +02:00
Max Reitz	55b9392b98	block/vhdx: Make vhdx_create() always set errp This patch makes vhdx_create() always set errp in case of an error. It also adds errp parameters to vhdx_create_bat() and vhdx_create_new_region_table() so we can pass on the error object generated by blk_truncate() as of a future commit. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20170328205129.15138-2-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-28 16:02:02 +02:00
Denis V. Lunev	f13ce1be35	block: fix alignment calculations in bdrv_co_do_zero_pwritev tail_padding_bytes is calculated wrong. F.e. for offset = 0 bytes = 2048 align = 512 we will have tail_padding_bytes = 512 which is definitely wrong. The patch fixes that arithmetics. Fortunately this problem is harmless, we will have 1 extra allocation and free thus there is no need to put this into stable. The problem is here from the very beginning. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-04-27 16:24:01 +02:00
Max Reitz	de234897b6	block: Do not unref bs->file on error in BD's open The block layer takes care of removing the bs->file child if the block driver's bdrv_open()/bdrv_file_open() implementation fails. The block driver therefore does not need to do so, and indeed should not unless it sets bs->file to NULL afterwards -- because if this is not done, the bdrv_unref_child() in bdrv_open_inherit() will dereference the freed memory block at bs->file afterwards, which is not good. We can now decide whether to add a "bs->file = NULL;" after each of the offending bdrv_unref_child() invocations, or just drop them altogether. The latter is simpler, so let's do that. Cc: qemu-stable <qemu-stable@nongnu.org> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-04-27 16:12:13 +02:00
Fam Zheng	e914404efb	block: Remove NULL check in bdrv_co_flush Reported by Coverity. We already use bs in bdrv_inc_in_flight before checking for NULL. It is unnecessary as all callers pass non-NULL bs, so drop it. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-04-27 15:39:50 +02:00
Max Reitz	362b3786eb	Revert "block/io: Comment out permission assertions" This reverts commit `e3e0003a8f`. This commit was necessary for the 2.9 release because we were unable to fix the underlying issue(s) in time. However, we will be for 2.10. Signed-off-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-04-27 15:39:49 +02:00
Kevin Wolf	0d5e0bb2f7	file-win32: Remove unnecessary include Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-04-27 15:39:49 +02:00
Kevin Wolf	ad02b7af0c	file-posix: Remove unnecessary includes Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-04-27 15:39:49 +02:00
Krzysztof Kozlowski	0731a50feb	block: Constify data passed by pointer to blk_name blk_name() is not modifying data passed to it through pointer and it returns also a pointer to const so the argument can be made const for code safeness. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-04-27 15:39:49 +02:00
Jeff Cody	56e7cf8df0	block/rbd: Add support for reopen() This adds support for reopen in rbd, for changing between r/w and r/o. Note, that this is only a flag change, but we will block a change from r/o to r/w if we are using an RBD internal snapshot. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: d4e87539167ec6527d44c97b164eabcccf96e4f3.1491597120.git.jcody@redhat.com	2017-04-24 15:09:33 -04:00
Jeff Cody	80b61a27c6	block/rbd - update variable names to more apt names Update 'clientname' to be 'user', which tracks better with both the QAPI and rados variable naming. Update 'name' to be 'image_name', as it indicates the rbd image. Naming it 'image' would have been ideal, but we are using that for the rados_image_t value returned by rbd_open(). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: b7ec1fb2e1cf36f9b6911631447a5b0422590b7d.1491597120.git.jcody@redhat.com	2017-04-24 15:09:33 -04:00
Jeff Cody	e2b8247a32	block: do not set BDS read_only if copy_on_read enabled A few block drivers will set the BDS read_only flag from their .bdrv_open() function. This means the bs->read_only flag could be set after we enable copy_on_read, as the BDRV_O_COPY_ON_READ flag check occurs prior to the call to bdrv->bdrv_open(). This adds an error return to bdrv_set_read_only(), and an error will be return if we try to set the BDS to read_only while copy_on_read is enabled. This patch also changes the behavior of vvfat. Before, vvfat could override the drive 'readonly' flag with its own, internal 'rw' flag. For instance, this -drive parameter would result in a writable image: "-drive format=vvfat,dir=/tmp/vvfat,rw,if=virtio,readonly=on" This is not correct. Now, attempting to use the above -drive parameter will result in an error (i.e., 'rw' is incompatible with 'readonly=on'). Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 0c5b4c1cc2c651471b131f21376dfd5ea24d2196.1491597120.git.jcody@redhat.com	2017-04-24 15:09:33 -04:00
Jeff Cody	fe5241bfe3	block: add bdrv_set_read_only() helper function We have a helper wrapper for checking for the BDS read_only flag, add a helper wrapper to set the read_only flag as well. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 9b18972d05f5fa2ac16c014f0af98d680553048d.1491597120.git.jcody@redhat.com	2017-04-24 15:09:33 -04:00
Ashish Mittal	da92c3ff60	block/vxhs.c: Add support for a new block device type called "vxhs" Source code for the qnio library that this code loads can be downloaded from: https://github.com/VeritasHyperScale/libqnio.git Sample command line using JSON syntax: ./x86_64-softmmu/qemu-system-x86_64 -name instance-00000008 -S -vnc 0.0.0.0:0 -k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on 'json:{"driver":"vxhs","vdisk-id":"c3e9095a-a5ee-4dce-afeb-2a59fb387410", "server":{"host":"172.172.17.4","port":"9999"}}' Sample command line using URI syntax: qemu-img convert -f raw -O raw -n /var/lib/nova/instances/_base/0c5eacd5ebea5ed914b6a3e7b18f1ce734c386ad vxhs://192.168.0.1:9999/c6718f6b-0401-441d-a8c3-1f0064d75ee0 Sample command line using TLS credentials (run in secure mode): ./qemu-io --object tls-creds-x509,id=tls0,dir=/etc/pki/qemu/vxhs,endpoint=client -c 'read -v 66000 2.5k' 'json:{"server.host": "127.0.0.1", "server.port": "9999", "vdisk-id": "/test.raw", "driver": "vxhs", "tls-creds":"tls0"}' [Jeff: Modified trace-events with the correct string formatting] Signed-off-by: Ashish Mittal <Ashish.Mittal@veritas.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Message-id: 1491277689-24949-2-git-send-email-Ashish.Mittal@veritas.com	2017-04-24 15:08:42 -04:00
Fam Zheng	cb8d4bf677	nfs: Make errp the last parameter of nfs_client_open Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170421122710.15373-10-famz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-04-24 09:13:44 +02:00
Fam Zheng	78bbd910bb	block: Make errp the last parameter of commit_active_start Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170421122710.15373-9-famz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-04-24 09:13:44 +02:00
Fam Zheng	51ccfa2dbf	mirror: Make errp the last parameter of mirror_start_job Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170421122710.15373-8-famz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-04-24 09:13:44 +02:00
Fam Zheng	375092332e	crypto: Make errp the last parameter of functions Move opaque to 2nd instead of the 2nd to last, so that compilers help check with the conversion. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170421122710.15373-7-famz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [Commit message typo corrected] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-04-24 09:13:22 +02:00
Fam Zheng	6dffc1f670	socket: Make errp the last parameter of inet_connect_saddr Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170421122710.15373-3-famz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-04-24 09:12:59 +02:00
Fam Zheng	226799cec5	socket: Make errp the last parameter of socket_connect Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170421122710.15373-2-famz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-04-24 09:12:59 +02:00
Fam Zheng	178bd438af	block: Walk bs->children carefully in bdrv_drain_recurse The recursive bdrv_drain_recurse may run a block job completion BH that drops nodes. The coming changes will make that more likely and use-after-free would happen without this patch Stash the bs pointer and use bdrv_ref/bdrv_unref in addition to QLIST_FOREACH_SAFE to prevent such a case from happening. Since bdrv_unref accesses global state that is not protected by the AioContext lock, we cannot use bdrv_ref/bdrv_unref unconditionally. Fortunately the protection is not needed in IOThread because only main loop can modify a graph with the AioContext lock held. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170418143044.12187-2-famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Tested-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-04-18 22:56:28 +08:00
Max Reitz	e3e0003a8f	block/io: Comment out permission assertions In case of block migration, there may be writes to BlockBackends that do not have the write permission taken. Before this issue is fixed (which is not going to happen in 2.9), we therefore cannot assert that this is the case. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20170411145050.31290-1-mreitz@redhat.com Tested-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-04-11 16:09:31 +01:00
Kevin Wolf	5eceb01adf	sheepdog: Fix crash in co_read_response() This fixes a regression introduced in commit `9d456654`. aio_co_wake() can only be used to reenter a coroutine that was already previously entered, otherwise co->ctx is uninitialised and we access garbage. Using it immediately after qemu_coroutine_create() like in co_read_response() is wrong and causes segfaults. Replace the call with aio_co_enter(), which gets an explicit AioContext parameter and works even for new coroutines. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Kashyap Chamarthy <kchamart@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1491919733-21065-1-git-send-email-kwolf@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-04-11 16:08:29 +01:00
Fam Zheng	2ec9a782d1	iscsi: Fix iscsi_create Since `d5895fcb` (iscsi: Split URL into individual options), creating qcow2 image on an iscsi LUN fails: qemu-img create -f qcow2 iscsi://$SERVER/$IQN/0 1G qemu-img: iscsi://$SERVER/$IQN/0: Could not create image: Invalid argument The problem is iscsi_open now expects that transport_name, portal and target are already parsed into structured options by iscsi_parse_filename, but it is not called in iscsi_create. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20170410075451.21329-1-famz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> [mreitz: Dropped now superfluous qdict_put(bs_options, "filename", ...)] Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-11 15:33:00 +02:00
Eric Blake	1606e4cf8a	throttle: Remove block from group on hot-unplug When a block device that is part of a throttle group is hot-unplugged, we forgot to remove it from the throttle group. This leaves stale memory around, and causes an easily reproducible crash: $ ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio \ -device virtio-scsi-pci,bus=pci.0 -drive \ id=drive_image2,if=none,format=raw,file=file2,bps=512000,iops=100,group=foo \ -device scsi-hd,id=image2,drive=drive_image2 -drive \ id=drive_image3,if=none,format=raw,file=file3,bps=512000,iops=100,group=foo \ -device scsi-hd,id=image3,drive=drive_image3 {'execute':'qmp_capabilities'} {'execute':'device_del','arguments':{'id':'image3'}} {'execute':'system_reset'} Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1428810 Suggested-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170406190847.29347-1-eblake@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-11 15:33:00 +02:00
Dong Jia Shi	7a9e51198c	block: pass the right options for BlockDriver.bdrv_open() raw_open() expects the caller always passing in the right actual @options parameter. But when trying to applying snapshot on a RBD image, bdrv_snapshot_goto() calls raw_open() (by calling the bdrv_open callback on the BlockDriver) with a NULL @options, and that will result in a Segmentation fault. For the other non-raw format drivers, it also makes sense to passing in the actual options, althought they don't trigger the problem so far. Let's prepare a @options by adding the "file" key-value pair to a copy of the actual options that were given for the node (i.e. bs->options), and pass it to the callback. BlockDriver.bdrv_open() expects bs->file to be NULL and just overwrites it with the result from bdrv_open_child(). That means we should actually make sure it's NULL because otherwise the child BDS will have a reference count that is 1 too high. So we unconditionally invoke bdrv_unref_child() before calling BlockDriver.bdrv_open(), and we wrap everything in bdrv_ref()/bdrv_unref() so the BDS isn't deleted in the meantime. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Message-id: 20170405091909.36357-2-bjsdjshi@linux.vnet.ibm.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-11 15:33:00 +02:00
Fam Zheng	76296dff97	sheepdog: Use bdrv_coroutine_enter before BDRV_POLL_WHILE When called from main thread, the coroutine should run in the context of bs. Use bdrv_coroutine_enter to ensure that. Signed-off-by: Fam Zheng <famz@redhat.com>	2017-04-11 20:07:15 +08:00
Fam Zheng	49ca625913	block: Fix bdrv_co_flush early return bdrv_inc_in_flight and bdrv_dec_in_flight are mandatory for BDRV_POLL_WHILE to work, even for the shortcut case where flush is unnecessary. Move the if block to below bdrv_dec_in_flight, and BTW fix the variable declaration position. Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>	2017-04-11 20:07:15 +08:00
Fam Zheng	e92f0e1910	block: Use bdrv_coroutine_enter to start I/O coroutines BDRV_POLL_WHILE waits for the started I/O by releasing bs's ctx then polling the main context, which relies on the yielded coroutine continuing on bs->ctx before notifying qemu_aio_context with bdrv_wakeup(). Thus, using qemu_coroutine_enter to start I/O is wrong because if the coroutine is entered from main loop, co->ctx will be qemu_aio_context, as a result of the "release, poll, acquire" loop of BDRV_POLL_WHILE, race conditions happen when both main thread and the iothread access the same BDS: main loop iothread ----------------------------------------------------------------------- blockdev_snapshot aio_context_acquire(bs->ctx) virtio_scsi_data_plane_handle_cmd bdrv_drained_begin(bs->ctx) bdrv_flush(bs) bdrv_co_flush(bs) aio_context_acquire(bs->ctx).enter ... qemu_coroutine_yield(co) BDRV_POLL_WHILE() aio_context_release(bs->ctx) aio_context_acquire(bs->ctx).return ... aio_co_wake(co) aio_poll(qemu_aio_context) ... co_schedule_bh_cb() ... qemu_coroutine_enter(co) ... /* (A) bdrv_co_flush(bs) /* (B) I/O on bs / continues... / aio_context_release(bs->ctx) aio_context_acquire(bs->ctx) Note that in above case, bdrv_drained_begin() doesn't do the "release, poll, acquire" in BDRV_POLL_WHILE, because bs->in_flight == 0. Fix this by using bdrv_coroutine_enter and enter coroutine in the right context. iotests 109 output is updated because the coroutine reenter flow during mirror job complete is different (now through co_queue_wakeup, instead of the unconditional qemu_coroutine_switch before), making the end job len different. Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2017-04-11 20:07:15 +08:00
Fam Zheng	14e9559f46	block: Make bdrv_parent_drained_begin/end public Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2017-04-11 20:07:15 +08:00
Fam Zheng	19dd29e8a7	mirror: Fix aio context of mirror_top_bs It should be moved to the same context as source, before inserting to the graph. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-04-07 14:44:06 +02:00
Kevin Wolf	1bf03e66fd	block: Don't check permissions for copy on read The assertion is currently failing. We can't require callers to have write permissions when all they are doing is a read, so comment it out. Add a FIXME comment in the code so that the check is re-enabled when copy on read is refactored into its own filter driver. Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com>	2017-04-07 14:44:06 +02:00
Max Reitz	7a25fcd056	block/mirror: Fix use-after-free If @bs does not have any parents, the only reference to @mirror_top_bs will be held by the BlockJob object after the bdrv_unref() following block_job_create(). However, if block_job_create() fails, this reference will not exist and @mirror_top_bs will have been deleted when we goto fail. The issue comes back at all later entries to the fail label: We delete the BlockJob object before rolling back our changes to the node graph. This means that we will delete @mirror_top_bs in the process. All in all, whenever @bs does not have any parents and we go down the fail path we will dereference @mirror_top_bs after it has been deleted. Fix this by invoking bdrv_unref() only when block_job_create() was successful and by bdrv_ref()'ing @mirror_top_bs in the fail path before deleting the BlockJob object. Finally, bdrv_unref() it at the end of the fail path after we actually no longer need it. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-04-07 14:44:05 +02:00
Kevin Wolf	0d0676a104	commit: Set commit_top_bs->total_sectors Like in the mirror filter driver, we also need to set the image size for the commit filter driver. This is less likely to be a problem in practice than for the mirror because we're not at the active layer here, but attaching new parents to a node in the middle of the chain is possible, so the size needs to be correct anyway. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2017-04-07 14:44:05 +02:00
Kevin Wolf	02be4aeb93	commit: Set commit_top_bs->aio_context The filter driver that is inserted by the commit job needs to use the same AioContext as its parent and child nodes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2017-04-07 14:44:05 +02:00
Kevin Wolf	d35ff5e6b3	block: Ignore guest dev permissions during incoming migration Usually guest devices don't like other writers to the same image, so they use blk_set_perm() to prevent this from happening. In the migration phase before the VM is actually running, though, they don't have a problem with writes to the image. On the other hand, storage migration needs to be able to write to the image in this phase, so the restrictive blk_set_perm() call of qdev devices breaks it. This patch flags all BlockBackends with a qdev device as blk->disable_perm during incoming migration, which means that the requested permissions are stored in the BlockBackend, but not actually applied to its root node yet. Once migration has finished and the VM should be resumed, the permissions are applied. If they cannot be applied (e.g. because the NBD server used for block migration hasn't been shut down), resuming the VM fails. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Kashyap Chamarthy <kchamart@redhat.com>	2017-04-07 14:44:05 +02:00
Peter Maydell	87cc4c6102	* MemoryRegionCache revert * glib optimization workaround * fix "info lapic" segfault on isapc * fix QIOChannel memory leak -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQExBAABCAAbBQJY4oOMFBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D AsIH/i52nJw41utJCs5AevnQyqNs9RnyMkZLHiVoi6a+pdJqX+0mCw8gV/5FsbPZ dtyt1tEuYBSu72adr+/ExE4aIEjwzeyRmnUdOkB+iYPxirHKuf4K/JTuLuvMtaQQ Tqj+FU5tx3wx0jlGOm5A7pzjZ680JUex+oaz3d1bZziv3zCyFCIgiZ2m2UAaaPQe fsd3fksJvc0gKOUKmdLUpu2m/xP3hAQAfQ4P/ozOfbVh9V2CVNaQ/cl935tNtdFK aYN3KleW3/ovb+YSexeNoW7QQH/3ZsjronCW5OmbF4FgHoeoV8MUROfNgu1S2bRU Bne9K/6boPzhD8NDEuSy8SXvf7s= =EdXr -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * MemoryRegionCache revert * glib optimization workaround * fix "info lapic" segfault on isapc * fix QIOChannel memory leak # gpg: Signature made Mon 03 Apr 2017 18:17:00 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: main-loop: Acquire main_context lock around os_host_main_loop_wait. exec: revert MemoryRegionCache nbd: fix memory leak on socket_connect failed ipmi: Fix macro issues target-i386: fix "info lapic" segfault on isapc iscsi: drop unused IscsiAIOCB.qiov field Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-04-04 11:40:55 +01:00
Max Reitz	86d1bd7098	block/parallels: Avoid overflows Change the types of variables in allocate_clusters() to int64_t so we do not have to worry about potential overflows. Add an assertion that our accesses to s->bat[] do not result in a buffer overflow and that the implicit conversion performed when invoking bat_entry_off() does not result in an integer overflow. Coverity-id: 1307776 Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170331170512.10381-1-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-03 17:11:40 +02:00
Eric Blake	0c1bd4692f	qcow2: Discard unaligned tail when wiping image There is a subtle difference between the fast (qcow2v3 with no extra data) and slow path (qcow2v2 format [aka 0.10], or when a snapshot is present) of qcow2_make_empty(). The slow path fails to discard the final (partial) cluster of an unaligned image. The problem stems from the fact that qcow2_discard_clusters() was silently ignoring sub-cluster head and tail on unaligned requests. A quick audit of all callers shows that qcow2_snapshot_create() has always passed a cluster-aligned request since the call was added in commit 1ebf561; qcow2_co_pdiscard() has passed a cluster-aligned request since commit `ecdbead` taught the block layer about preferred discard alignment; and qcow2_make_empty() was fixed to pass an aligned start (but not necessarily end) in commit `a3e1505`. Asserting that the start is always aligned also points out that we now have a dead check: rounding the end offset down can never result in a value less than the aligned start offset (the check was rendered dead with commit `ecdbead`). Meanwhile, we do not want to round the end cluster down in the one case of the end offset matching the (unaligned) file size - that final partial cluster should still be discarded. With those fixes in place, the fast and slow paths are back in sync at discarding an entire image; the next patch will update qemu-iotests to ensure we don't regress. Note that bdrv_co_pdiscard ignores ALL partial cluster requests, including the partial cluster at the end of an image; it can be argued that the partial cluster at the end should be special-cased so that a guest issuing discard requests at proper alignments everywhere else can likewise empty the entire image. But that optimization is left for another day. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170331185356.2479-3-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-03 17:11:40 +02:00
Markus Armbruster	d1c136885b	sheepdog: Fix blockdev-add Commit `831acdc` "sheepdog: Implement bdrv_parse_filename()" and commit `d282f34` "sheepdog: Support blockdev-add" have different ideas on how the QemuOpts parameters for the server address are named. Fix that. While there, rename BlockdevOptionsSheepdog member addr to server, for consistency with BlockdevOptionsSsh, BlockdevOptionsGluster, BlockdevOptionsNbd. Commit 831acdc's example becomes --drive driver=sheepdog,server.type=inet,server.host=fido,server.port=7000,vdi=dolly instead of --drive driver=sheepdog,host=fido,vdi=dolly Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Kashyap Chamarthy <kchamart@redhat.com> Message-id: 1490895797-29094-10-git-send-email-armbru@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-03 17:11:39 +02:00
Markus Armbruster	9445673ea6	nbd: Tidy up blockdev-add interface SocketAddress is a simple union, and simple unions are awkward: they have their variant members wrapped in a "data" object on the wire, and require additional indirections in C. I intend to limit its use to existing external interfaces, and convert all internal interfaces to SocketAddressFlat. BlockdevOptionsNbd is an external interface using SocketAddress. We already use SocketAddressFlat elsewhere in blockdev-add. Replace it by SocketAddressFlat while we can (it's new in 2.9) for simplicity and consistency. For example, { "execute": "blockdev-add", "arguments": { "node-name": "foo", "driver": "nbd", "server": { "type": "inet", "data": { "host": "localhost", "port": "12345" } } } } becomes { "execute": "blockdev-add", "arguments": { "node-name": "foo", "driver": "nbd", "server": { "type": "inet", "host": "localhost", "port": "12345" } } } Since the internal interfaces still take SocketAddress, this requires conversion function socket_address_crumple(). It'll go away when I update the interfaces. Unfortunately, SocketAddress is also visible in -drive since 2.8: -drive if=none,driver=nbd,server.type=inet,server.data.host=127.0.0.1,server.data.port=12345 Nobody should be using it, as it's fairly new and has never been documented, so adding still more compatibility gunk to keep it working isn't worth the trouble. You now have to use -drive if=none,driver=nbd,server.type=inet,server.host=127.0.0.1,server.port=12345 Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 1490895797-29094-9-git-send-email-armbru@redhat.com [mreitz: Change iotest 147 accordingly] Because of this interface change, iotest 147 has to be adapted. Unfortunately, we cannot just flatten all of the addresses because nbd-server-start still takes a plain SocketAddress. Therefore, we need both and this is most easily achieved by writing the SocketAddress into the code and flattening it where necessary. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170330221243.17333-1-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-03 17:11:39 +02:00
Markus Armbruster	8bc0673f6d	qapi-schema: SocketAddressFlat variants 'vsock' and 'fd' Note that the new variants are impossible in qemu_gluster_glfs_init(), because the gconf->server can only come from qemu_gluster_parse_uri() or qemu_gluster_parse_json(), and neither can create anything but 'inet' or 'unix'. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1490895797-29094-7-git-send-email-armbru@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-03 17:11:39 +02:00
Markus Armbruster	fce5d5386d	gluster: Prepare for SocketAddressFlat extension qemu_gluster_glfs_init() and qemu_gluster_parse_json() rely on the fact that SocketAddressFlatType has only two members SOCKET_ADDRESS_FLAT_TYPE_INET and SOCKET_ADDRESS_FLAT_TYPE_UNIX. Correct, but won't stay correct. Make them more robust. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490895797-29094-6-git-send-email-armbru@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-03 17:11:39 +02:00
Markus Armbruster	129c7d1c53	block: Document -drive problematic code and bugs -blockdev and blockdev_add convert their arguments via QObject to BlockdevOptions for qmp_blockdev_add(), which converts them back to QObject, then to a flattened QDict. The QDict's members are typed according to the QAPI schema. -drive converts its argument via QemuOpts to a (flat) QDict. This QDict's members are all QString. Thus, the QType of a flat QDict member depends on whether it comes from -drive or -blockdev/blockdev_add, except when the QAPI type maps to QString, which is the case for 'str' and enumeration types. The block layer core extracts generic configuration from the flat QDict, and the block driver extracts driver-specific configuration. Both commonly do so by converting (parts of) the flat QDict to QemuOpts, which turns all values into strings. Not exactly elegant, but correct. However, A few places access the flat QDict directly: * Most of them access members that are always QString. Correct. * bdrv_open_inherit() accesses a boolean, carefully. Correct. * nfs_config() uses a QObject input visitor. Correct only because the visited type contains nothing but QStrings. * nbd_config() and ssh_config() use a QObject input visitor, and the visited types contain non-QStrings: InetSocketAddress members @numeric, @to, @ipv4, @ipv6. -drive works as long as you don't try to use them (they're all optional). @to is ignored anyway. Reproducer: -drive driver=ssh,server.host=h,server.port=22,server.ipv4,path=p -drive driver=nbd,server.type=inet,server.data.host=h,server.data.port=22,server.data.ipv4 both fail with "Invalid parameter type for 'data.ipv4', expected: boolean" Add suitable comments to all these places. Mark the buggy ones FIXME. "Fortunately", -drive's driver-specific options are entirely undocumented. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 1490895797-29094-5-git-send-email-armbru@redhat.com [mreitz: Fixed two typos] Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-03 17:11:39 +02:00
Markus Armbruster	ca0b64e5ed	nbd sockets vnc: Mark problematic address family tests TODO Certain features make sense only with certain address families. For instance, passing file descriptors requires AF_UNIX. Testing SocketAddress's saddr->type == SOCKET_ADDRESS_KIND_UNIX is obvious, but problematic: it can't recognize AF_UNIX when type == SOCKET_ADDRESS_KIND_FD. Mark such tests of saddr->type TODO. We may want to check the address family with getsockname() there. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1490895797-29094-2-git-send-email-armbru@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-03 17:11:39 +02:00
yaolujing	cc1e139139	nbd: fix memory leak on socket_connect failed When TCP connection fails between nbd server and client, the local var, sioc, memory leak. This patch fixes the memory leak. Signed-off-by: yaolujing <yaolujing@huawei.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1491005709-29989-1-git-send-email-yaolujing@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-04-02 21:18:21 +02:00
Stefan Hajnoczi	b9afaba891	iscsi: drop unused IscsiAIOCB.qiov field The IscsiAIOCB.qiov field has been unused since commit `063c3378a9` ("block/iscsi: introduce bdrv_co_{readv, writev, flush_to_disk}") back in 2013. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20170327165005.22038-1-stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-04-02 21:17:47 +02:00
Max Reitz	34634ca286	block/curl: Check protocol prefix If the user has explicitly specified a block driver and thus a protocol, we have to make sure the URL's protocol prefix matches. Otherwise the latter will silently override the former which might catch some users by surprise. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20170331120431.1767-3-mreitz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-31 15:53:22 -04:00
Eric Blake	e98c6961c8	rbd: Fix regression in legacy key/values containing escaped : Commit `c7cacb3` accidentally broke legacy key-value parsing through pseudo-filename parsing of -drive file=rbd://..., for any key that contains an escaped ':'. Such a key is surprisingly common, thanks to mon_host specifying a 'host:port' string. The break happens because passing things from QDict through QemuOpts back to another QDict requires that we pack our parsed key/value pairs into a string, and then reparse that string, but the intermediate string that we created ("key1=value1:key2=value2") lost the \: escaping that was present in the original, so that we could no longer see which : were used as separators vs. those used as part of the original input. Fix it by collecting the key/value pairs through a QList, and sending that list on a round trip through a JSON QString (as in '["key1","value1","key2","value2"]') on its way through QemuOpts, rather than hand-rolling our own string. Since the string is only handled internally, this was faster than creating a full-blown struct of '[{"key1":"value1"},{"key2":"value2"}]', and safer at guaranteeing order compared to '{"key1":"value1","key2":"value2"}'. It would be nicer if we didn't have to round-trip through QemuOpts in the first place, but that's a much bigger task for later. Reproducer: ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio \ -drive 'file=rbd:volumes/volume-ea141b5c-cdb3-4765-910d-e7008b209a70'\ ':id=compute:key=AQAVkvxXAAAAABAA9ZxWFYdRmV+DSwKr7BKKXg=='\ ':auth_supported=cephx\;none:mon_host=192.168.1.2\:6789'\ ',format=raw,if=none,id=drive-virtio-disk0,'\ 'serial=ea141b5c-cdb3-4765-910d-e7008b209a70,cache=writeback' Even without an RBD setup, this serves a test of whether we get the incorrect parser error of: qemu-system-x86_64: -drive file=rbd:...cache=writeback: conf option 6789 has no value or the correct behavior of hanging while trying to connect to the requested mon_host of 192.168.1.2:6789. Reported-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170331152730.12514-1-eblake@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-31 13:53:57 -04:00
Markus Armbruster	2836284db6	rbd: Fix bugs around -drive parameter "server" qemu_rbd_open() takes option parameters as a flattened QDict, with keys of the form server.%d.host, server.%d.port, where %d counts up from zero. qemu_rbd_array_opts() extracts these values as follows. First, it calls qdict_array_entries() to find the list's length. For each list element, it formats the list's key prefix (e.g. "server.0."), then creates a new QDict holding the options with that key prefix, then converts that to a QemuOpts, so it can finally get the member values from there. If there's one surefire way to make code using QDict more awkward, it's creating more of them and mixing in QemuOpts for good measure. The extraction of keys starting with server.%d into another QDict makes us ignore parameters like server.0.neither-host-nor-port silently. The conversion to QemuOpts abuses runtime_opts, as described a few commits ago. Rewrite to simply get the values straight from the options QDict. Fixes -drive not to crash when server.. are present, but server..host is absent. Fixes -drive to reject invalid server..*. Permits cleaning up runtime_opts. Do that, and fix -drive to reject bogus parameters host and port instead of silently ignoring them. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490691368-32099-11-git-send-email-armbru@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-28 10:01:21 -04:00
Markus Armbruster	464444fcc1	rbd: Revert -blockdev and -drive parameter auth-supported This reverts half of commit `0a55679`. We're having second thoughts on the QAPI schema (and thus the external interface), and haven't reached consensus, yet. Issues include: * The implementation uses deprecated rados_conf_set() key "auth_supported". No biggie. * The implementation makes -drive silently ignore invalid parameters "auth" and "auth-supported..X" where X isn't "auth". Fixable (in fact I'm going to fix similar bugs around parameter server), so again no biggie. BlockdevOptionsRbd member @password-secret applies only to authentication method cephx. Should it be a variant member of RbdAuthMethod? * BlockdevOptionsRbd member @user could apply to both methods cephx and none, but I'm not sure it's actually used with none. If it isn't, should it be a variant member of RbdAuthMethod? * The client offers a set of authentication methods, not a list. Should the methods be optional members of BlockdevOptionsRbd instead of members of list @auth-supported? The latter begs the question what multiple entries for the same method mean. Trivial question now that RbdAuthMethod contains nothing but @type, but less so when RbdAuthMethod acquires other members, such the ones discussed above. * How BlockdevOptionsRbd member @auth-supported interacts with settings from a configuration file specified with @conf is undocumented. I suspect it's untested, too. Let's avoid painting ourselves into a corner now, and revert the feature for 2.9. Note that users can still configure authentication methods with a configuration file. They probably do that anyway if they use Ceph outside QEMU as well. Further note that this doesn't affect use of key "auth-supported" in -drive file=rbd:...:key=value. qemu_rbd_array_opts()'s parameter @type now must be RBD_MON_HOST, which is silly. This will be cleaned up shortly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490691368-32099-9-git-send-email-armbru@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-28 10:01:21 -04:00
Markus Armbruster	078463977a	rbd: Clean up qemu_rbd_create()'s detour through QemuOpts The conversion from QDict to QemuOpts is pointless. Simply get the stuff straight from the QDict. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490691368-32099-8-git-send-email-armbru@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-28 10:00:57 -04:00
Markus Armbruster	cbf036b4f3	rbd: Clean up runtime_opts, fix -drive to reject filename runtime_opts is used for three different purposes: * qemu_rbd_open() uses it to accept options it recognizes, such as "pool" and "image". Other .bdrv_open() methods do it similarly. * qemu_rbd_open() accepts additional list-valued options auth-supported and server, with the help of qemu_rbd_array_opts(). The list elements are again dictionaries. qemu_rbd_array_opts() uses runtime_opts to accept their members. Thus, runtime_opts contains recognized sub-sub-options "auth", "host", "port" in addition to recognized options. No other block driver does that. * qemu_rbd_create() uses it to convert the QDict produced by qemu_rbd_parse_filename() to QemuOpts. No other block driver does that. The keys produced by qemu_rbd_parse_filename() are "pool", "image", "snapshot", "conf", "user" and "keyvalue-pairs". qemu_rbd_open() accepts these, so no additional ones here. This is a confusing mess. Dates back to commit `0f9d252`. First step to clean it up is documenting runtime_opts.desc[]: * Reorder entries to match the QAPI schema, like we do in other block drivers. * Document why the schema's "server" and "auth-supported" aren't in .desc[]. * Document why "keyvalue-pairs", "host", "port" and "auth" are in .desc[], but not the schema. * Delete "filename", because none of the three users actually uses it. This fixes -drive to reject parameter filename instead of silently ignoring it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490691368-32099-7-git-send-email-armbru@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-28 09:53:16 -04:00
Markus Armbruster	82f20e8547	rbd: Don't accept -drive driver=rbd, keyvalue-pairs=... The way we communicate extra key-value pairs from qemu_rbd_parse_filename() to qemu_rbd_open() exposes option parameter "keyvalue-pairs" on the command line. It's not wanted there. Hack: rename the parameter to "=keyvalue-pairs" to make it inaccessible. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490691368-32099-6-git-send-email-armbru@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-28 09:53:16 -04:00
Markus Armbruster	8efb339dd4	rbd: Clean up after the previous commit This code in qemu_rbd_parse_filename() found_str = qemu_rbd_next_tok(p, '\0', &p); p = found_str; has no effect. Drop it, and simplify qemu_rbd_next_tok(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490691368-32099-5-git-send-email-armbru@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-28 09:53:16 -04:00
Markus Armbruster	730b00bbfd	rbd: Don't limit length of parameter values We laboriously enforce that parameter values are between one and some arbitrary limit in length. Only RBD_MAX_IMAGE_NAME_SIZE comes from librbd.h, and I'm not sure it applies. Where the other limits come from is unclear. Drop the length checking. The limits librbd actually imposes must be checked by librbd anyway. There's one minor complication: BDRVRBDState member name is a fixed-size array. Depends on the length limit. Make it a pointer to a dynamically allocated string. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490691368-32099-4-git-send-email-armbru@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-28 09:53:16 -04:00
Markus Armbruster	f51c363c2b	rbd: Fix to cleanly reject -drive without pool or image qemu_rbd_open() neglects to check pool and image are present. Missing image is caught by rbd_open(), but missing pool crashes. Reproducer: $ qemu-system-x86_64 -nodefaults -drive driver=rbd,id=rbd,image=i,... terminate called after throwing an instance of 'std::logic_error' what(): basic_string::_M_construct null not valid Aborted (core dumped) where ... is a working server.0.{host,port} configuration. Doesn't affect -drive with file=..., because qemu_rbd_parse_filename() always sets both pool and image. Doesn't affect -blockdev, because pool and image are mandatory in the QAPI schema. Fix by adding the missing checks. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490691368-32099-3-git-send-email-armbru@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-28 09:53:16 -04:00
Denis V. Lunev	dc62da88b5	parallels: wrong call to bdrv_truncate Parallels driver should not call bdrv_truncate if the image was opened in the read-only mode. Without the patch qemu-img check harddisk.hds asserts with bdrv_truncate: Assertion `child->perm & BLK_PERM_RESIZE' failed. Parameters used on the write path are not needed if the image is opened in the read-only mode. Signed-off-by: Denis V. Lunev <den@openvz.org> Reported-by: Edgar Kaziahmedov <edos@virtuozzo.mipt.ru> Message-id: 1490625488-7980-1-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-03-28 11:06:00 +01:00
Peter Maydell	eb06c9e2d3	* MTTCG fix for win32 * virtio-scsi assertion failure * mem-prealloc coverity fix * x86 migration revert which requires more thought * x86 instruction limit (avoids >2 page translation blocks) * nbd dead code cleanup * small memory.c logic fix -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQExBAABCAAbBQJY2Te4FBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D GyIH/jMpl0w5cdW2hxzEba5alqALKx8fz8LMFy47lSndifyr74Nbk7fq9u89m9/6 3dz92sOq4ixUt8+eWEHcy0lJqucrStdMWcA7LsSIioXfgbBN39e9NfJFshXKTSQU RSL3M5f5XvYHZqHWhk/GjzlkA2l+Dq2v7FM+DT4HISnP0fjcmGXEfadfUZi6KLao 94xXGs73pTkln9jm8N1pwn3JuJ4+FbEatrvok01nmTbA7VrrBz0zVbTZjhWz7Tu/ sqBuIBAnPNKhYZFhF8GnNrXUaIciCbw13QdT047JSfpdkSQ7IUfGt7mW48X0+q9z JCHTiTZ35d7/lqeMojgl9ANUDpk= =iED8 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * MTTCG fix for win32 * virtio-scsi assertion failure * mem-prealloc coverity fix * x86 migration revert which requires more thought * x86 instruction limit (avoids >2 page translation blocks) * nbd dead code cleanup * small memory.c logic fix # gpg: Signature made Mon 27 Mar 2017 17:03:04 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: scsi-generic: Fill in opt_xfer_len in INQUIRY reply if it is zero Revert "apic: save apic_delivered flag" nbd: drop unused NBDClientSession.is_unix field win32: replace custom mutex and condition variable with native primitives mem-prealloc: fix sysconf(_SC_NPROCESSORS_ONLN) failure case. tcg/i386: Check the size of instruction being translated virtio-scsi: Fix acquire/release in dataplane handlers virtio-scsi: Make virtio_scsi_acquire/release public clear pending status before calling memory commit Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-03-27 17:34:50 +01:00
Peter Maydell	700f9ce0f9	block/file-posix.c: Fix unused variable warning on OpenBSD On OpenBSD none of the ioctls probe_logical_blocksize() tries exist, so the variable sector_size is unused. Refactor the code to avoid this (and reduce the duplicated code). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490279788-12995-1-git-send-email-peter.maydell@linaro.org Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-03-27 17:28:34 +02:00
Kevin Wolf	e5bcf967fb	file-posix: Make bdrv_flush() failure permanent without O_DIRECT Success for bdrv_flush() means that all previously written data is safe on disk. For fdatasync(), the best semantics we can hope for on Linux (without O_DIRECT) is that all data that was written since the last call was successfully written back. Therefore, and because we can't redo all writes after a flush failure, we have to give up after a single fdatasync() failure. After this failure, we would never be able to make the promise that a successful bdrv_flush() makes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20170322210005.16533-1-kwolf@redhat.com Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-03-27 16:53:42 +02:00
Paolo Bonzini	a12a712a7d	nbd-client: fix handling of hungup connections After the switch to reading replies in a coroutine, nothing is reentering pending receive coroutines if the connection hangs. Move nbd_recv_coroutines_enter_all to the reply read coroutine, which is the place where hangups are detected. nbd_teardown_connection can simply wait for the reply read coroutine to detect the hangup and clean up after itself. This wouldn't be enough though because nbd_receive_reply returns 0 (rather than -EPIPE or similar) when reading from a hung connection. Fix the return value check in nbd_read_reply_entry. This fixes qemu-iotests 083. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170314111157.14464-1-pbonzini@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-03-27 16:50:36 +02:00
Stefan Hajnoczi	e4548bb640	nbd: drop unused NBDClientSession.is_unix field Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20170327123223.1199-1-stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-03-27 14:41:01 +02:00
Eric Blake	67adf4b398	trace: Fix backwards mirror_yield parameters block/trace-events lists the parameters for mirror_yield consistently with other mirror events (cnt just after s, like in mirror_before_sleep; in_flight last, like in mirror_yield_in_flight). But the callers were passing parameters in the wrong order, leading to poor trace messages, including type truncation when there are more than 4G dirty sectors involved. Broken since its introduction in commit `bd48bde`. While touching this, ensure that all callers use the same type (uint64_t) for cnt, as a later patch will enable the compiler to do stricter type-checking. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-03-24 09:21:42 +00:00
Peter Maydell	e6ebebc204	-----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJY0rRYAAoJEL2+eyfA3jBXEvkQAJSNErQOEdqBoX/gSjeYSX85 PGp+fUrIux0HIYeKySShnsJ3Z1AuIHCogtcfEyHzTo8cDljZssgS4BRKy41ZnNaM 91Q91MgVyAEtwzApg5WNwWhTB7QDkbz7J75mTk74KPN6y9uKNbjSBRSnH4ZbIH/p L3tk6eGpHWf3N0UvoifoKpExlq0A+AYkisuZn7D9C+bBDEnEUWYRcvfEk3sKrZD/ XikclGwNSPKmdBeYenzlLHFA8WyGe85pkys6QRPeRL1l8dDBBPt1so2y4PLzaEBO fImh+ivrHHbKI5TD0RoRVsY4qi9bbH8dK1gDp0oT8uZpwIsO4EWRHA1GZRq6lVIw j7a+p/ZFBiVa2WFvWpicZppRwnkuuswqqm4NVsC1djSMoDvPeO2T24YlcRPYeYrp FVlY04HpP195mw3e7VVWlirRQY+Jo5IwJkSOUKM4xOZpKY/prS2kqT+KQq2bYK5a t3MTKwT04q/7eBtPFoJFf3gwI4q8hyizPtf4X0AN5/YREwJh7J4azQSLEJSjlo2F 37TbMqGVNQPBtwXWnfK2mi12NIHCaP/clh8hqqrQE6EdjFQQcdD5j5df5syLalTK qy+IbxpvoyNt0niXstXI62RnKDbfwsz8YtYYjVIUfv9VkpyQU1gHak7VeodRyHjz zuINtr0Jrmr47n8d9qTj =Gi6q -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging # gpg: Signature made Wed 22 Mar 2017 17:28:56 GMT # gpg: using RSA key 0xBDBE7B27C0DE3057 # gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>" # gpg: aka "Jeffrey Cody <jeff@codyprime.org>" # gpg: aka "Jeffrey Cody <codyprime@gmail.com>" # Primary key fingerprint: 9957 4B4D 3474 90E7 9D98 D624 BDBE 7B27 C0DE 3057 * remotes/cody/tags/block-pull-request: blockjob: add devops to blockjob backends block-backend: add drained_begin / drained_end ops blockjob: add block_job_start_shim blockjob: avoid recursive AioContext locking Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-03-23 11:39:53 +00:00
John Snow	f4d9cc88ee	block-backend: add drained_begin / drained_end ops Allow block backends to forward drain requests to their devices/users. The initial intended purpose for this patch is to allow BBs to forward requests along to BlockJobs, which will want to pause if their associated BB has entered a drained region. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170316212351.13797-3-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-22 13:26:27 -04:00
Edgar Kaziahmedov	ff5bbe56c6	parallels: fix default options parsing parallels block driver is completely broken since commit commit `75cdcd1553` Author: Markus Armbruster <armbru@redhat.com> Date: Tue Feb 21 21:14:08 2017 +0100 option: Fix checking of sizes for overflow and trailing crap Right now even simple qemu-io -c "read 512 64k" 1.hds ends up with Unexpected error in parse_option_size() at util/qemu-option.c:188: Parameter 'prealloc-size' expects a non-negative number below 2^64 Aborted (core dumped) The cure is simple - we should use 'M' as a suffix in default option value instead of 'MiB'. Signed-off-by: Edgar Kaziahmedov <edos@virtuozzo.mipt.ru> Signed-off-by: Denis V. Lunev <den@openvz.org> Message-id: 1490002022-22653-1-git-send-email-den@openvz.org CC: Markus Armbruster <armbru@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-03-21 10:02:36 +00:00
Paolo Bonzini	eb048026aa	curl: fix compilation on OpenBSD EPROTO is not found in OpenBSD. We usually use EIO when no better errno is available, do that here too. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170317152412.8472-1-pbonzini@redhat.com Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-03-17 18:27:14 +00:00
Fam Zheng	c1cef67251	block: Always call bdrv_child_check_perm first bdrv_child_set_perm alone is not very usable because the caller must call bdrv_child_check_perm first. This is already encapsulated conveniently in bdrv_child_try_set_perm, so remove the other prototypes from the header and fix the one wrong caller, block/mirror.c. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-17 12:54:06 +01:00
Fam Zheng	fed414df9d	file-posix: Don't leak fd in hdev_get_max_segments This fixes a leaked fd introduced in commit `9103f1ce`. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-17 12:54:06 +01:00
Changlong Xie	37a9051cc7	replication: clarify permissions Even if hidden_disk, secondary_disk are backing files, they all need write permissions in replication scenario. Otherwise we will encouter below exceptions on secondary side during adding nbd server: {'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk', 'writable': true } } {"error": {"class": "GenericError", "desc": "Conflicts with use by hidden-qcow2-driver as 'backing', which does not allow 'write' on sec-qcow2-driver-for-nbd"}} CC: Zhang Hailiang <zhang.zhanghailiang@huawei.com> CC: Zhang Chen <zhangchen.fnst@cn.fujitsu.com> CC: Wen Congyang <wencongyang2@huawei.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-17 12:54:06 +01:00
Stefan Hajnoczi	6958349085	file-posix: clean up max_segments buffer termination The following pattern is unsafe: char buf[32]; ret = read(fd, buf, sizeof(buf)); ... buf[ret] = 0; If read(2) returns 32 then a byte beyond the end of the buffer is zeroed. In practice this buffer overflow does not occur because the sysfs max_segments file only contains an unsigned short + '\n'. The string is always shorter than 32 bytes. Regardless, avoid this pattern because static analysis tools might complain and it could lead to real buffer overflows if copy-pasted elsewhere in the codebase. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-17 12:54:06 +01:00
Kevin Wolf	dcbf37ce41	commit: Implement .bdrv_refresh_filename We want query-block to return the right filename, even if a commit job put a bdrv_commit_top on top of the actual image format driver. Let bdrv_commit_top.bdrv_refresh_filename get the filename from its backing file. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-03-13 12:49:33 +01:00
Kevin Wolf	fd4a6493bb	mirror: Implement .bdrv_refresh_filename We want query-block to return the right filename, even if a mirror job put a bdrv_mirror_top on top of the actual image format driver. Let bdrv_mirror_top.bdrv_refresh_filename get the filename from its backing file. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-03-13 12:49:33 +01:00
Kevin Wolf	9196565866	commit: Implement bdrv_commit_top.bdrv_co_get_block_status In some cases, bdrv_co_get_block_status() is called recursively for the whole backing chain. The automatically inserted bdrv_commit_top filter driver must not stop the recursion, so implement a callback that simply forwards the request to bs->backing. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-03-13 12:49:33 +01:00
Kevin Wolf	b64aa44195	block: Request block status from *file for BDRV_BLOCK_RAW This fixes bdrv_co_get_block_status() for the bdrv_mirror_top block driver, which must fall through to bs->backing instead of bs->file. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-03-13 12:49:33 +01:00

... 9 10 11 12 13 ...

4005 Commits