openkylin/qemu - qemu - 红山开源项目托管

Commit Graph

Author	SHA1	Message	Date
Markus Armbruster	7c81e4e9db	block: Don't bother asserting type of output visitor's output After a visit of a complex QAPI type FOO ov = qobject_output_visitor_new(&foo); visit_type_FOO(ov, NULL, expr, &error_abort); visit_complete(ov, &foo); we can safely assume qobject_type(foo) is QTYPE_QDICT. We do in many places, but occasionally assert qobject_type(obj) == QTYPE_QDICT. Don't. The appropriate place to check such fundamental properties of QAPI visitors is the test suite. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1487363905-9480-15-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-02-22 19:52:20 +01:00
Anton Nefedov	90ab48eb07	mirror: do not increase offset during initial zero_or_discard phase If explicit zeroing out before mirroring is required for the target image, it moves the block job offset counter to EOF, then offset and len counters count the image size twice. There is no harm but stats are confusing, specifically the progress of the operation is always reported as 99% by management tools. The patch skips offset increase for the first "technical" pass over the image. This should not cause any further harm. Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1486045515-8009-1-git-send-email-den@openvz.org CC: Jeff Cody <jcody@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-21 10:38:00 -05:00
Kevin Wolf	31eb1202d3	iscsi: Add blockdev-add support This adds blockdev-add support for iscsi devices. Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-21 10:37:34 -05:00
Kevin Wolf	1d56010482	iscsi: Add timeout option This was previously only available with -iscsi. Again, after this patch, the -iscsi option only takes effect if an URL is given. New users are supposed to use the new driver-specific option. All -iscsi options have a corresponding driver-specific option for the iscsi block driver now. Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-21 10:37:26 -05:00
Kevin Wolf	81aa2a0fb5	iscsi: Add header-digest option This was previously only available with -iscsi. Again, after this patch, the -iscsi option only takes effect if an URL is given. New users are supposed to use the new driver-specific option. Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-21 10:37:16 -05:00
Kevin Wolf	d4e799292c	iscsi: Add initiator-name option This was previously only available with -iscsi. Again, after this patch, the -iscsi option only takes effect if an URL is given. New users are supposed to use the new driver-specific option. Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-21 10:37:08 -05:00
Kevin Wolf	4317142020	iscsi: Handle -iscsi user/password in bdrv_parse_filename() This splits the logic in the old parse_chap() function into a part that parses the -iscsi options into the new driver-specific options, and another part that actually applies those options (called apply_chap() now). Note that this means that username and password specified with -iscsi only take effect when a URL is provided. This is intentional, -iscsi is a legacy interface only supported for compatibility, new users should use the proper driver-specific options. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-21 10:36:57 -05:00
Kevin Wolf	d5895fcb1d	iscsi: Split URL into individual options This introduces a .bdrv_parse_filename handler for iscsi which parses an URL if given and translates it to individual options. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-21 10:36:34 -05:00
Paolo Bonzini	1ace7ceac5	coroutine-lock: add mutex argument to CoQueue APIs All that CoQueue needs in order to become thread-safe is help from an external mutex. Add this to the API. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213181244.16297-6-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:39:40 +00:00
Paolo Bonzini	b9e413dd37	block: explicitly acquire aiocontext in aio callbacks that need it Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-16-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:39:39 +00:00
Paolo Bonzini	1919631e6b	block: explicitly acquire aiocontext in bottom halves that need it Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-15-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:39:39 +00:00
Paolo Bonzini	9d45665448	block: explicitly acquire aiocontext in callbacks that need it This covers both file descriptor callbacks and polling callbacks, since they execute related code. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-14-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:39:36 +00:00
Paolo Bonzini	2f47da5f7f	block: explicitly acquire aiocontext in timers that need it Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-13-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:14:08 +00:00
Paolo Bonzini	b20123a28b	qed: introduce qed_aio_start_io and qed_aio_next_io_cb qed_aio_start_io and qed_aio_next_io will not have to acquire/release the AioContext, while qed_aio_next_io_cb will. Split the functionality and gain a little type-safety in the process. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-11-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:14:08 +00:00
Paolo Bonzini	e5c67ab552	blkdebug: reschedule coroutine on the AioContext it is running on Keep the coroutine on the same AioContext. Without this change, there would be a race between yielding the coroutine and reentering it. While the race cannot happen now, because the code only runs from a single AioContext, this will change with multiqueue support in the block layer. While doing the change, replace custom bottom half with aio_co_schedule. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-10-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:14:08 +00:00
Paolo Bonzini	ff82911cd3	nbd: convert to use qio_channel_yield In the client, read the reply headers from a coroutine, switching the read side between the "read header" coroutine and the I/O coroutine that reads the body of the reply. In the server, if the server can read more requests it will create a new "read request" coroutine as soon as a request has been read. Otherwise, the new coroutine is created in nbd_request_put. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-8-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:14:08 +00:00
Paolo Bonzini	35f106e684	block-backend: allow blk_prw from coroutine context qcow2_create2 calls this. Do not run a nested event loop, as that breaks when aio_co_wake tries to queue the coroutine on the co_queue_wakeup list of the currently running one. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213135235.12274-4-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:14:07 +00:00
Paolo Bonzini	c2b38b277a	block: move AioContext, QEMUTimer, main-loop to libqemuutil AioContext is fairly self contained, the only dependency is QEMUTimer but that in turn doesn't need anything else. So move them out of block-obj-y to avoid introducing a dependency from io/ to block-obj-y. main-loop and its dependency iohandler also need to be moved, because later in this series io/ will call iohandler_get_aio_context. [Changed copyright "the QEMU team" to "other QEMU contributors" as suggested by Daniel Berrange and agreed by Paolo. --Stefan] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213135235.12274-2-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:14:07 +00:00
Alberto Garcia	7061a07898	qcow2: Optimize the refcount-block overlap check The metadata overlap checks introduced in `a40f1c2add` help detect corruption in the qcow2 image by verifying that data writes don't overlap with existing metadata sections. The 'refcount-block' check in particular iterates over the refcount table in order to get the addresses of all refcount blocks and check that none of them overlap with the region where we want to write. The problem with the refcount table is that since it always occupies complete clusters its size is usually very big. With the default values of cluster_size=64KB and refcount_bits=16 this table holds 8192 entries, each one of them enough to map 2GB worth of host clusters. So unless we're using images with several TB of allocated data this table is going to be mostly empty, and iterating over it is a waste of CPU. If the storage backend is fast enough this can have an effect on I/O performance. This patch keeps the index of the last used (i.e. non-zero) entry in the refcount table and updates it every time the table changes. The refcount-block overlap check then uses that index instead of reading the whole table. In my tests with a 4GB qcow2 file stored in RAM this doubles the amount of write IOPS. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 20170201123828.4815-1-berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-02-12 00:47:43 +01:00
Peter Lieven	f67409a5bb	block/nfs: fix naming of runtime opts commit `94d6a7a` accidentally left the naming of runtime opts and QAPI scheme inconsistent. As one consequence passing of parameters in the URI is broken. Sync the naming of the runtime opts to the QAPI scheme. Please note that this is technically backwards incompatible with the 2.8 release, but the 2.8 release is the only version that had the wrong naming. Furthermore release 2.8 suffered from a NULL pointer dereference during URI parsing. Fixes: `94d6a7a76e` Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-id: 1485942829-10756-3-git-send-email-pl@kamp.de [mreitz: Fixed commit message] Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-02-12 00:47:42 +01:00
Peter Lieven	8d20abe87a	block/nfs: fix NULL pointer dereference in URI parsing parse_uint_full wants to put the parsed value into the variable passed via its second argument which is NULL. Fixes: `94d6a7a76e` Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1485942829-10756-2-git-send-email-pl@kamp.de Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-02-12 00:47:42 +01:00
Dou Liyang	a6baa60807	block/qapi: reduce the execution time of qmp_query_blockstats In order to reduce the execution time, this patch optimize the qmp_query_blockstats(): Remove the next_query_bds function. Remove the bdrv_query_stats function. Remove some judgement sentence. The original qmp_query_blockstats calls next_query_bds to get the next objects in each loops. In the next_query_bds, it checks the query_nodes and blk. It also call bdrv_query_stats to get the stats, In the bdrv_query_stats, it checks blk and bs each times. This waste more times, which may stall the main loop a bit. And if the disk is too many and donot use the dataplane feature, this may affect the performance in main loop thread. This patch removes that two functions, and makes the structure clearly. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Message-id: 1484467275-27919-3-git-send-email-douly.fnst@cn.fujitsu.com Reviewed-by: Markus Armbruster <armbru@redhat.com> [mreitz: Removed duplicate info->value assignment] Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-02-12 00:47:42 +01:00
Dou Liyang	20a6d768f5	block/qapi: reduce the coupling between the bdrv_query_stats and bdrv_query_bds_stats The bdrv_query_stats and bdrv_query_bds_stats functions need to call each other, that increases the coupling. it also makes the program complicated and makes some unnecessary tests. Remove the call from bdrv_query_bds_stats to bdrv_query_stats, just take some recursion to make it clearly. Avoid testing whether the blk is NULL during querying the bds stats. It is unnecessary. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Message-id: 1484467275-27919-2-git-send-email-douly.fnst@cn.fujitsu.com Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-02-12 00:47:42 +01:00
QingFeng Hao	4545d4f4af	block/vmdk: Fix the endian problem of buf_len and lba The problem was triggered by qemu-iotests case 055. It failed when it was comparing the compressed vmdk image with original test.img. The cause is that buf_len in vmdk_write_extent wasn't converted to little-endian before it was stored to disk. But later vmdk_read_extent read it and converted it from little-endian to cpu endian. If the cpu is big-endian like s390, the problem will happen and the data length read by vmdk_read_extent will become invalid! The fix is to add the conversion in vmdk_write_extent, meanwhile, repair the endianness problem of lba field which shall also be converted to little-endian before storing to disk. Cc: qemu-stable@nongnu.org Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com> Signed-off-by: Jing Liu <liujbjl@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20161216052040.53067-2-haoqf@linux.vnet.ibm.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-02-12 00:47:42 +01:00
Fam Zheng	9adceb0213	qapi: Tweak error message of bdrv_query_image_info @bs doesn't always have a device name, such as when it comes from "qemu-img info". Report file name instead. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20170119130759.28319-2-famz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-02-12 00:47:41 +01:00
Peter Maydell	4e9f5244e1	-----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJYkeZAAAoJEJykq7OBq3PI6oUH/3qlRvQrWmhWLR+XCtwU0gON HRApL57Of+B1YbqJzb8wzjLMLfzZQYLoT7kf3FDRON751Iwpv2Qyl6j79kbmOQwy txvtgUTtPZrOZ9HMk6M1VboiKrkM1t0I1QiRYy/af2f1gD3KTqIt8YN1ic3xatKD Fgmx+oD+6EkrNilthemvDyaXtGsdTl4GC9ZbGcJB2VJzzWkksRUfeZWysIu9p2zP l6viegW/1+o5wYgBt6DxMalfNGbEiuBgXgx6PVFPbkw0xNURC52qDHhQ91xTSWt1 pvFrIhYWR/ETN0twJh+jtmCjkawKWSsx2nrLlrSh4H0EpwFoRfFqH/ZrOFSg0wg= =QnCX -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging # gpg: Signature made Wed 01 Feb 2017 13:44:32 GMT # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/tracing-pull-request: trace: clean up trace-events files qapi: add missing trace_visit_type_enum() call trace: improve error reporting when parsing simpletrace header trace: update docs to reflect new code generation approach trace: switch to modular code generation for sub-directories trace: move setting of group name into Makefiles trace: move hw/i386/xen events to correct subdir trace: move hw/xen events to correct subdir trace: move hw/block/dataplane events to correct subdir make: move top level dir to end of include search path # Conflicts: # Makefile Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-02-02 16:08:28 +00:00
Paolo Bonzini	acf6e5f096	sheepdog: reorganize check for overlapping requests Wrap the code that was copied repeatedly in the two functions, sd_aio_setup and sd_aio_complete. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161129113245.32724-6-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-01 00:17:20 -05:00
Paolo Bonzini	c4080e9391	sheepdog: simplify inflight_aio_head management Add to the list in add_aio_request and, indirectly, resend_aioreq. Inline free_aio_req in the caller, it does not simply undo alloc_aio_req's job. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161129113245.32724-5-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-01 00:17:20 -05:00
Paolo Bonzini	28ddd08cd6	sheepdog: do not use BlockAIOCB Sheepdog's AIOCB are completely internal entities for a group of requests and do not need dynamic allocation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161129113245.32724-4-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-01 00:17:20 -05:00
Paolo Bonzini	e80ab33dc0	sheepdog: reorganize coroutine flow Delimit co_recv's lifetime clearly in aio_read_response. Do a simple qemu_coroutine_enter in aio_read_response, letting sd_co_writev call sd_write_done. Handle nr_pending in the same way in sd_co_rw_vector, sd_write_done and sd_co_flush_to_disk. Remove sd_co_rw_vector's return value; just leave with no pending requests. [Jeff: added missing 'return' back, spotted by Paolo after series was applied.] Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-01 00:17:20 -05:00
Paolo Bonzini	a71264f9f5	sheepdog: remove unused cancellation support SheepdogAIOCB is internal to sheepdog.c, hence it is never canceled. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161129113245.32724-2-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-02-01 00:17:20 -05:00
Stefan Hajnoczi	7f4076c1bb	trace: clean up trace-events files There are a number of unused trace events that scripts/cleanup-trace-events.pl finds. The "hw/vfio/pci-quirks.c" filename was typoed and "qapi/qapi-visit-core.c" was missing the qapi/ directory prefix. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20170126171613.1399-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-31 17:12:15 +00:00
Peter Lieven	2deb63c2da	block/iscsi: statically link qemu_iscsi_opts commit `f57b4b5f` moved qemu_iscsi_opts into vl.c. This made them invisible for qemu-img, qemu-nbd etc. Fixes: `f57b4b5fb1` Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1485262161-18543-1-git-send-email-pl@kamp.de> [Drop useless #ifdef. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-27 18:07:58 +01:00
Eric Farman	c4c41a0a65	block: get max_transfer limit for char (scsi-generic) devices We can get the maximum number of bytes for a single I/O transfer from the BLKSECTGET ioctl, but we only perform this for block devices. scsi-generic devices are represented as character devices, and so do not issue this today. Update this, so that virtio-scsi devices using the scsi-generic interface can return the same data. Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com> Message-Id: <20170120162527.66075-4-farman@linux.vnet.ibm.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-27 18:07:31 +01:00
Eric Farman	482652502e	block: Fix target variable of BLKSECTGET ioctl Commit `6f6071745b` ("raw-posix: Fetch max sectors for host block device") introduced a routine to call the kernel BLKSECTGET ioctl, which stores the result back to user space. However, the size of the data returned depends on the routine handling the ioctl. The (compat_)blkdev_ioctl returns a short, while sg_ioctl returns an int. Thus, on big-endian systems, we can find ourselves accidentally shifting the result to a much larger value. (On s390x, a short is 16 bits while an int is 32 bits.) Also, the two ioctl handlers return values in different scales (block returns sectors, while sg returns bytes), so some tweaking of the outputs is required such that hdev_get_max_transfer_length returns a value in a consistent set of units. Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com> Message-Id: <20170120162527.66075-3-farman@linux.vnet.ibm.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-27 18:07:31 +01:00
Peter Lieven	1da45e0c4c	block/iscsi: avoid data corruption with cache=writeback nb_cls_shrunk in iscsi_allocmap_update can become -1 if the request starts and ends within the same cluster. This results in passing -1 to bitmap_set and bitmap_clear and they don't handle negative values properly. In the end this leads to data corruption. Fixes: `e1123a3b40` Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1484579832-18589-1-git-send-email-pl@kamp.de> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-27 18:07:31 +01:00
Ashijeet Acharya	fe44dc9180	migration: disallow migrate_add_blocker during migration If a migration is already in progress and somebody attempts to add a migration blocker, this should rightly fail. Add an errp parameter and a retcode return value to migrate_add_blocker. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> Message-Id: <1484566314-3987-5-git-send-email-ashijeetacharya@gmail.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Acked-by: Greg Kurz <groug@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Merged with recent 'Allow invtsc migration' change	2017-01-24 18:00:30 +00:00
Ashijeet Acharya	05551e58ee	block/vvfat: Remove the undesirable comment Remove the "// assert(is_consistent(s))" comment in block/vvfat.c Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> Message-Id: <1484566314-3987-2-git-send-email-ashijeetacharya@gmail.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-01-24 17:54:47 +00:00
Paolo Bonzini	8f90b5e91d	block: get rid of bdrv_io_unplugged_begin/end bdrv_io_plug and bdrv_io_unplug are only called (via their BlockBackend equivalents) after starting asynchronous I/O. bdrv_drain is not going to be called while they are running, because---even if a coroutine runs for some reason---it will only drain in the next iteration of the event loop through bdrv_co_yield_to_drain. So this mechanism is unnecessary, get rid of it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161129113334.605-1-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-16 13:25:17 +00:00
Eric Blake	c1bb86cd8a	block: Rename raw-{posix,win32} to file-*.c These files deal with the file protocol, not the raw format (the file protocol is often used with other formats, and the raw format is not forced to use the file protocol). Rename things to make it a bit easier to follow. Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-01-09 13:30:53 +01:00
Eric Blake	2e6fc7eb1a	block: Rename raw_bsd to raw-format.c Given that we have raw-win32.c and raw-posix.c, my initial guess at raw_bsd.c was that it was for dealing with raw files using code specific to the BSD operating system (beyond what raw-posix could do). Not so - this name was chosen back in commit `e1c66c6` to distinguish that it was a BSD licensed file, in contrast to the then-existing raw.c with an unclear and potentially unusable license. But since it has been more than three years since the rewrite, it's time to pick a more useful name for this file to avoid this type of confusion to future contributors that don't know the backstory, as none of our other files are named solely by the license they use. In reality, this file deals with the raw format, which is useful with any number of protocols, while raw-{win32,posix} deal with the file protocol (and in turn, that protocol is not limited to use with the raw format). So rename raw_bsd to raw-format.c. We could have also used the shorter name raw.c, except that collides with the earlier use of that filename for a different license, and it's better to be safe than risk license pollution. The next patch will also rename raw-win32.c and raw-posix.c to further distinguish the difference in roles. It doesn't hurt that this gets rid of an underscore in the filename, thereby making tab-completion on 'ra<TAB>' easier (now I don't have to type the shift key, which slows things down :) Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	44b6789299	blkverify: Implement bdrv_co_preadv/pwritev/flush This enables byte granularity requests for blkverify, and at the same time gets us rid of another user of the BDS-level AIO emulation. The reference output of a test case must be changed because the verification failure message reports byte offsets instead of sectors now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	7c3a998531	blkdebug: Implement bdrv_co_preadv/pwritev/flush This enables byte granularity requests for blkdebug, and at the same time gets us rid of another user of the BDS-level AIO emulation. Note that unless align=512 is specified, this can behave subtly different from the old behaviour because bdrv_co_preadv/pwritev don't have to perform alignment adjustments any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	7c37f941d0	quorum: Clean up quorum_aio_get() Make sure that all fields of the new QuorumAIOCB are zeroed when the function returns even without explicitly setting them. This will protect us when new fields are added, removes some explicit zero assignment and makes the code a little nicer to read. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	a7e159025e	quorum: Inline quorum_fifo_aio_cb() Inlining the function removes some boilerplace code and replaces recursion by a simple loop, so the code becomes somewhat easier to understand. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	6847da3808	quorum: Implement .bdrv_co_preadv/pwritev() This enables byte granularity requests on quorum nodes. Note that the QMP events emitted by the driver are an external API that we were careless enough to define as sector based. The offset and length of requests reported in events are rounded therefore. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	dee66e2882	quorum: Avoid bdrv_aio_writev() for rewrites Replacing it with bdrv_co_pwritev() prepares us for byte granularity requests and gets us rid of the last bdrv_aio_*() user in quorum. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	7cd9b3964e	quorum: Inline quorum_aio_cb() This is a conversion to a more natural coroutine style and improves the readability of the driver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	0f31977d9d	quorum: Do cleanup in caller coroutine Instead of calling quorum_aio_finalize() deeply nested in what used to be an AIO callback, do it in the same functions that allocated the AIOCB. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	ce15dc08ef	quorum: Implement .bdrv_co_readv/writev This converts the quorum block driver from implementing callback-based interfaces for read/write to coroutine-based ones. This is the first step that will allow us further simplification of the code. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2017-01-09 13:30:52 +01:00
Kevin Wolf	10c8551968	quorum: Remove s from quorum_aio_get() arguments There is no point in passing the value of bs->opaque in order to overwrite it with itself. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2017-01-09 13:30:52 +01:00
Stefan Hajnoczi	ee68697551	linux-aio: poll ring for completions The Linux AIO userspace ABI includes a ring that is shared with the kernel. This allows userspace programs to process completions without system calls. Add an AioContext poll handler to check for completions in the ring. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161201192652.9509-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-03 16:38:48 +00:00
Stefan Hajnoczi	f6a51c84cd	aio: add AioPollFn and io_poll() interface The new AioPollFn io_poll() argument to aio_set_fd_handler() and aio_set_event_handler() is used in the next patch. Keep this code change separate due to the number of files it touches. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161201192652.9509-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-03 16:38:48 +00:00
Stefan Hajnoczi	68701de136	Block layer patches for 2.8.0-rc3 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJYRs7XAAoJEH8JsnLIjy/W5ScP/0ukNPAUUYGnU5dIj6q20kRk kIhEDZXNRMqT9qhSypi5ivOc/HDIXBUM5jsuAPHTK/JxMWeAFCwOwKXzKLPEZ9iG nA5UGMST4uwwB0bANUAyweGIHdTIfhgN2dgLKvKIsbeTNmCyMaJieZ29fkVbNKyS msOFmaVn+nm4rgE/q7HXtg/hUdjuoaEOsWsJ7YY3Bj8kgQR/H8iPCvNCl3YWGwIW 9vXQL25QUaQbBWinA+rHhHiGPAK2GzitVry3fQGch5j4OqpXYt3IQTSqXZMQLHcT zUwcx99C16hG9R7sjhNuto+lMuw6qtK75s/7PGpjw8aQFwYR5ITAyB369dxmrGqc 1bBPsRCXfhWku/4wMzrj4fO7iszMadBIzChwk+IsCRNAWHFGoc9VHvk3mdT4puBB 2W4JlOzGY/FD/rQRetGGmGN09HheRZ5sW7o9DoUoGBLCk1llIrVs/fKtHLDtx1T9 sNe5e7EYdNufBTpy7p/75nRMyYlVlENJW/A+nw2pvsEZjU9LAdWqEEfmU1OU1rdl TEZxBZQtPq0ZkfQIV6mPUFBhE3e8EbB573jJaD2P8StpR6jCFRZlU2hkW4c7FnZ1 pUfc8PJFP4Bo6CaNy7PMUCUHD7z8eZP/BFndODAgBFm2WAxN2tsRrfPDWh23huZV k9+VZFjeT2jz+mHtRCbA =9y67 -----END PGP SIGNATURE----- Merge remote-tracking branch 'kwolf/tags/for-upstream' into staging Block layer patches for 2.8.0-rc3 # gpg: Signature made Tue 06 Dec 2016 02:44:39 PM GMT # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * kwolf/tags/for-upstream: qcow2: Don't strand clusters near 2G intervals during commit Message-id: 1481037418-10239-1-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-12-06 17:35:29 +00:00
Eric Blake	a3e1505dae	qcow2: Don't strand clusters near 2G intervals during commit The qcow2_make_empty() function is reached during 'qemu-img commit', in order to clear out ALL clusters of an image. However, if the image cannot use the fast code path (true if the image is format 0.10, or if the image contains a snapshot), the cluster size is larger than 512, and the image is larger than 2G in size, then our choice of sector_step causes problems. Since it is not cluster aligned, but qcow2_discard_clusters() silently ignores an unaligned head or tail, we are leaving clusters allocated. Enhance the testsuite to expose the flaw, and patch the problem by ensuring our step size is aligned. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-12-06 15:37:02 +01:00
Prasanna Kumar Kalever	7103d9165b	block/nfs: fix QMP to match debug option The QMP definition of BlockdevOptionsNfs: { 'struct': 'BlockdevOptionsNfs', 'data': { 'server': 'NFSServer', 'path': 'str', 'user': 'int', 'group': 'int', 'tcp-syn-count': 'int', 'readahead-size': 'int', 'page-cache-size': 'int', 'debug-level': 'int' } } To make this consistent with other block protocols like gluster, lets change s/debug-level/debug/ Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-12-05 16:30:21 -05:00
Prasanna Kumar Kalever	1a417e46ae	block/gluster: fix QMP to match debug option The QMP definition of BlockdevOptionsGluster: { 'struct': 'BlockdevOptionsGluster', 'data': { 'volume': 'str', 'path': 'str', 'server': ['GlusterServer'], 'debug-level': 'int', 'logfile': 'str' } } But instead of 'debug-level we have exported 'debug' as the option for choosing debug level of gluster protocol driver. This patch fix QMP definition BlockdevOptionsGluster s/debug-level/debug/ Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-12-05 16:30:15 -05:00
Stefan Hajnoczi	f05234df63	Block layer patches for 2.8.0-rc2 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJYPZu6AAoJEH8JsnLIjy/W1IAP/AwV0sWafsSMnWiz/4NVqeh3 Yk2cBtxCBmnq1y+PilLoZBdHui/RumwVuZKaShs3JA1n5CB1AjsVtEVl/6rQM7lv yymLr32pODuf4eaGwGY09FqTiL0Erlm846zbSDjkiKbTYoKpzRv0PT2iiA6yTnjO Mrs5nG7kEWdXPZ0ZsJyEyU3+vs7rNg+4N/VfTdPmCrV5DVBvAeCawM6JXHQNc7LV ER6Y8W9PAu5mYqwekjAW07lPCudytAsOTrbTTO9Sv/+kZUdKEmv7ZHJrPdECCb6N vcPOYOzKsEvvR8E0YZtuJDK9W4RTakxdlTste+TtW3VSt1Cs0zpvCFytaGuC+Kmq mhlA4lYLDvaiNOMl09SvIjjxGI7+FO+1XsY7e4rI5PJzOKWZMFOIwQMNxE3B2qUI dxd6izf7fzF4V5uDDwHTJ8TAiJDSAe6Bkz+vzipQtu5NARl/isbQuIPIGXPkxZln fkCYA8/7EXrLXqd3khiRqEHS60ZtNgfm4ss8euMlWAgJAz0RLC1d/XhOIxaCQOg3 R/F9UdJAon6mfOgamZs5yzJgaPU6M90g/QipMB3Ub00VODacTiA81QUjZdEgELBB zvhgeja7qdIvOh9r9heCCuUTmkWRRppmkrKdqFLowZ2aWosISy/UjiPTGdjDdq7Z LsfYiRXsW94FmNXdKCqg =3WTz -----END PGP SIGNATURE----- Merge remote-tracking branch 'kwolf/tags/for-upstream' into staging Block layer patches for 2.8.0-rc2 # gpg: Signature made Tue 29 Nov 2016 03:16:10 PM GMT # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * kwolf/tags/for-upstream: docs: Specify that cache-clean-interval is only supported in Linux qcow2: Remove stale comment qcow2: Allow 'cache-clean-interval' in Linux only qcow2: Make qcow2_cache_table_release() work only in Linux Message-id: 1480436227-2211-1-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-11-29 17:06:39 +00:00
Alberto Garcia	a8b99dd516	qcow2: Remove stale comment We haven't been using CONFIG_MADVISE since `02d0e09503` Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-25 13:51:30 +01:00
Alberto Garcia	91203f08f0	qcow2: Allow 'cache-clean-interval' in Linux only The cache-clean-interval option of qcow2 only works on Linux. However we allow setting it in other systems regardless of whether it works or not. In those systems this option is not simply a no-op: it actually invalidates perfectly valid cache tables for no good reason without freeing their memory. This patch forbids using that option in non-Linux systems. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-25 13:51:30 +01:00
Alberto Garcia	2f2c8d6b37	qcow2: Make qcow2_cache_table_release() work only in Linux We are using QEMU_MADV_DONTNEED to discard the memory of individual L2 cache tables. The problem with this is that those semantics are specific to the Linux madvise() system call. Other implementations of madvise() (including the very Linux implementation of posix_madvise()) don't do that, so we cannot use them for the same purpose. This patch makes the code Linux-specific and uses madvise() directly since there's no point in going through qemu_madvise() for this. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-25 13:51:30 +01:00
Stefan Hajnoczi	f0c10c392f	Small fixes for rc1. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJYNMYwAAoJEL/70l94x66DBrUIAKeNK59lTbUm1WVl15nyB2qM jE2804Kcp+EGTwFHeo5GGsb+CplK54uMzHq2wzN6G3EmnaV3xbbdiZ7cmNl5Q6Tr qq7/pAer/T+xvQ3iDOTkAvJcqiMUZIx+MXrFED46KBUtqANJ2tAg2uEEqbI0RbOU +qtMZlPxo3IOuYnVROug1PPdNQDluBvZjrCYtb7VfZNo13u2UGYmRjZttobVfihF AQjv57uiawPs2e3VmUvIH8fjjEgV4MlPLiilL1eYsLaszjIBgdfrQOO7bdfetLo8 THkNJEZTpS9T9ChcbcTKS7yovI3OiIxPMwyftELClacX3wVtSie2WNx0sj/3Xpw= =DPxR -----END PGP SIGNATURE----- Merge remote-tracking branch 'bonzini/tags/for-upstream' into staging Small fixes for rc1. # gpg: Signature made Tue 22 Nov 2016 10:26:56 PM GMT # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * bonzini/tags/for-upstream: scsi/esp: do not raise an interrupt when reading the FIFO register nbd: Allow unmap and fua during write zeroes cpu_ldst.h: use correct guest address parameter Message-id: 1479853676-35995-1-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-11-23 11:44:29 +00:00
Eric Blake	169407e1f7	nbd: Allow unmap and fua during write zeroes Commit `fa778fff` wired up support to send the NBD_CMD_WRITE_ZEROES, but forgot to inform the block layer that FUA unmapping of zeroes is supported. Without BDRV_REQ_MAY_UNMAP listed as a supported flag, the block layer will always insist on the NBD layer passing NBD_CMD_FLAG_NO_HOLE, resulting in the server always allocating things even when it was desired to let the server punch holes. Similarly, failing to set BDRV_REQ_FUA means that the client may send unnecessary NBD_CMD_FLUSH when it could have instead used the NBD_CMD_FLAG_FUA bit. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1479413642-22463-2-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-11-22 23:26:51 +01:00
Eric Blake	3482b9bc41	block: Pass unaligned discard requests to drivers Discard is advisory, so rounding the requests to alignment boundaries is never semantically wrong from the data that the guest sees. But at least the Dell Equallogic iSCSI SANs has an interesting property that its advertised discard alignment is 15M, yet documents that discarding a sequence of 1M slices will eventually result in the 15M page being marked as discarded, and it is possible to observe which pages have been discarded. Between commits `9f1963b` and `b8d0a980`, we converted the block layer to a byte-based interface that ultimately ignores any unaligned head or tail based on the driver's advertised discard granularity, which means that qemu 2.7 refuses to pass any discard request smaller than 15M down to the Dell Equallogic hardware. This is a slight regression in behavior compared to earlier qemu, where a guest executing discards in power-of-2 chunks used to be able to get every page discarded, but is now left with various pages still allocated because the guest requests did not align with the hardware's 15M pages. Since the SCSI specification says nothing about a minimum discard granularity, and only documents the preferred alignment, it is best if the block layer gives the driver every bit of information about discard requests, rather than rounding it to alignment boundaries early. Rework the block layer discard algorithm to mirror the write zero algorithm: always peel off any unaligned head or tail and manage that in isolation, then do the bulk of the request on an aligned boundary. The fallback when the driver returns -ENOTSUP for an unaligned request is to silently ignore that portion of the discard request; but for devices that can pass the partial request all the way down to hardware, this can result in the hardware coalescing requests and discarding aligned pages after all. Reported by: Peter Lieven <pl@kamp.de> CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-22 15:59:23 +01:00
Eric Blake	49228d1e95	block: Return -ENOTSUP rather than assert on unaligned discards Right now, the block layer rounds discard requests, so that individual drivers are able to assert that discard requests will never be unaligned. But there are some ISCSI devices that track and coalesce multiple unaligned requests, turning it into an actual discard if the requests eventually cover an entire page, which implies that it is better to always pass discard requests as low down the stack as possible. In isolation, this patch has no semantic effect, since the block layer currently never passes an unaligned request through. But the block layer already has code that silently ignores drivers that return -ENOTSUP for a discard request that cannot be honored (as well as drivers that return 0 even when nothing was done). But the next patch will update the block layer to fragment discard requests, so that clients are guaranteed that they are either dealing with an unaligned head or tail, or an aligned core, making it similar to the block layer semantics of write zero fragmentation. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-22 15:59:22 +01:00
Eric Blake	b2f95feec5	block: Let write zeroes fallback work even with small max_transfer Commit `443668ca` rewrote the write_zeroes logic to guarantee that an unaligned request never crosses a cluster boundary. But in the rewrite, the new code assumed that at most one iteration would be needed to get to an alignment boundary. However, it is easy to trigger an assertion failure: the Linux kernel limits loopback devices to advertise a max_transfer of only 64k. Any operation that requires falling back to writes rather than more efficient zeroing must obey max_transfer during that fallback, which means an unaligned head may require multiple iterations of the write fallbacks before reaching the aligned boundaries, when layering a format with clusters larger than 64k atop the protocol of file access to a loopback device. Test case: $ qemu-img create -f qcow2 -o cluster_size=1M file 10M $ losetup /dev/loop2 /path/to/file $ qemu-io -f qcow2 /dev/loop2 qemu-io> w 7m 1k qemu-io> w -z 8003584 2093056 In fairness to Denis (as the original listed author of the culprit commit), the faulty logic for at most one iteration is probably all my fault in reworking his idea. But the solution is to restore what was in place prior to that commit: when dealing with an unaligned head or tail, iterate as many times as necessary while fragmenting the operation at max_transfer boundaries. Reported-by: Ed Swierk <eswierk@skyportsystems.com> CC: qemu-stable@nongnu.org CC: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-22 15:59:22 +01:00
Eric Blake	ecdbead659	qcow2: Inform block layer about discard boundaries At the qcow2 layer, discard is only possible on a per-cluster basis; at the moment, qcow2 silently rounds any unaligned requests to this granularity. However, an upcoming patch will fix a regression in the block layer ignoring too much of an unaligned discard request, by changing the block layer to break up a discard request at alignment boundaries; for that to work, the block layer must know about our limits. However, we can't go one step further by changing qcow2_discard_clusters() to assert that requests are always aligned, since that helper function is reached on paths outside of the block layer. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-22 15:59:22 +01:00
Kevin Wolf	668c0e441d	gluster: Fix use after free in glfs_clear_preopened() This fixes a use-after-free bug introduced in commit `6349c154`. We need to use QLIST_FOREACH_SAFE() when freeing elements in the loop. Spotted by Coverity. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1479378608-11962-1-git-send-email-kwolf@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-21 17:04:43 -05:00
Paolo Bonzini	bdffb31d8e	mirror: do not flush every time the disks are synced This puts a huge strain on the disks when there are many concurrent migrations. With this patch we only flush twice: just before issuing the event, and just before pivoting to the destination. If management will complete the job close to the BLOCK_JOB_READY event, the cost of the second flush should be small anyway. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161109162008.27287-2-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:49:26 -05:00
Max Reitz	4e504535c1	block/curl: Do not wait for data beyond EOF libcurl will only give us as much data as there is, not more. The block layer will deny requests beyond the end of file for us; but since this block driver is still using a sector-based interface, we can still get in trouble if the file size is not a multiple of 512. While we have already made sure not to attempt transfers beyond the end of the file, we are currently still trying to receive data from there if the original request exceeds the file size. This patch fixes this issue and invokes qemu_iovec_memset() on the iovec's tail. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20161025025431.24714-5-mreitz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
Max Reitz	ff5ca1664a	block/curl: Remember all sockets For some connection types (like FTP, generally), more than one socket may be used (in FTP's case: control vs. data stream). As of commit `838ef60249` ("curl: Eliminate unnecessary use of curl_multi_socket_all"), we have to remember all of the sockets used by libcurl, but in fact we only did that for a single one. Since one libcurl connection may use multiple sockets, however, we have to remember them all. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20161025025431.24714-4-mreitz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
Max Reitz	4e7676571b	block/curl: Fix return value from curl_read_cb While commit `38bbc0a580` is correct in that the callback is supposed to return the number of bytes handled; what it does not mention is that libcurl will throw an error if the callback did not "handle" all of the data passed to it. Therefore, if the callback receives some data that it cannot handle (either because the receive buffer has not been set up yet or because it would not fit into the receive buffer) and we have to ignore it, we still have to report that the data has been handled. Obviously, this should not happen normally. But it does happen at least for FTP connections where some data (that we do not expect) may be generated when the connection is established. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20161025025431.24714-3-mreitz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
Max Reitz	9054d9f6b0	block/curl: Use BDRV_SECTOR_SIZE Currently, curl defines its own constant SECTOR_SIZE. There is no advantage over using the global BDRV_SECTOR_SIZE, so drop it. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20161025025431.24714-2-mreitz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
Max Reitz	23dce3873f	block/curl: Drop TFTP "support" Because TFTP does not support byte ranges, it was never usable with our curl block driver. Since apparently nobody has ever complained loudly enough for someone to take care of the issue until now, it seems reasonable to assume that nobody has ever actually used it. Therefore, it should be safe to just drop it from curl's protocol list. [Jeff Cody: Below is additional summary pulled, with some rewording, from followup emails between Max and Markus, to explain what worked and what didn't] TFTP would sometimes work, to a limited extent, for images <= the curl "readahead" size, so long as reads started at offset zero. By default, that readahead size is 256KB. Reads starting at a non-zero offset would also have returned data from a zero offset. It can become more complicated still, with mixed reads at zero offset and non-zero offsets, due to data buffering. In short, TFTP could only have worked before in very specific scenarios with unrealistic expectations and constraints. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20161102175539.4375-4-mreitz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
John Snow	111049a4ec	blockjob: refactor backup_start as backup_job_create Refactor backup_start as backup_job_create, which only creates the job, but does not automatically start it. The old interface, 'backup_start', is not kept in favor of limiting the number of nearly-identical interfaces that would have to be edited to keep up with QAPI changes in the future. Callers that wish to synchronously start the backup_block_job can instead just call block_job_start immediately after calling backup_job_create. Transactions are updated to use the new interface, calling block_job_start only during the .commit phase, which helps prevent race conditions where jobs may finish before we even finish building the transaction. This may happen, for instance, during empty block backup jobs. Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1478587839-9834-6-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
John Snow	5ccac6f186	blockjob: add block_job_start Instead of automatically starting jobs at creation time via backup_start et al, we'd like to return a job object pointer that can be started manually at later point in time. For now, add the block_job_start mechanism and start the jobs automatically as we have been doing, with conversions job-by-job coming in later patches. Of note: cancellation of unstarted jobs will perform all the normal cleanup as if the job had started, particularly abort and clean. The only difference is that we will not emit any events, because the job never actually started. Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1478587839-9834-5-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
John Snow	a7815a764c	blockjob: add .start field Add an explicit start field to specify the entrypoint. We already have ownership of the coroutine itself AND managing the lifetime of the coroutine, let's take control of creation of the coroutine, too. This will allow us to delay creation of the actual coroutine until we know we'll actually start a BlockJob in block_job_start. This avoids the sticky question of how to "un-create" a Coroutine that hasn't been started yet. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1478587839-9834-4-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
John Snow	e8a40bf71d	blockjob: add .clean property Cleaning up after we have deferred to the main thread but before the transaction has converged can be dangerous and result in deadlocks if the job cleanup invokes any BH polling loops. A job may attempt to begin cleaning up, but may induce another job to enter its cleanup routine. The second job, part of our same transaction, will block waiting for the first job to finish, so neither job may now make progress. To rectify this, allow jobs to register a cleanup operation that will always run regardless of if the job was in a transaction or not, and if the transaction job group completed successfully or not. Move sensitive cleanup to this callback instead which is guaranteed to be run only after the transaction has converged, which removes sensitive timing constraints from said cleanup. Furthermore, in future patches these cleanup operations will be performed regardless of whether or not we actually started the job. Therefore, cleanup callbacks should essentially confine themselves to undoing create operations, e.g. setup actions taken in what is now backup_start. Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1478587839-9834-3-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-14 22:47:34 -05:00
Stefan Hajnoczi	682df581c6	-----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJYKeNwAAoJEH3vgQaq/DkO2uAQAKUidQMRQjHs3T5vyb7PcXCe DVV3PO+xKIFl+eWbjDYH2OdPL8OzgyNcGnwtHkdogKklWvYMD002vQ9YmNa2cbJn cO5d8jzSRtsTTLSbtjipFIrvJ8FxedX3Jay0cvEbaEqkgZXJV1sXN5CJ/Cseyf+G IZrG047Kf4V3inV8RDvJ9U/VcSlIcst9icZOuLlONvhXM7f+R5CkvqwUn4yVOObt Wwq32r47Dd9BwzrpxM//7haDvAXYm/xcP3bImN/3LAAwYPGkswxOe1I7Q62+fbpe dd/FSfhe6nRjStKTtH7T+AQk1VJKw34su9/FSxzIZaCzHYMco5CIziCwi0s4BocR GqZ0E0oPxWvrrFhljBxt1wA4d2j354Wq2cGbmb7rQpJTEbfGH5nDHqF1FAbMmd8N F9H6tSCvh1xJaJngGZjlMsgs6TkqyQEnCjk7SSAs1XS+qyrcyOWk7ydzAAc/RIHl iIN4aLcL7ix1rcoVttw+4VOSvihas6nTvRPPwVTbHO5003QpXdr3dckQaASP3PTd wky7blVk8+O8Y242F0AAYUb04agZ+KpqsaOcCL3SIPc3yBv3JCNCNy0gH4WIBX66 yYxTgRtaNhHiUWaVQLximq1QUjz+vsTE07FI56PSabz1e/RkRp+BbrwaYLKYy+/F jBfRpP7pkPIWJhrPmYpJ =fKei -----END PGP SIGNATURE----- Merge remote-tracking branch 'jsnow/tags/ide-pull-request' into staging # gpg: Signature made Mon 14 Nov 2016 04:16:48 PM GMT # gpg: using RSA key 0x7DEF8106AAFC390E # gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" # Primary key fingerprint: FAEB 9711 A12C F475 812F 18F2 88A9 064D 1835 61EB # Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76 CBD0 7DEF 8106 AAFC 390E * jsnow/tags/ide-pull-request: ahci-test: add QMP tray test for ATAPI libqos/ahci: Add get_sense and test_ready libqos/ahci: Add ATAPI tray macros libqos/ahci: Support expected errors libqtest: add qmp_eventwait_ref block-backend: Always notify on blk_eject ahci-test: test atapi read_cd with bcl, nb_sectors = 0 ahci-test: Create smaller test ISO images atapi: classify read_cd as conditionally returning data Message-id: 1479140746-22142-1-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-11-14 17:07:16 +00:00
John Snow	c47ee043dc	block-backend: Always notify on blk_eject blk_eject is only used by scsi-disk and atapi, and in both cases we only attempt to invoke blk_eject if we have a bona-fide change in tray state. The "issue" here is that the tray state does not generate a QMP event unless there is a medium/BDS attached to the device, so if libvirt et al are waiting for a tray event to occur from an empty-but-closed drive, software opening that drive will not emit an event and libvirt will wait forever. Change this by modifying blk_eject to always emit an event, instead of conditionally on a "real" backend eject. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1373264 Reported-by: Peter Krempa <pkrempa@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1478553214-497-2-git-send-email-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2016-11-14 11:15:54 -05:00
Fam Zheng	4e6d13c983	raw-posix: Rename 'raw_s' to 'rs' It is too confusing because it sounds like a BDRVRawState variable. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1477565117-17230-1-git-send-email-famz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-11-11 15:56:22 +01:00
Kevin Wolf	07555ba6f3	nfs: Fix memory leak in nfs_file_create() The leak was introduced in commit `94d6a7a7`. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-11-11 15:54:55 +01:00
Alberto Garcia	9dd76f82d9	qcow2: Remove stale FIXME comment It was from the time when none of the global functions had a qcow2_ prefix. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-11 15:54:55 +01:00
Tomáš Golembiovský	80a15e3e2e	raw_bsd: don't check size alignment when only offset is set We make sure that the size is aligned to sector length to prevent any round ups. Otherwise we could end up reading/writing data outside the area specified by user. This is only needed when user supplies the size option to avoid any surprises. It is not necessary when only offset is set. More over, the check made it difficult to use the offset option without size option. The check puts unneeded restriction on the offset which had to be aligned too. Because bdrv_getlength() returns aligned value having unaligned offset would make the check fail. Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-11 15:54:55 +01:00
Tomáš Golembiovský	40332872fe	raw_bsd: move check to prevent overflow When only offset is specified but no size and the offset is greater than the real size of the containing device an overflow occurs when parsing the options. This overflow is harmless because we do check for this exact situation little bit later, but it leads to an error message with weird values. It is better to do the check is sooner and prevent the overflow. Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-11 15:54:55 +01:00
Ashijeet Acharya	9a80832abf	block/ssh: Code cleanup for unused parameter This patch drops the unused parameter "BDRVSSHState" being passed into the ssh_config() function and does code cleanup. The unused parameter was introduced by the commit c322712. Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-11 15:54:55 +01:00
Ashijeet Acharya	a1d4e38a8b	block/nbd: Fix the leaked visitor This patch frees the leaked visitor in nbd_refresh_filename() and uses visit_free() to fix it. The leak was introduced by the commit `491d6c7`. Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-11 15:54:55 +01:00
Kevin Wolf	e6af1e0854	block: Don't mark node clean after failed flush Commit `3ff2f67a` changed bdrv_co_flush() so that no flush is issues if the image hasn't been dirtied since the last flush. This is not quite correct: The condition should be that the image hasn't been dirtied since the last _successful_ flush. This patch changes the logic accordingly. Without this fix, subsequent bdrv_co_flush() calls would return success without actually doing anything even though the image is still dirty. The difference is visible in some blkdebug test cases where error messages incorrectly disappeared after commit `3ff2f67a`. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1478300595-10090-1-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-11-08 16:06:35 +00:00
Stefan Hajnoczi	199a5bde46	* NBD bugfix (Changlong) * NBD write zeroes support (Eric) * Memory backend fixes (Haozhong) * Atomics fix (Alex) * New AVX512 features (Luwei) * "make check" logging fix (Paolo) * Chardev refactoring fallout (Paolo) * Small checkpatch improvements (Paolo, Jeff) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQExBAABCAAbBQJYGaRPFBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D XKgH/RgNtosBTqJsmphkS7wACFAFOf7Uq46ajoKfB66Pt1J/++pFQg4TApPYkb7j KlKeKmXa7hb6+Jg8325H4zGkGno4kn2dE+OnznaB1xPKwiZVAMQVzQsagsEVqpno k/5PBVRptIiuHQKyU29Go0CxbWJBTH0O14S7rDK4YDF0YMnuT280HQOI3jdu1igV G/Q+CMgfk+yXf6GWHE8Z9sNq7n0ha8qgruA/X3NC7+pAvEsUcAP065zwLp9weYuK W1MU68L7Ub4tRo0SVf1HFkDUNdMv4T4hg+wpGe1GwthJWexHu9x0YAQBy60ykJb6 NtHwjLwCUWtm7AiZD/btsOJPmjk= =+Dt/ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * NBD bugfix (Changlong) * NBD write zeroes support (Eric) * Memory backend fixes (Haozhong) * Atomics fix (Alex) * New AVX512 features (Luwei) * "make check" logging fix (Paolo) * Chardev refactoring fallout (Paolo) * Small checkpatch improvements (Paolo, Jeff) # gpg: Signature made Wed 02 Nov 2016 08:31:11 AM GMT # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (30 commits) main-loop: Suppress I/O thread warning under qtest docs/rcu.txt: Fix minor typo vl: exit qemu on guest panic if -no-shutdown is not set checkpatch: allow spaces before parenthesis for 'coroutine_fn' x86: add AVX512_4VNNIW and AVX512_4FMAPS features slirp: fix CharDriver breakage qemu-char: do not forward events through the mux until QEMU has started nbd: Implement NBD_CMD_WRITE_ZEROES on client nbd: Implement NBD_CMD_WRITE_ZEROES on server nbd: Improve server handling of shutdown requests nbd: Refactor conversion to errno to silence checkpatch nbd: Support shorter handshake nbd: Less allocation during NBD_OPT_LIST nbd: Let client skip portions of server reply nbd: Let server know when client gives up negotiation nbd: Share common option-sending code in client nbd: Send message along with server NBD_REP_ERR errors nbd: Share common reply-sending code in server nbd: Rename struct nbd_request and nbd_reply nbd: Rename NbdClientSession to NBDClientSession ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-11-03 16:32:30 +00:00
Eric Blake	fa778fffdf	nbd: Implement NBD_CMD_WRITE_ZEROES on client Upstream NBD protocol recently added the ability to efficiently write zeroes without having to send the zeroes over the wire, along with a flag to control whether the client wants a hole. The generic block code takes care of falling back to the obvious write of lots of zeroes if we return -ENOTSUP because the server does not have WRITE_ZEROES. Ideally, since NBD_CMD_WRITE_ZEROES does not involve any data over the wire, we want to support transactions that are much larger than the normal 32M limit imposed on NBD_CMD_WRITE. But the server may still have a limit smaller than UINT_MAX, so until experimental NBD protocol additions for advertising various command sizes is finalized (see [1], [2]), for now we just stick to the same limits as normal writes. [1] https://github.com/yoe/nbd/blob/extension-info/doc/proto.md [2] https://sourceforge.net/p/nbd/mailman/message/35081223/ Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-17-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-11-02 09:28:56 +01:00
Eric Blake	ed2dd91267	nbd: Rename struct nbd_request and nbd_reply Our coding convention prefers CamelCase names, and we already have other existing structs with NBDFoo naming. Let's be consistent, before later patches add even more structs. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-6-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-11-02 09:28:55 +01:00
Eric Blake	10676b81a9	nbd: Rename NbdClientSession to NBDClientSession It's better to use consistent capitalization of the namespace used for NBD functions; we have more instances of NBD* than Nbd*. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-5-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-11-02 09:28:55 +01:00
Eric Blake	b626b51a67	nbd: Treat flags vs. command type as separate fields Current upstream NBD documents that requests have a 16-bit flags, followed by a 16-bit type integer; although older versions mentioned only a 32-bit field with masking to find flags. Since the protocol is in network order (big-endian over the wire), the ABI is unchanged; but dealing with the flags as a separate field rather than masking will make it easier to add support for upcoming NBD extensions that increase the number of both flags and commands. Improve some comments in nbd.h based on the current upstream NBD protocol (https://github.com/yoe/nbd/blob/master/doc/proto.md), and touch some nearby code to keep checkpatch.pl happy. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-3-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-11-02 09:28:55 +01:00
Changlong Xie	9bc9732fae	nbd: Use CoQueue for free_sema instead of CoMutex NBD is using the CoMutex in a way that wasn't anticipated. For example, if there are N(N=26, MAX_NBD_REQUESTS=16) nbd write requests, so we will invoke nbd_client_co_pwritev N times. ---------------------------------------------------------------------------------------- time request Actions 1 1 in_flight=1, Coroutine=C1 2 2 in_flight=2, Coroutine=C2 ... 15 15 in_flight=15, Coroutine=C15 16 16 in_flight=16, Coroutine=C16, free_sema->holder=C16, mutex->locked=true 17 17 in_flight=16, Coroutine=C17, queue C17 into free_sema->queue 18 18 in_flight=16, Coroutine=C18, queue C18 into free_sema->queue ... 26 N in_flight=16, Coroutine=C26, queue C26 into free_sema->queue ---------------------------------------------------------------------------------------- Once nbd client recieves request No.16' reply, we will re-enter C16. It's ok, because it's equal to 'free_sema->holder'. ---------------------------------------------------------------------------------------- time request Actions 27 16 in_flight=15, Coroutine=C16, free_sema->holder=C16, mutex->locked=false ---------------------------------------------------------------------------------------- Then nbd_coroutine_end invokes qemu_co_mutex_unlock what will pop coroutines from free_sema->queue's head and enter C17. More free_sema->holder is C17 now. ---------------------------------------------------------------------------------------- time request Actions 28 17 in_flight=16, Coroutine=C17, free_sema->holder=C17, mutex->locked=true ---------------------------------------------------------------------------------------- In above scenario, we only recieves request No.16' reply. As time goes by, nbd client will almostly recieves replies from requests 1 to 15 rather than request 17 who owns C17. In this case, we will encounter assert "mutex->holder == self" failed since Kevin's commit `0e438cdc` "coroutine: Let CoMutex remember who holds it". For example, if nbd client recieves request No.15' reply, qemu will stop unexpectedly: ---------------------------------------------------------------------------------------- time request Actions 29 15(most case) in_flight=15, Coroutine=C15, free_sema->holder=C17, mutex->locked=false ---------------------------------------------------------------------------------------- Per Paolo's suggestion "The simplest fix is to change it to CoQueue, which is like a condition variable", this patch replaces CoMutex with CoQueue. Cc: Wen Congyang <wency@cn.fujitsu.com> Reported-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Message-Id: <1476267508-19499-1-git-send-email-xiecl.fnst@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-11-01 16:06:57 +01:00
John Snow	c87621ea68	blockjobs: split interface into public/private, Part 1 To make it a little more obvious which functions are intended to be public interface and which are intended to be for use only by jobs themselves, split the interface into "public" and "private" files. Convert blockjobs (e.g. block/backup) to using the private interface. Leave blockdev and others on the public interface. There are remaining uses of private state by qemu-img, and several cases in blockdev.c and block/io.c where we grab job->blk for the purposes of acquiring an AIOContext. These will be corrected in future patches. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1477584421-1399-7-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-01 08:04:56 -04:00
John Snow	8254b6d953	blockjob: centralize QMP event emissions There's no reason to leave this to blockdev; we can do it in blockjobs directly and get rid of an extra callback for most users. All non-internal events, even those created outside of QMP, will consistently emit events. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1477584421-1399-5-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-01 07:55:57 -04:00
John Snow	47970dfb0a	Replication/Blockjobs: Create replication jobs as internal Bubble up the internal interface to commit and backup jobs, then switch replication tasks over to using this methodology. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1477584421-1399-4-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-01 07:55:57 -04:00
John Snow	f81e0b4532	blockjobs: Allow creating internal jobs Add the ability to create jobs without an ID. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1477584421-1399-3-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-01 07:55:57 -04:00
Prasanna Kumar Kalever	53d9837fb8	block/gluster: fix port type in the QAPI options list After introduction of qapi schema in gluster block driver code, the port type is now string as per InetSocketAddress { 'struct': 'InetSocketAddress', 'data': { 'host': 'str', 'port': 'str', 'to': 'uint16', 'ipv4': 'bool', '*ipv6': 'bool' } } but the current code still treats it as QEMU_OPT_NUMBER, hence fixing port to accept QEMU_OPT_STRING. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-01 07:55:57 -04:00
Prasanna Kumar Kalever	c56ac33b7a	block/gluster: improve defense over string to int conversion using atoi() for converting string to int may be error prone in case if string supplied in the argument is not a fold of numerical number, This is not a bug because in the existing code, static QemuOptsList runtime_tcp_opts = { .name = "gluster_tcp", .head = QTAILQ_HEAD_INITIALIZER(runtime_tcp_opts.head), .desc = { ... { .name = GLUSTER_OPT_PORT, .type = QEMU_OPT_NUMBER, .help = "port number ...", }, ... }; port type is QEMU_OPT_NUMBER, before we actually reaches atoi() port is already defended by parse_option_number() However It is a good practice to use function like parse_uint_full() over atoi() to keep port self defended Note: As now the port string to int conversion has its defence code set, and also we understand that port argument is actually a string type, in the follow up patch let's move port type from QEMU_OPT_NUMBER to QEMU_OPT_STRING [Jeff Cody: removed spurious parenthesis] Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-11-01 07:55:57 -04:00

1 2 3 4 5 ...

2928 Commits