openkylin/qemu - qemu - 红山开源项目托管

Commit Graph

Author	SHA1	Message	Date
Fam Zheng	69da3b0b47	block: Pause block jobs in bdrv_drain_all This is necessary to suppress more IO requests from being generated from block job coroutines. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1428069921-2957-3-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:09 +02:00
Fam Zheng	de50a20a4c	block: Switch to host monotonic clock for IO throttling Currently, throttle timers won't make any progress when VCPU is not running, which would stall the request queue in utils, qtest, vm suspending, and live migration, without special handling. Block jobs are confusingly inconsistent between with and without throttling: if user sets a bps limit, stops the vm, then start a block job, the block job will not make any progress; in contrary, if user unsets the bps limit, or if it's not set, the block job will run normally. After this patch, with the host clock, even if the VCPUs are stopped, the throttle queues will be processed. This patch also enables potential to add throttle to bdrv_drain_all. Currently all requests are drained immediately. In other words whenever it is called, IO throttling goes ineffective (examples: system reset, migration and many block job operations.). This is a loophole that guest could exploit. If we use the host clock, we can later just trust the nested poll. This could be done on top. Note that for qemu-iotests case 093, which uses qtest, we still keep vm clock so the script can control the clock stepping in order to be deterministic. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1427268446-6426-1-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:08 +02:00
Stefan Hajnoczi	786a4ea82e	Convert (ffs(val) - 1) to ctz32(val) This commit was generated mechanically by coccinelle from the following semantic patch: @@ expression val; @@ - (ffs(val) - 1) + ctz32(val) The call sites have been audited to ensure the ffs(0) - 1 == -1 case never occurs (due to input validation, asserts, etc). Therefore we don't need to worry about the fact that ctz32(0) == 32. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1427124571-28598-5-git-send-email-stefanha@redhat.com Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:08 +02:00
Fam Zheng	fc3959e466	block: Fix unaligned zero write If the zero write is not aligned, bdrv_co_do_pwritev will segfault because of accessing to the NULL qiov passed in by bdrv_co_write_zeroes. Fix this by allocating a local qiov in bdrv_co_do_pwritev if the request is not aligned. (In this case the padding iovs are necessary anyway, so it doesn't hurt.) Also add a check at the end of bdrv_co_do_pwritev to clear the zero flag if padding is involved. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1427160230-4489-2-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-03-27 10:01:12 +00:00
Fam Zheng	d51a2427f6	block: Drop bdrv_find All callers are converted, so drop it. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1425296209-1476-5-git-send-email-famz@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-03-16 12:10:30 -04:00
Markus Armbruster	a1f688f415	block: Deprecate QCOW/QCOW2 encryption We've steered users away from QCOW/QCOW2 encryption for a while, because it's a flawed design (commit `136cd19` Describe flaws in qcow/qcow2 encryption in the docs). In addition to flawed crypto, we have comically bad usability, and plain old bugs. Let me show you. = Example images = I'm going to use a raw image as backing file, and two QCOW2 images, one encrypted, and one not: $ qemu-img create -f raw backing.img 4m Formatting 'backing.img', fmt=raw size=4194304 $ qemu-img create -f qcow2 -o encryption,backing_file=backing.img,backing_fmt=raw geheim.qcow2 4m Formatting 'geheim.qcow2', fmt=qcow2 size=4194304 backing_file='backing.img' backing_fmt='raw' encryption=on cluster_size=65536 lazy_refcounts=off $ qemu-img create -f qcow2 -o backing_file=backing.img,backing_fmt=raw normal.qcow2 4m Formatting 'normal.qcow2', fmt=qcow2 size=4194304 backing_file='backing.img' backing_fmt='raw' encryption=off cluster_size=65536 lazy_refcounts=off = Usability issues = == Confusing startup == When no image is encrypted, and you don't give -S, QEMU starts the guest immediately: $ qemu-system-x86_64 -nodefaults -display none -monitor stdio normal.qcow2 QEMU 2.2.50 monitor - type 'help' for more information (qemu) info status VM status: running But as soon as there's an encrypted image in play, the guest is not started, with no notification whatsoever: $ qemu-system-x86_64 -nodefaults -display none -monitor stdio geheim.qcow2 QEMU 2.2.50 monitor - type 'help' for more information (qemu) info status VM status: paused (prelaunch) If the user figured out that he needs to type "cont" to enter his keys, the confusion enters the next level: "cont" asks for at most one key. If more are needed, it then silently does nothing. The user has to type "cont" once per encrypted image: $ qemu-system-x86_64 -nodefaults -display none -monitor stdio -drive if=none,file=geheim.qcow2 -drive if=none,file=geheim.qcow2 QEMU 2.2.50 monitor - type 'help' for more information (qemu) info status VM status: paused (prelaunch) (qemu) c none0 (geheim.qcow2) is encrypted. Password: **** (qemu) info status VM status: paused (prelaunch) (qemu) c none1 (geheim.qcow2) is encrypted. Password: **** (qemu) info status VM status: running == Incorrect passwords not caught == All existing encryption schemes give you the GIGO treatment: garbage password in, garbage data out. Guests usually refuse to mount garbage, but other usage is prone to data loss. == Need to stop the guest to add an encrypted image == $ qemu-system-x86_64 -nodefaults -display none -monitor stdio QEMU 2.2.50 monitor - type 'help' for more information (qemu) info status VM status: running (qemu) drive_add "" if=none,file=geheim.qcow2 Guest must be stopped for opening of encrypted image (qemu) stop (qemu) drive_add "" if=none,file=geheim.qcow2 OK Commit `c3adb58` added this restriction. Before, we could expose images lacking an encryption key to guests, with potentially catastrophic results. See also "Use without key is not always caught". = Bugs = == Use without key is not always caught == Encrypted images can be in an intermediate state "opened, but no key". The weird startup behavior and the need to stop the guest are there to ensure the guest isn't exposed to that state. But other things still are! * drive_backup $ qemu-system-x86_64 -nodefaults -display none -monitor stdio geheim.qcow2 QEMU 2.2.50 monitor - type 'help' for more information (qemu) drive_backup -f ide0-hd0 out.img raw Formatting 'out.img', fmt=raw size=4194304 I guess this writes encrypted data to raw image out.img. Good luck with figuring out how to decrypt that again. * commit $ qemu-system-x86_64 -nodefaults -display none -monitor stdio geheim.qcow2 QEMU 2.2.50 monitor - type 'help' for more information (qemu) commit ide0-hd0 I guess this writes encrypted data into the unencrypted raw backing image, effectively destroying it. == QMP device_add of usb-storage fails when it shouldn't == When the image is encrypted, device_add creates the device, defers actually attaching it to when the key becomes available, then fails. This is wrong. device_add must either create the device and succeed, or do nothing and fail. $ qemu-system-x86_64 -nodefaults -display none -usb -qmp stdio -drive if=none,id=foo,file=geheim.qcow2 {"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 2}, "package": ""}, "capabilities": []}} { "execute": "qmp_capabilities" } {"return": {}} { "execute": "device_add", "arguments": { "driver": "usb-storage", "id": "bar", "drive": "foo" } } {"error": {"class": "DeviceEncrypted", "desc": "'foo' (geheim.qcow2) is encrypted"}} {"execute":"device_del","arguments": { "id": "bar" } } {"timestamp": {"seconds": 1426003440, "microseconds": 237181}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/bar/bar.0/legacy[0]"}} {"timestamp": {"seconds": 1426003440, "microseconds": 238231}, "event": "DEVICE_DELETED", "data": {"device": "bar", "path": "/machine/peripheral/bar"}} {"return": {}} This stuff is worse than useless, it's a trap for users. If people become sufficiently interested in encrypted images to contribute a cryptographically sane implementation for QCOW2 (or whatever other format), then rewriting the necessary support around it from scratch will likely be easier and yield better results than fixing up the existing mess. Let's deprecate the mess now, drop it after a grace period, and move on. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-16 17:07:25 +01:00
Ekaterina Tumanova	892b7de832	block: add bdrv functions for geometry and blocksize Add driver functions for geometry and blocksize detection Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1424087278-49393-2-git-send-email-tumanova@linux.vnet.ibm.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-03-10 14:02:21 +01:00
Markus Armbruster	dc523cd348	qemu-img: Suppress unhelpful extra errors in convert, amend img_convert() and img_amend() use qemu_opts_do_parse(), which reports errors with qerror_report_err(). Its error messages aren't helpful here, the caller reports one that actually makes sense. Reproducer: $ qemu-img convert -o backing_format=raw in.img out.img qemu-img: Invalid parameter 'backing_format' qemu-img: Invalid options for file format 'raw' To fix, propagate errors through qemu_opts_do_parse(). This lifts the error reporting into callers. Drop it from img_convert() and img_amend(), keep it in qemu_chr_parse_compat(), bdrv_img_create(). Since I'm touching qemu_opts_do_parse() anyway, write a function comment for it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-02-26 14:51:21 +01:00
Markus Armbruster	f43e47dbf6	QemuOpts: Drop qemu_opt_set(), rename qemu_opt_set_err(), fix use qemu_opt_set() is a wrapper around qemu_opt_set() that reports the error with qerror_report_err(). Most of its users assume the function can't fail. Make them use qemu_opt_set_err() with &error_abort, so that should the assumption ever break, it'll break noisily. Just two users remain, in util/qemu-config.c. Switch them to qemu_opt_set_err() as well, then rename qemu_opt_set_err() to qemu_opt_set(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-02-26 14:49:31 +01:00
Markus Armbruster	6be4194b92	block: Suppress unhelpful extra errors in bdrv_img_create() bdrv_img_create() uses qemu_opt_set(), which reports errors with qerror_report_err(). Its error messages aren't helpful here, the caller reports one that actually makes sense. I don't know how to trigger the error conditions, though. Switch to qemu_opt_set_err() to get rid of the unwanted messages. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-02-26 14:48:31 +01:00
Markus Armbruster	39101f2511	QemuOpts: Convert qemu_opt_set_number() to Error, fix its use Return the Error object instead of reporting it with qerror_report_err(). Change callers that assume the function can't fail to pass &error_abort, so that should the assumption ever break, it'll break noisily. Turns out all callers outside its unit test assume that. We could drop the Error ** argument, but that would make the interface less regular, so don't. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-02-26 14:47:32 +01:00
Max Reitz	b9c649470b	block: Keep bdrv_check*_request()'s return value Do not throw away the value returned by bdrv_check_request() and bdrv_check_byte_request(). Fix up some coding style issues in the proximity of the affected hunks. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1423162705-32065-17-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-02-16 15:07:19 +00:00
Max Reitz	c0191e763b	block: Remove "growable" from BDS Now that request clamping is done in the BlockBackend, the "growable" field can be removed from the BlockDriverState. All BDSs are now treated as being "growable" (that is, they are allowed to grow; they are not necessarily actually able to). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1423162705-32065-16-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-02-16 15:07:19 +00:00
Max Reitz	b65a5e12a4	block: Add Error parameter to bdrv_find_protocol() The argument given to bdrv_find_protocol() is just a file name, which makes it difficult for the caller to reconstruct what protocol bdrv_find_protocol() was hoping to find. This patch adds an Error parameter to that function to solve this issue. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1423162705-32065-4-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-02-16 15:07:18 +00:00
Markus Armbruster	b1ca639184	block: Eliminate silly QERR_ macros used for encryption keys The QERR_ macros are leftovers from the days of "rich" error objects. They're used with error_set() and qerror_report(), and expand into the first two arguments. This trickiness has become pointless. Clean up QERR_DEVICE_ENCRYPTED and QERR_DEVICE_NOT_ENCRYPTED. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1422524221-8566-5-git-send-email-armbru@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-02-06 11:46:32 -05:00
Markus Armbruster	4d2855a348	block: New bdrv_add_key(), convert monitor to use it Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1422524221-8566-4-git-send-email-armbru@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-02-06 11:46:32 -05:00
Peter Lieven	75af1f34cd	block: introduce BDRV_REQUEST_MAX_SECTORS we check and adjust request sizes at several places with sometimes inconsistent checks or default values: INT_MAX INT_MAX >> BDRV_SECTOR_BITS UINT_MAX >> BDRV_SECTOR_BITS SIZE_MAX >> BDRV_SECTOR_BITS This patches introdocues a macro for the maximal allowed sectors per request and uses it at several places. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:22 +01:00
Peter Lieven	f4564d53c6	block: add accounting for merged requests Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Lieven	98764152ad	block: change default for discard and write zeroes to INT_MAX do not trim requests if the driver does not supply a limit through BlockLimits. For write zeroes we still keep a limit for the unsupported path to avoid allocating a big bounce buffer. Suggested-by: Kevin Wolf <kwolf@redhat.com> Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Jeff Cody	a1a11d10ab	block: remove unused variable in bdrv_commit As Stefan pointed out, the variable 'filename' in bdrv_commit is unused, despite being maintained in previous patches. With this patch, get rid of the variable for good. Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-23 18:17:06 +01:00
Fam Zheng	bb00021de0	block: Split BLOCK_OP_TYPE_COMMIT to BLOCK_OP_TYPE_COMMIT_{SOURCE, TARGET} Like BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET, block-commit involves two asymmetric devices. This change is not user-visible (yet), because commit only works with device names. But once we enable backing reference in blockdev-add, or specifying node-name in block-commit command, we don't want the user to start two commit jobs on the same backing chain, which will corrupt things because of the final bdrv_swap. Before we have per category blockers, splitting this type is still better. [Resolved virtio-blk dataplane conflict by replacing BLOCK_OP_TYPE_COMMIT with both BLOCK_OP_TYPE_COMMIT_{SOURCE, TARGET}. They are safe since the block job runs in the same AioContext as the dataplane IOThread. --Stefan] Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-01-13 13:43:29 +00:00
Peter Lieven	095e4fa4b5	block: limited request size in write zeroes unsupported path If bs->bl.max_write_zeroes is large and we end up in the unsupported path we might allocate a lot of memory for the iovector and/or even generate an oversized requests. Fix this by limiting the request by the minimum of the reported maximum transfer size or 16MB (32768 sectors). Reported-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-id: 1420457389-16332-1-git-send-email-pl@kamp.de Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-01-13 13:43:29 +00:00
Vladimir Sementsov-Ogievskiy	c4237dfa63	block: fix spoiling all dirty bitmaps by mirror and migration Mirror and migration use dirty bitmaps for their purposes, and since commit [block: per caller dirty bitmap] they use their own bitmaps, not the global one. But they use old functions bdrv_set_dirty and bdrv_reset_dirty, which change all dirty bitmaps. Named dirty bitmaps series by Fam and Snow are affected: mirroring and migration will spoil all (not related to this mirroring or migration) named dirty bitmaps. This patch fixes this by adding bdrv_set_dirty_bitmap and bdrv_reset_dirty_bitmap, which change concrete bitmap. Also, to prevent such mistakes in future, old functions bdrv_(set,reset)_dirty are made static, for internal block usage. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com> CC: John Snow <jsnow@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: Denis V. Lunev <den@openvz.org> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1417081246-3593-1-git-send-email-vsementsov@parallels.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-01-13 11:47:56 +00:00
Max Reitz	291680186f	block: Relative backing file for image creation Relative backing filenames are always relative to the backed image's directory; the same applies to image creation. Therefore, if the backing file has to be opened for determining its size (in case the size has not been explicitly specified) its filename should be interpreted relative to the new image's base directory and not relative to qemu's working directory. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-13 11:47:56 +00:00
Max Reitz	9f07429e88	block: JSON filenames and relative backing files When using a relative backing file name, qemu needs to know the directory of the top image file. For JSON filenames, such a directory cannot be easily determined (e.g. how do you determine the directory of a qcow2 BDS directly on top of a quorum BDS?). Therefore, do not allow relative filenames for the backing file of BDSs only having a JSON filename. Furthermore, BDS::exact_filename should be used whenever possible. If BDS::filename is not equal to BDS::exact_filename, the former will always be a JSON object. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-13 11:47:56 +00:00
Max Reitz	0a82855a1a	block: Get full backing filename from string Introduce bdrv_get_full_backing_filename_from_filename(), a function which takes the name of the backed file and a potentially relative backing filename to produce the full (absolute) backing filename. Use this function from bdrv_get_full_backing_filename(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-13 11:47:56 +00:00
Paolo Bonzini	e012b78cf5	block: do not allocate an iovec per read of a growable/zero_after_eof BDS Most reads do not go past the end of the file, and they can use the input QEMUIOVector instead of creating one. This removes the qemu_iovec_* functions from the profile. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-13 11:47:56 +00:00
Fam Zheng	43c5d8f800	block: Don't add trailing space in "Formating..." message Change the message printing code to output a separator for each option string before it instead of after, then we don't one more extra ' ' in the end. To update qemu-iotests output files, most of the times one would just copy the .out.bad to .out. With this change we will not have the space disliked by checkpatch.pl. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-id: 1418110684-19528-3-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-12-12 16:52:33 +00:00
Max Reitz	5c98415b2a	vmdk: Fix error for JSON descriptor file names If vmdk blindly tries to use path_combine() using bs->file->filename as the base file name, this will result in a bad error message for JSON file names when calling bdrv_open(). It is better to only try bs->file->exact_filename; if that is empty, bs->file->filename will be useless for path_combine() and an error should be emitted (containing bs->file->filename because desc_file_path (which is bs->file->exact_filename) is empty). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1417615043-26174-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-12-12 13:14:10 +00:00
Max Reitz	c614972408	block: Check create_opts before image creation If a driver supports image creation, it needs to set the .create_opts field. We can use that to make sure .create_opts for both drivers involved is not NULL in bdrv_img_create(), which is important so that the create_opts pointer in that function is not NULL after the qemu_opts_append() calls and when going into qemu_opts_create(). Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:20 +01:00
Max Reitz	ef8104378c	block: Omit bdrv_find_format for essential drivers We can always assume raw, file and qcow2 being available; so do not use bdrv_find_format() to locate their BlockDriver objects but statically reference the respective objects. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:19 +01:00
Kevin Wolf	c5f6e493bb	block: Don't probe for unknown backing file format If a qcow2 image specifies a backing file format that doesn't correspond to any format driver that qemu knows, we shouldn't fall back to probing, but simply error out. Not looking up the backing file driver in bdrv_open_backing_file(), but just filling in the "driver" option if it isn't there moves us closer to the goal of having everything in QDict options and gets us the error handling of bdrv_open(), which correctly refuses unknown drivers. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1416935562-7760-4-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:13 +01:00
Kevin Wolf	38f3ef574b	raw: Prohibit dangerous writes for probed images If the user neglects to specify the image format, QEMU probes the image to guess it automatically, for convenience. Relying on format probing is insecure for raw images (CVE-2008-2004). If the guest writes a suitable header to the device, the next probe will recognize a format chosen by the guest. A malicious guest can abuse this to gain access to host files, e.g. by crafting a QCOW2 header with backing file /etc/shadow. Commit `1e72d3b` (April 2008) provided -drive parameter format to let users disable probing. Commit `f965509` (March 2009) extended QCOW2 to optionally store the backing file format, to let users disable backing file probing. QED has had a flag to suppress probing since the beginning (2010), set whenever a raw backing file is assigned. All of these additions that allow to avoid format probing have to be specified explicitly. The default still allows the attack. In order to fix this, commit `79368c8` (July 2010) put probed raw images in a restricted mode, in which they wouldn't be able to overwrite the first few bytes of the image so that they would identify as a different image. If a write to the first sector would write one of the signatures of another driver, qemu would instead zero out the first four bytes. This patch was later reverted in commit `8b33d9e` (September 2010) because it didn't get the handling of unaligned qiov members right. Today's block layer that is based on coroutines and has qiov utility functions makes it much easier to get this functionality right, so this patch implements it. The other differences of this patch to the old one are that it doesn't silently write something different than the guest requested by zeroing out some bytes (it fails the request instead) and that it doesn't maintain a list of signatures in the raw driver (it calls the usual probe function instead). Note that this change doesn't introduce new breakage for false positive cases where the guest legitimately writes data into the first sector that matches the signatures of an image format (e.g. for nested virt): These cases were broken before, only the failure mode changes from corruption after the next restart (when the wrong format is probed) to failing the problematic write request. Also note that like in the original patch, the restrictions only apply if the image format has been guessed by probing. Explicitly specifying a format allows guests to write anything they like. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1416497234-29880-8-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:13 +01:00
Kevin Wolf	7cddd3728e	block: Read only one sector for format probing The only image format driver that even potentially accesses anything after 512 bytes in its bdrv_probe() implementation is VMDK, which reads a plain-text descriptor file. In practice, the field it's looking for seems to come first and will be well within the first 512 bytes, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1416497234-29880-7-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:12 +01:00
Markus Armbruster	c6684249fd	block: Factor bdrv_probe_all() out of find_image_format() Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1416497234-29880-6-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:12 +01:00
Fam Zheng	20a9e77dfa	block: Add bdrv_get_node_name This returns the node name of a BDS. Remove the TODO comment and expect the callers to be explicit. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:25:29 +01:00
Fam Zheng	04df765ab4	block: Add bdrv_next_node Similar to bdrv_next, this traverses through graph_bdrv_states. Will be useful to enumerate all the named nodes. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:25:29 +01:00
Fam Zheng	f3a9cfddae	block: Fix max nb_sectors in bdrv_make_zero In bdrv_rw_co we report -EINVAL for nb_sectors > INT_MAX / BDRV_SECTOR_SIZE, so a caller shouldn't exceed it. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-id: 1415603264-21497-1-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-14 09:20:35 +00:00
Peter Maydell	776346cd63	trivial patches for 2014-11-11 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJUYh9vAAoJEL7lnXSkw9fbgPQH/065L5+SpaJR1Nte9Lz3N2s1 a6tGSI22yu85tKvYCdYjeoVHSkSTyR57FdTfUd2xc2QPj+J4sWXpA81KILBGTJUp NMpmLpWg4LOh8Ek4ViRgmFFdryzIFa4dT4gc1AcSAIAQ6jsgK1dM7m5kfncC3TN0 TUs248vJ2i/DaE0k8TOeJqxJTqInoFttlJEqG7RD+V5JznokE4zpFNXHDGx9BptE W2J38GJ/TKRPe9UrHMKZI1r6+ZBdXyE/CaqsNNKLJdqrHgSQuAyK/PS6dQbM4BLg M1qdP7Tp0wOlvv9qoEZMOEiUsi54XPqLgaLMbW74Yp5X459fqmLW2imy49pHXt8= =klsW -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2014-11-11' into staging trivial patches for 2014-11-11 # gpg: Signature made Tue 11 Nov 2014 14:38:39 GMT using RSA key ID A4C3D7DB # gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>" # gpg: aka "Michael Tokarev <mjt@corpit.ru>" # gpg: aka "Michael Tokarev <mjt@debian.org>" * remotes/mjt/tags/pull-trivial-patches-2014-11-11: block: Fix comment for bdrv_co_get_block_status sysbus: Correct SYSTEM_BUS(obj) defines target-i386: cpu: keeping function parameters alignment on new line xen-hvm: Remove redundant variable 'xstate' coroutine-sigaltstack: Change jmp_buf to sigjmp_buf pc-bios: petalogix-s3adsp1800.dtb: Use 'xlnx, xps-ethernetlite-2.00.a' instead of 'xlnx, xps-ethernetlite-2.00.b' gdbstub: Add a missing case of signal number translation in gdbstub numa: make 'info numa' take into account hotplugged memory slirp/smbd: modify/set several parameters in generated smbd.conf qemu-doc.texi: fix typos in x509 examples icc_bus: fix typo ICC_BRIGDE -> ICC_BRIDGE Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2014-11-11 14:50:10 +00:00
Fam Zheng	705be728c0	block: Fix comment for bdrv_co_get_block_status It returns more information than binary, fix the comment. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2014-11-11 17:36:19 +03:00
Max Reitz	e56934bece	block: Propagate error in bdrv_img_create() If the specified backing file could not be opened, do not generate a new error message which contains the message which has been generated by bdrv_open(), but just propagate the latter. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-11-06 12:45:47 +01:00
Stefan Hajnoczi	5a7e7a0bad	block: let mirror blockjob run in BDS AioContext The mirror block job must run in the BlockDriverState AioContext so that it works with dataplane. Acquire the AioContext in blockdev.c so starting the block job is safe. Note that to_replace is treated separately from other BlockDriverStates in that it does not need to be in the same AioContext. Explicitly acquire/release to_replace's AioContext when accessing it. The completion code in block/mirror.c must perform BDS graph manipulation and bdrv_reopen() from the main loop. Use block_job_defer_to_main_loop() to achieve that. The bdrv_drain_all() call is not allowed outside the main loop since it could lead to lock ordering problems. Use bdrv_drain(bs) instead because we have acquired the AioContext so nothing else can sneak in I/O. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1413889440-32577-10-git-send-email-stefanha@redhat.com	2014-11-03 11:41:49 +00:00
Stefan Hajnoczi	5b98db0ad3	block: add bdrv_drain() Now that op blockers are in use, we can ensure that no other sources are generating I/O on a BlockDriverState. Therefore it is possible to drain requests for a single BDS. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1413889440-32577-7-git-send-email-stefanha@redhat.com	2014-11-03 11:41:49 +00:00
Max Reitz	7748543420	block: Add status callback to bdrv_amend_options() Depending on the changed options and the image format, bdrv_amend_options() may take a significant amount of time. In these cases, a way to be informed about the operation's status is desirable. Since the operation is rather complex and may fundamentally change the image, implementing it as AIO or a coroutine does not seem feasible. On the other hand, implementing it as a block job would be significantly more difficult than a simple callback and would not add benefits other than progress report to the amending operation, because it should not actually be run as a block job at all. A callback may not be very pretty, but it's very easy to implement and perfectly fits its purpose here. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1414404776-4919-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:48 +00:00
Peter Maydell	573742a543	block.c: Fix type of IoOperationType variable in send_qmp_error_event() The local variable 'ac' in send_qmp_error_event() is declared with the wrong type, which causes clang to complain when it is initialized and again when it is used: block.c:3655:20: warning: implicit conversion from enumeration type 'enum IoOperationType' to different enumeration type 'BlockErrorAction' (aka 'enum BlockErrorAction') [-Wenum-conversion] ac = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE; ~ ^~~~~~~~~~~~~~~~~~~~~~ block.c:3655:45: warning: implicit conversion from enumeration type 'enum IoOperationType' to different enumeration type 'BlockErrorAction' (aka 'enum BlockErrorAction') [-Wenum-conversion] ac = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE; ~ ^~~~~~~~~~~~~~~~~~~~~~~ block.c:3656:62: warning: implicit conversion from enumeration type 'BlockErrorAction' (aka 'enum BlockErrorAction') to different enumeration type 'IoOperationType' (aka 'enum IoOperationType') [-Wenum-conversion] qapi_event_send_block_io_error(bdrv_get_device_name(bs), ac, action, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~ Correct the type to IoOperationType, and rename the variable to 'optype' to match its correct type. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com> Message-id: 1412969583-21045-1-git-send-email-peter.maydell@linaro.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 09:48:41 +00:00
Peter Lieven	6c5a42ac34	block: avoid creating oversized writes in multiwrite_merge Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 09:48:41 +00:00
Peter Lieven	2647fab57d	BlockLimits: introduce max_transfer_length Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 09:48:41 +00:00
Max Reitz	59c9a95fd2	block: Respect underlying file's EOF When falling through to the underlying file in bdrv_co_get_block_status(), if it returns that the query offset is beyond the file end (by setting *pnum to 0), return the range to be zero and do not let the number of sectors for which information could be obtained be overwritten. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:02 +02:00
Max Reitz	9ebd844805	block: Add qemu_{,try_}blockalign0() These functions call their non-0-counterparts and then fill the allocated buffer with 0 (if the allocation has been successful). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Markus Armbruster	a7f53e26a6	block: Lift device model API into BlockBackend Move device model attachment / detachment and the BlockDevOps device model callbacks and their wrappers from BlockDriverState to BlockBackend. Wrapper calls in block.c change from bdrv_dev_FOO_cb(bs, ...) to if (bs->blk) { bdrv_dev_FOO_cb(bs->blk, ...); } No change, because both bdrv_dev_change_media_cb() and bdrv_dev_resize_cb() do nothing when no device model is attached, and a device model can be attached only when bs->blk. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 14:03:50 +02:00
Markus Armbruster	097310b53e	block: Rename BlockDriverCompletionFunc to BlockCompletionFunc I'll use it with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:27 +02:00
Markus Armbruster	7c84b1b831	block: Rename BlockDriverAIOCB* to BlockAIOCB* I'll use BlockDriverAIOCB with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:27 +02:00
Markus Armbruster	7f06d47eff	block: Merge BlockBackend and BlockDriverState name spaces BlockBackend's name space is separate only to keep the initial patches simple. Time to merge the two. Retain bdrv_find() and bdrv_get_device_name() for now, to keep this series manageable. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	bfb197e0d9	block: Eliminate BlockDriverState member device_name[] device_name[] can become non-empty only in bdrv_new_root() and bdrv_move_feature_fields(). The latter is used only to undo damage done by bdrv_swap(). The former is called only by blk_new_with_bs(). Therefore, when a BlockDriverState's device_name[] is non-empty, then it's been created with a BlockBackend, and vice versa. Furthermore, blk_new_with_bs() keeps the two names equal. Therefore, device_name[] is redundant. Eliminate it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	fea68bb6e9	block: Eliminate bdrv_iterate(), use bdrv_next() Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	18e46a033d	block: Connect BlockBackend and DriveInfo Make the BlockBackend own the DriveInfo. Change blockdev_init() to return the BlockBackend instead of the DriveInfo. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	7e7d56d9e0	block: Connect BlockBackend to BlockDriverState Convenience function blk_new_with_bs() creates a BlockBackend with its BlockDriverState. Callers have to unref both. The commit after next will relieve them of the need to unref the BlockDriverState. Complication: due to the silly way drive_del works, we need a way to hide a BlockBackend, just like bdrv_make_anon(). To emphasize its "special" status, give the function a suitably off-putting name: blk_hide_on_behalf_of_do_drive_del(). Unfortunately, hiding turns the BlockBackend's name into the empty string. Can't avoid that without breaking the blk->bs->device_name equals blk->name invariant. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	e4e9986b1c	block: Split bdrv_new_root() off bdrv_new() Creating an anonymous BDS can't fail. Make that obvious. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Alexey Kardashevskiy	7ea2d269cb	block/migration: Disable cache invalidate for incoming migration When migrated using libvirt with "--copy-storage-all", at the end of migration there is race between NBD mirroring task trying to do flush and migration completion, both end up invalidating cache. Since qcow2 driver does not handle this situation very well, random crashes happen. This disables the BDRV_O_INCOMING flag for the block device being migrated once the cache has been invalidated. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> -- fixed parens by hand Signed-off-by: Juan Quintela <quintela@redhat.com>	2014-10-14 09:35:21 +02:00
Markus Armbruster	f5bebbbb28	util: Emancipate id_wellformed() from QemuOpts IDs have long spread beyond QemuOpts: not everything with an ID necessarily goes through QemuOpts. Commit `9aebf3b` is about such a case: block layer names are meant to be well-formed IDs, but some of them don't go through QemuOpts, and thus weren't checked. The commit fixed that the straightforward way: rename the internal QemuOpts helper id_wellformed() to qemu_opts_id_wellformed() and give it external linkage. Instead of using it directly in block.c, the commit adds wrapper bdrv_is_valid_name(), probably to hide the connection to QemuOpts. Go one logical step further: emancipate IDs from QemuOpts. Rename the function back to id_wellformed(), and put it in another file. While there, clean up its value to bool. Peel off the bdrv_is_valid_name() wrapper. [Replaced stray return 0 with return false to match bool returns used elsewhere in id_wellformed(). --Stefan] Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-10-03 10:30:33 +01:00
Kevin Wolf	9aebf3b892	block: Validate node-name The device_name of a BlockDriverState is currently checked because it is always used as a QemuOpts ID and qemu_opts_create() checks whether such IDs are wellformed. node-name is supposed to share the same namespace, but it isn't checked currently. This patch adds explicit checks both for device_name and node-name so that the same rules will still apply even if QemuOpts won't be used any more at some point. qemu-img used to use names with spaces in them, which isn't allowed any more. Replace them with underscores. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-25 15:24:32 +02:00
Markus Armbruster	d224469d87	block: Improve message for device name clashing with node name Suggested-by: Benoit Canet <benoit.canet@nodalink.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-25 15:24:14 +02:00
Markus Armbruster	3ae59580a0	block: Keep DriveInfo alive until BlockDriverState dies If the BDS's refcnt > 0, drive_del() destroys the DriveInfo, but not the BDS. This can happen in three places: * Device model destruction during unplug: blockdev_auto_del() * Xen IDE unplug: pci_piix3_xen_ide_unplug() * drive_del command when no device model is attached: do_drive_del() The other callers of drive_del are on error paths where refcnt == 1. If the user somehow manages to plug in a device model using a BDS that has gone through drive_del(), the legacy configuration passed in DriveInfo doesn't reach the device model, and automatic deletion on unplug doesn't work. Worse, some device models such as scsi-disk crash when DriveInfo doesn't exist. This is theoretical; I didn't research an actual reproducer. The problem was introduced when we replaced DriveInfo reference counting by BDS reference counting in commit a94a3fa..fa510eb. Fix by keeping DriveInfo alive until its BDS dies. This affects qemu_drive_opts: now you can't reuse the same ID for new drive options until the BDS dies. Before, you could, but since the code always attempts to create a BDS with the same ID next, the enclosing operation "create a new drive" failed anyway. Different error path, same result. Unfortunately, the fix involves use of blockdev.c stuff from block.c, which is a layering violation. Fortunately, my forthcoming BlockBackend work will get rid of it again. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-25 15:24:14 +02:00
Fam Zheng	8007429a99	block: Rename qemu_aio_release -> qemu_aio_unref Suggested-by: Benoît Canet <benoit.canet@irqsave.net> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:39:17 +01:00
Fam Zheng	ca5fd113b8	block: Drop AIOCBInfo.cancel Now that all the implementations are converted to asynchronous version and we can emulate synchronous cancellation with it. Let's drop the unused member. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:39:16 +01:00
Fam Zheng	f600ac1902	block: Drop bdrv_em_aiocb_info.cancel Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:39:00 +01:00
Fam Zheng	3acabd685e	block: Drop bdrv_em_co_aiocb_info.cancel Also drop the now unused ->done pointer. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:38:59 +01:00
Fam Zheng	02c50efe08	block: Add bdrv_aio_cancel_async This is the async version of bdrv_aio_cancel, which doesn't block the caller. It guarantees that the cb is called either before returning or some time later. bdrv_aio_cancel can base on bdrv_aio_cancel_async, later we can convert all .io_cancel implementations to .io_cancel_async, and the aio_poll is the common logic. In the end, .io_cancel can be dropped. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:38:58 +01:00
Fam Zheng	f197fe2b2c	block: Add refcnt in BlockDriverAIOCB This will be useful in synchronous cancel emulation with bdrv_aio_cancel_async. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-22 11:38:57 +01:00
Luiz Capitulino	624ff5736e	block: extend BLOCK_IO_ERROR with reason string BLOCK_IO_ERROR events are logged by libvirt, which helps with post mortem analysis of guests. However, one information that we miss today is a human readable string describing the cause of the I/O error. This commit adds that string it to BLOCK_IO_ERROR. Note that this string is a debugging aid for humans, meaning that it should not parsed by applications. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-11 17:14:13 +02:00
Benoît Canet	5366d0c8bc	block: Make the block accounting functions operate on BlockAcctStats This is the next step for decoupling block accounting functions from BlockDriverState. In a future commit the BlockAcctStats structure will be moved from BlockDriverState to the device models structures. Note that bdrv_get_stats was introduced so device models can retrieve the BlockAcctStats structure of a BlockDriverState without being aware of it's layout. This function should go away when BlockAcctStats will be embedded in the device models structures. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Keith Busch <keith.busch@intel.com> CC: Anthony Liguori <aliguori@amazon.com> CC: "Michael S. Tsirkin" <mst@redhat.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: Peter Maydell <peter.maydell@linaro.org> CC: Michael Tokarev <mjt@tls.msk.ru> CC: John Snow <jsnow@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Alexander Graf <agraf@suse.de> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Benoît Canet	5e5a94b605	block: Extract the block accounting code The plan is to add new accounting metrics (latency, invalid requests, failed requests, queue depth) and block.c is overpopulated so it will be better to work in a separate module. Moreover the long term plan is to have statistics in each of the BDS of the graph for metrology purpose; this means that the device model statistics must move from the topmost BDS to the device model. So we need to decouple the statistic code from BlockDriverState. This is another argument for the extraction of the code in a separate module. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: Benoit Canet <benoit@irqsave.net> CC: Fam Zheng <famz@redhat.com> CC: Peter Crosthwaite <peter.crosthwaite@xilinx.com> CC: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Benoît Canet	0ddd0ad96a	block: Extract the BlockAcctStats structure Extract the block accounting statistics into a structure so the block device models can hold them in the future. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Luiz Capitulino	c7c2ff0c7e	block: extend BLOCK_IO_ERROR event with nospace indicator Management software, such as RHEV's vdsm, want to be able to allocate disk space on demand. The basic use case is to start a VM with a small disk and then the disk is enlarged when QEMU hits a ENOSPC condition. To this end, the management software has to be notified when QEMU encounters ENOSPC. The solution implemented by this commit is simple: it extends the BLOCK_IO_ERROR with a 'nospace' key, which is true when QEMU is stopped due to ENOSPC. Note that support for querying this event is already present in query-block by means of the 'io-status' key. Also, the new 'nospace' BLOCK_IO_ERROR field shares the same semantics with 'io-status', which basically means that werror= has to be set to either 'stop' or 'enospc' to enable 'nospace'. Finally, this commit also updates the 'io-status' key doc in the schema with a list of supported device models. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-09-10 10:41:29 +02:00
Liu Yuan	6bb4515849	block: kill tail whitespace in block.c Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-09-08 11:12:42 +01:00
Stefan Hajnoczi	391827eb10	block: fix overlapping multiwrite requests When request A is a strict superset of request B: AAAAAAAA BBBB multiwrite_merge() merges them as follows: AABBBB The tail of request A should have been included: AABBBBAA This patch fixes data loss but this code path is probably rare. Since guests cannot assume ordering between in-flight requests, few applications submit overlapping write requests. Reported-by: Slava Pestov <sviatoslav.pestov@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-08-29 14:09:43 +01:00
Max Reitz	33384421b3	block: Add AIO context notifiers If a long-running operation on a BDS wants to always remain in the same AIO context, it somehow needs to keep track of the BDS changing its context. This adds a function for registering callbacks on a BDS which are called whenever the BDS is attached or detached from an AIO context. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-29 10:48:45 +01:00
Stefan Hajnoczi	ada4240103	block: sort formats alphabetically in bdrv_iterate_format() Format names are best consumed in alphabetical order. This makes human-readable output easy to produce. bdrv_iterate_format() already has an array of format strings. Sort them before invoking the iteration callback. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>	2014-08-28 13:42:25 +01:00
Max Reitz	91af701412	block: Add bdrv_refresh_filename() Some block devices may not have a filename in their BDS; and for some, there may not even be a normal filename at all. To work around this, add a function which tries to construct a valid filename for the BDS.filename field. If a filename exists or a block driver is able to reconstruct a valid filename (which is placed in BDS.exact_filename), this can directly be used. If no filename can be constructed, we can still construct an options QDict which is then converted to a JSON object and prefixed with the "json:" pseudo protocol prefix. The QDict is placed in BDS.full_open_options. For most block drivers, this process can be done automatically; those that need special handling may define a .bdrv_refresh_filename() method to fill BDS.exact_filename and BDS.full_open_options themselves. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-20 14:31:56 +02:00
Markus Armbruster	5839e53bbc	block: Use g_new() & friends where that makes obvious sense g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer, for two reasons. One, it catches multiplication overflowing size_t. Two, it returns T * rather than void , which lets the compiler catch more type errors. Patch created with Coccinelle, with two manual changes on top: Add const to bdrv_iterate_format() to keep the types straight * Convert the allocation in bdrv_drop_intermediate(), which Coccinelle inexplicably misses Coccinelle semantic patch: @@ type T; @@ -g_malloc(sizeof(T)) +g_new(T, 1) @@ type T; @@ -g_try_malloc(sizeof(T)) +g_try_new(T, 1) @@ type T; @@ -g_malloc0(sizeof(T)) +g_new0(T, 1) @@ type T; @@ -g_try_malloc0(sizeof(T)) +g_try_new0(T, 1) @@ type T; expression n; @@ -g_malloc(sizeof(T) * (n)) +g_new(T, n) @@ type T; expression n; @@ -g_try_malloc(sizeof(T) * (n)) +g_try_new(T, n) @@ type T; expression n; @@ -g_malloc0(sizeof(T) * (n)) +g_new0(T, n) @@ type T; expression n; @@ -g_try_malloc0(sizeof(T) * (n)) +g_try_new0(T, n) @@ type T; expression p, n; @@ -g_realloc(p, sizeof(T) * (n)) +g_renew(T, p, n) @@ type T; expression p, n; @@ -g_try_realloc(p, sizeof(T) * (n)) +g_try_renew(T, p, n) Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-20 11:51:28 +02:00
Max Reitz	908bcd540f	block: Catch !bs->drv in bdrv_check() qemu-img check calls bdrv_check() twice if the first run repaired some inconsistencies. If the first run however again triggered corruption prevention (on qcow2) due to very bad inconsistencies, bs->drv may be NULL afterwards. Thus, bdrv_check() should check whether bs->drv is set. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-15 15:07:16 +02:00
Kevin Wolf	857d4f46c3	block: Handle failure for potentially large allocations Some code in the block layer makes potentially huge allocations. Failure is not completely unexpected there, so avoid aborting qemu and handle out-of-memory situations gracefully. This patch addresses bounce buffer allocations in block.c. While at it, convert bdrv_commit() from plain g_malloc() to qemu_try_blockalign(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:15 +02:00
Kevin Wolf	7d2a35cc92	block: Introduce qemu_try_blockalign() This function returns NULL instead of aborting when an allocation fails. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>	2014-08-15 15:07:15 +02:00
Jeff Cody	9a4d5ca607	block: allow bdrv_unref() to be passed NULL pointers If bdrv_unref() is passed a NULL BDS pointer, it is safe to exit with no operation. This will allow cleanup code to blindly call bdrv_unref() on a BDS that has been initialized to NULL. Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-15 15:07:14 +02:00
Stefan Hajnoczi	2a87151fb2	block: bump coroutine pool size for drives When a BlockDriverState is associated with a storage controller DeviceState we expect guest I/O. Use this opportunity to bump the coroutine pool size by 64. This patch ensures that the coroutine pool size scales with the number of drives attached to the guest. It should increase coroutine pool usage (which makes qemu_coroutine_create() fast) without hogging too much memory when fewer drives are attached. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-08-15 15:07:14 +02:00
Markus Armbruster	52bf1e722d	block: Avoid bdrv_get_geometry() where errors should be detected bdrv_get_geometry() hides errors. Use bdrv_nb_sectors() or bdrv_getlength() instead where that's obviously inappropriate. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:13 +02:00
Markus Armbruster	75d3d21f9e	block: Drop superfluous aligning of bdrv_getlength()'s value It returns a multiple of the sector size. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:13 +02:00
Markus Armbruster	57322b7811	block: Use bdrv_nb_sectors() where sectors, not bytes are wanted Instead of bdrv_getlength(). Aside: a few of these callers don't handle errors. I didn't investigate whether they should. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:13 +02:00
Markus Armbruster	30a7f2fc91	block: Use bdrv_nb_sectors() in bdrv_co_get_block_status() Instead of bdrv_getlength(). Replace variables length, length2 by total_sectors, nb_sectors2. Bonus: use total_sectors instead of the slightly unclean bs->total_sectors. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:13 +02:00
Markus Armbruster	4049082c4b	block: Use bdrv_nb_sectors() in bdrv_aligned_preadv() Instead of bdrv_getlength(). Eliminate variable len. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:13 +02:00
Markus Armbruster	d32f7c101b	block: Use bdrv_nb_sectors() in bdrv_make_zero() Instead of bdrv_getlength(). Variable target_size is initially in bytes, then changes meaning to sectors. Ugh. Replace by target_sectors. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:13 +02:00
Markus Armbruster	65a9bb25d6	block: New bdrv_nb_sectors() A call to retrieve the image size converts between bytes and sectors several times: * BlockDriver method bdrv_getlength() returns bytes. * refresh_total_sectors() converts to sectors, rounding up, and stores in total_sectors. * bdrv_getlength() converts total_sectors back to bytes (now rounded up to a multiple of the sector size). * Callers wanting sectors rather bytes convert it right back. Example: bdrv_get_geometry(). bdrv_nb_sectors() provides a way to omit the last two conversions. It's exactly bdrv_getlength() with the conversion to bytes omitted. It's functionally like bdrv_get_geometry() without its odd error handling. Reimplement bdrv_getlength() and bdrv_get_geometry() on top of bdrv_nb_sectors(). The next patches will convert some users of bdrv_getlength() to bdrv_nb_sectors(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:12 +02:00
Kevin Wolf	3baca89139	block: Add Error argument to bdrv_refresh_limits() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-18 13:18:43 +01:00
Kevin Wolf	8eb029c26e	block: Assert qiov length matches request length At least raw-posix relies on this because it can allocate bounce buffers based on the request length, but access it using all of the qiov entries later. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-07-14 12:03:20 +02:00
Kevin Wolf	33f461e0c5	block: Make qiov match the request size until EOF If a read request goes across EOF, the block driver sees a shortened request that stops at EOF (the rest is memsetted in block.c), however the original qiov was used for this request. This patch makes the qiov size match the request size, avoiding a potential buffer overflow in raw-posix. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2014-07-14 12:03:20 +02:00
Paolo Bonzini	b47ec2c456	block: prefer aio_poll to qemu_aio_wait Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-07-09 15:50:11 +02:00
Kevin Wolf	01fb2705bd	block: Fix bdrv_is_allocated() return value bdrv_is_allocated() should return either 0 or 1 in successful cases. We're lucky that currently, the callers that rely on this (e.g. because they check for ret == 1) don't seem to break badly. They just might skip some optimisation or in the case of qemu-io 'map' print separate lines where a single line would suffice. In theory, a wrong allocation status could lead to image corruption with certain operations, so let's fix this quickly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-07-09 15:50:11 +02:00
Ming Lei	448ad91db4	block: block: introduce APIs for submitting IO as a batch This patch introduces three APIs so that following patches can support queuing I/O requests and submitting them as a batch for improving I/O performance. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-07 11:05:17 +02:00
Jeff Cody	54e2690090	block: extend block-commit to accept a string for the backing file On some image chains, QEMU may not always be able to resolve the filenames properly, when updating the backing file of an image after a block commit. For instance, certain relative pathnames may fail, or drives may have been specified originally by file descriptor (e.g. /dev/fd/???), or a relative protocol pathname may have been used. In these instances, QEMU may lack the information to be able to make the correct choice, but the user or management layer most likely does have that knowledge. With this extension to the block-commit api, the user is able to change the backing file of the overlay image as part of the block-commit operation. This allows the change to be 'safe', in the sense that if the attempt to write the overlay image metadata fails, then the block-commit operation returns failure, without disrupting the guest. If the commit top is the active layer, then specifying the backing file string will be treated as an error (there is no overlay image to modify in that case). If a backing file string is not specified in the command, the backing file string to use is determined in the same manner as it was previously. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-01 10:47:01 +02:00
Jeff Cody	5a6684d2b9	block: add helper function to determine if a BDS is in a chain This is a small helper function, to determine if 'base' is in the chain of BlockDriverState 'top'. It returns true if it is in the chain, and false otherwise. If either argument is NULL, it will also return false. Reviewed-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-07-01 10:47:01 +02:00

1 2 3 4 5 ...

852 Commits