Currently, if an error occurs during the part of vdi_create() which
actually writes the image, the function stores -errno, but continues
anyway.
Instead of trying to write data which (if it can be written at all) does
not make any sense without the operations before succeeding (e.g.,
writing the image header), just error out immediately.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently, seek_to_sector() returns -1 both for errors and unallocated
sectors, resulting in silent errors. As 0 is an invalid offset of data
clusters (bitmap_offset is greater than 0 because s->data_offset is
greater than 0), just return 0 for unallocated sectors and -errno in
case of error. This should then be propagated by bochs_read(), the sole
user of seek_to_sector().
That function also has a case of "return -1 in case of error", which is
fixed by this patch as well.
bochs_read() is called by bochs_co_read() which passes the return value
through, therefore it is indeed correct for bochs_read() to return
-errno.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
First, new_l1_size is an int64_t, whereas min_size is a uint64_t.
Therefore, during the loop which adjusts new_l1_size until it equals or
exceeds min_size, new_l1_size might overflow and become negative. The
comparison in the loop condition however will take it as an unsigned
value (because min_size is unsigned) and therefore recognize it as
exceeding min_size. Therefore, the loop is left with a negative
new_l1_size, which is not correct. This could be fixed by making
new_l1_size uint64_t.
On the other hand, however, by doing this, the while loop may take
forever. If min_size is e.g. UINT64_MAX, it will take new_l1_size
probably multiple overflows to reach the exact same value (if it reaches
it at all). Then, right after the loop, new_l1_size will be recognized
as being too big anyway.
Both problems require a ridiculously high min_size value, which is very
unlikely to occur; but both problems are also simply avoided by checking
whether min_size is sane before calculating new_l1_size (which should
still be checked separately, though).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The call to bdrv_getlength() from qcow2_check_refcounts() may result in
an error. Check this and abort if necessary.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of blindly relying on a normal integer having a width of 32 bits
(which is a pretty good assumption, but we should not rely on it if
there is no need), use the correct format string macros.
This does not touch DEBUG output.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
alloc_clusters_noref() stores the cluster index in a uint64_t. However,
offsets are often represented as int64_t (as for example the return
value of alloc_clusters_noref() itself demonstrates). Therefore, we
should make sure all offsets in the allocated range of clusters are
representable using int64_t without overflows.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently, bdrv_image_info_specific_dump() uses an error variable for
visit_type_ImageInfoSpecific, but ignores the result. As this function
is used here with an output visitor to transform the ImageInfoSpecific
object to a generic QDict, an error should actually be impossible. It is
however better to assert that this is indeed the case. This is done by
this patch using error_abort instead of an unused local Error variable.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of having unlink() calls in the generic block layer, where we
aren't even guarateed to have a file name, move them to those block
drivers that are actually used and that always have a filename. Gets us
rid of some #ifdefs as well.
The patch also converts bs->is_temporary to a new BDRV_O_TEMPORARY open
flag so that it is inherited in the protocol layer and the raw-posix and
raw-win32 drivers can unlink the file.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
discard_single_l2() should not implement its own version of
qcow2_get_cluster_type(), but rather rely on this already existing
function. By doing so, it will work for compressed clusters as well
(which it did not so far).
Also, rename "old_offset" to "old_l2_entry", as both are quite different
(and the value is indeed of the latter kind).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_get_info could fail. Add check before using the returned value.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The direct return will skip releasing of all the resouces at
immediate_exit, don't miss that.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
depending on the target the opt_unmap_gran might be as low
as 4K. As we know use this also as a knob to activate the allocationmap
feature lower the barrier. The limit 4K (and not 512) is choosen
to avoid a potentially too big allocationmap.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
this patch implements a cache that tracks if a page on the
iscsi target is allocated or not. The cache is implemented in
a way that it allows for false positives
(e.g. pretending a page is allocated, but it isn't), but
no false negatives.
The cached allocation info is then used to speed up the
read process for unallocated sectors by issueing a GET_LBA_STATUS
request for all sectors that are not yet known to be allocated.
If the read request is confirmed to fall into an unallocated
range we directly return zeroes and do not transfer the
data over the wire.
Tests have shown that a relatively small amount of GET_LBA_STATUS
requests happens a vServer boot time to fill the allocation cache
(all those blocks are not queried again).
Not to transfer all the data of unallocated sectors saves a lot
of time, bandwidth and storage I/O load during block jobs or storage
migration and it saves a lot of bandwidth as well for any big sequential
read of the whole disk (e.g. block copy or speed tests) if a significant
number of blocks is unallocated.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Although bdrv_getlength() was just called above this, and checked for
error, it is better to just use the value we already get, and use
DIV_ROUND_UP.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
if the iscsi driver receives a write zeroes request with
the BDRV_REQ_MAY_UNMAP flag set it fails with -ENOTSUP
if the iscsi target does not support WRITE SAME with
UNMAP. However, the BDRV_REQ_MAY_UNMAP is only a hint
and writing zeroes with WRITE SAME will still be
better than falling back to writing zeroes with WRITE16.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* remotes/qmp-unstable/queue/qmp:
monitor: fix qmp_getfd() fd leak in error case
HMP: support specifying dump format for dump-guest-memory
HMP: fix doc of dump-guest-memory
qmp: object-add: Validate class before creating object
monitor: Add device_add and device_del completion.
monitor: Add command_completion callback to mon_cmd_t.
monitor: Fix drive_del id argument type completion.
error: Remove some unused headers
qerror.h: Replace QERR_NOT_SUPPORTED with QERR_UNSUPPORTED
qerror.h: Remove QERR defines that are only used once
qerror.h: Remove unused error classes
error: Print error_report() to stderr if using qmp
monitor: Remove unused monitor_print_filename
error: Privatize error_print_loc
vnc: Remove default_mon usage
slirp: Remove default_mon usage
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Using error_is_set(errp) that way can sweep programming errors under
the carpet when we get called incorrectly with an error set.
Commit 24d3bd6 added a broken error path to iscsi_do_inquiry(): it
first calls error_setg(), then jumps to the preexisting error label,
where error_setg() gets called again, triggering an assertion failure.
Commit cbee81f fixed this by guarding the second error_setg() with an
error_is_set().
Replace this fix by a simpler and safer one: jump right behind the
second error_setg().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Using error_is_set(errp) to check whether a function call failed is
fragile: it breaks when errp is null. Check perfectly suitable return
values instead when possible. errp can't be null there now, but this
is more robust and more obviously correct
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
error_is_set(&var) is the same as var != NULL, but it takes
whole-program analysis to figure that out. Unnecessarily hard for
optimizers, static checkers, and human readers. Commit 84d18f0 dumbed
it down to obvious, but a few more have crept in since, and
documentation was overlooked. Dumb these down, too.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Just hardcode them in the callers
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
PRIu32 is the format string specifier for uint32_t, let's use it.
Variables ->block_size, ->n_blocks, and i are all uint32_t.
Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds an errp parameter to bdrv_new() and updates all its
callers. The next patches will make use of this in order to check for
duplicate IDs. Most of the callers know that their ID is fine, so they
can simply assert that there is no error.
Behaviour doesn't change with this patch yet as bdrv_new() doesn't
actually assign errors to errp.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
This patch converts fprintf() calls to error_setg() in block/qed.c:bdrv_qed_create()
(error_setg() is part of error reporting API in include/qapi/error.h)
Signed-off-by: Aakriti Gupta <aakritty@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_getlength could fail, check the return value before using it.
Return NULL and set errno if it fails. Callers are updated to handle
the error case.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The old check was off by a factor of 512 and didn't consider cases where
we don't get an exact division. This could lead to an out-of-bounds
array access in seek_to_sector().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
When qcow2_get_cluster_offset() sees a zero cluster in a version 2
image, it (rightfully) returns an error. But in doing so it shouldn't
leak an L2 table cache reference.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
If lazy refcounts are enabled for a backing file, committing to this
backing file may leave it in a dirty state even if the commit succeeds.
The reason is that the bdrv_flush() call in bdrv_commit() doesn't flush
refcount updates with lazy refcounts enabled, and qcow2_reopen_prepare()
doesn't take care to flush metadata.
In order to fix this, this patch also fixes qcow2_mark_clean(), which
contains another ineffective bdrv_flush() call beause lazy refcounts are
disabled only afterwards. All existing callers of qcow2_mark_clean()
either don't modify refcounts or already flush manually, so that this
fixes only a latent, but not yet actually triggerable bug.
Another instance of the same problem is live snapshots. Again, a real
corruption is prevented by an explicit flush for non-read-only images in
external_snapshot_prepare(), but images using lazy refcounts stay dirty.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
This eliminates the possible assertion failure in error_setg().
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Max WRITE SAME length is also used when the UNMAP bit is zero, so it
should be queried even if LBPWS=0. Same for the optimal transfer
length.
However, the write_zeroes_alignment only matters for UNMAP=1 so we
still restrict it to LBPWS=1.
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Non-block SCSI devices do not support flushing, but we may still send
them requests via bdrv_flush_all. Just ignore them.
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some targets may return "invalid field" as the ASCQ from WRITE SAME
if they support the command only without the UNMAP field. Recognize
that, and return ENOTSUP just like for "invalid operation code".
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
preallocate() only links the first QCowL2Meta's data clusters into the
L2 table and ignores any chained QCowL2Metas in the linked list.
Chains of QCowL2Meta structs are built up when contiguous clusters span
L2 tables. Each QCowL2Meta describes one L2 table update. This is a
rare case in preallocate() but can happen.
This patch fixes preallocate() by iterating over the whole list of
QCowL2Metas. Compare with the qcow2_co_writev() function's
implementation, which is similar but also also handles request
dependencies. preallocate() only performs one allocation at a time so
there can be no dependencies.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This avoids a possible division by zero.
Convert s->tracks to unsigned as well because it feels better than
surviving just because the results of calculations with s->tracks are
converted to unsigned anyway.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The first test case would cause a huge memory allocation, leading to a
qemu abort; the second one to a too small malloc() for the catalog
(smaller than s->catalog_size), which causes a read-only out-of-bounds
array access and on big endian hosts an endianess conversion for an
undefined memory area.
The sample image used here is not an original Parallels image. It was
created using an hexeditor on the basis of the struct that qemu uses.
Good enough for trying to crash the driver, but not for ensuring
compatibility.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Even with a limit of 64k snapshots, each snapshot could have a filename
and an ID with up to 64k, which would still lead to pretty large
allocations, which could potentially lead to qemu aborting. Limit the
total size of the snapshot table to an average of 1k per entry when
the limit of 64k snapshots is fully used. This should be plenty for any
reasonable user.
This also fixes potential integer overflows of s->snapshot_size.
Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This avoids an unbounded allocation.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
For the L1 table to loaded for an internal snapshot, the code allocated
only enough memory to hold the currently active L1 table. If the
snapshot's L1 table is actually larger than the current one, this leads
to a buffer overflow.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The qcow2 code assumes that s->snapshots is non-NULL if s->nb_snapshots
!= 0. By having the initialisation of both fields separated in
qcow2_open(), any error occuring in between would cause the error path
to dereference NULL in qcow2_free_snapshots() if the image had any
snapshots.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
bs->total_sectors is not the highest possible sector number that could
be involved in a copy on write operation: VM state is after the end of
the virtual disk. This resulted in wrong values for the number of
sectors to be copied (n).
The code that checks for the end of the image isn't required any more
because the code hasn't been calling the block layer's bdrv_read() for a
long time; instead, it directly calls qcow2_readv(), which doesn't error
out on VM state sector numbers.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Both compressed and uncompressed I/O is buffered. dmg_open() calculates
the maximum buffer size needed from the metadata in the image file.
There is currently a buffer overflow since ->lengths[] is accounted
against the maximum compressed buffer size but actually uses the
uncompressed buffer:
switch (s->types[chunk]) {
case 1: /* copy */
ret = bdrv_pread(bs->file, s->offsets[chunk],
s->uncompressed_chunk, s->lengths[chunk]);
We must account against the maximum uncompressed buffer size for type=1
chunks.
This patch fixes the maximum buffer size calculation to take into
account the chunk type. It is critical that we update the correct
maximum since there are two buffers ->compressed_chunk and
->uncompressed_chunk.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The DMG metadata is stored as uint64_t, so use the same type for
sector_num. int was a particularly poor choice since it is only 32-bit
and would truncate large values.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Chunk length and sectorcount are used for decompression buffers as well
as the bdrv_pread() count argument. Ensure that they have reasonable
values so neither memory allocation nor conversion from uint64_t to int
will cause problems.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Use the right types instead of signed int:
size_t new_size;
This is a byte count for g_realloc() that is calculated from uint32_t
and size_t values.
uint32_t chunk_count;
Use the same type as s->n_chunks, which is used together with
chunk_count.
This patch is a cleanup and does not fix bugs.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It is not necessary to check errno for EINTR and the block layer does
not produce short reads. Therefore we can drop the loop that attempts
to read a compressed chunk.
The loop is buggy because it incorrectly adds the transferred bytes
twice:
do {
ret = bdrv_pread(...);
i += ret;
} while (ret >= 0 && ret + i < s->lengths[chunk]);
Luckily we can drop the loop completely and perform a single
bdrv_pread().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When a terminator is reached the base for offsets and sectors is stored.
The following records that are processed will use this base value.
If the first record we encounter is a terminator, then calculating the
base values would result in out-of-bounds array accesses. Don't do
that.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Clean up the mix of tabs and spaces, as well as the coding style
violations in block/dmg.c. There are no semantic changes since this
patch simply reformats the code.
This patch is necessary before we can make meaningful changes to this
file, due to the inconsistent formatting and confusing indentation.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The size in bytes is assigned to an int later, so check that instead of
the number of entries.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In order to avoid integer overflows.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If the size becomes larger than what qcow2_open() would accept, fail the
growing operation.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This ensures that the checks catch all invalid cluster indexes
instead of returning the refcount of a wrong cluster.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
free_cluster_index is only correct if update_refcount() was called from
an allocation function, and even there it's brittle because it's used to
protect unfinished allocations which still have a refcount of 0 - if it
moves in the wrong place, the unfinished allocation can be corrupted.
So not using it any more seems to be a good idea. Instead, use the
first requested cluster to do the calculations. Return -EAGAIN if
unfinished allocations could become invalid and let the caller restart
its search for some free clusters.
The context of creating a snapsnot is one situation where
update_refcount() is called outside of a cluster allocation. For this
case, the change fixes a buffer overflow if a cluster is referenced in
an L2 table that cannot be represented by an existing refcount block.
(new_table[refcount_table_index] was out of bounds)
[Bump the qemu-iotests 026 refblock_alloc.write leak count from 10 to
11.
--Stefan]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
len could become negative and would pass the check then. Nothing bad
happened because bdrv_pread() happens to return an error for negative
length values, but make variables for sizes unsigned anyway.
This patch also changes the behaviour to error out on invalid lengths
instead of silently truncating it to 1023.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This avoids an unbounded allocation.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This avoid unbounded memory allocation and fixes a potential buffer
overflow on 32 bit hosts.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The end of the refcount table must not exceed INT64_MAX so that integer
overflows are avoided.
Also check for misaligned refcount table. Such images are invalid and
probably the result of data corruption. Error out to avoid further
corruption.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Limit the in-memory reference count table size to 8 MB, it's enough in
practice. This fixes an unbounded allocation as well as a buffer
overflow in qcow2_refcount_init().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Header, header extension and the backing file name must all be stored in
the first cluster. Setting the backing file to a much higher value
allowed header extensions to become much bigger than we want them to be
(unbounded allocation).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This fixes an unbounded allocation for s->unknown_header_fields.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
curl_read_cb is callback function for libcurl when data arrives. The
data size passed in here is not guaranteed to be within the range of
request we submitted, so we may overflow the guest IO buffer. Check the
real size we have before memcpy to buffer to avoid overflow.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Other variables (e.g. sectors_per_block) are calculated using these
variables, and if not range-checked illegal values could be obtained
causing infinite loops and other potential issues when calculating
BAT entries.
The 1.00 VHDX spec requires BlockSize to be min 1MB, max 256MB.
LogicalSectorSize is required to be either 512 or 4096 bytes.
Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The maximum blocks_in_image is 0xffffffff / 4, which also limits the
maximum disk_size for a VDI image to 1024TB. Note that this is the maximum
size that QEMU will currently support with this driver, not necessarily the
maximum size allowed by the image format.
This also fixes an incorrect error message, a bug introduced by commit
5b7aa9b56d (Reported by Stefan Weil)
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This fixes some cases of division by zero crashes.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This adds checks to make sure that max_table_entries and block_size
are in sane ranges. Memory is allocated based on max_table_entries,
and block_size is used to calculate indices into that allocated
memory, so if these values are incorrect that can lead to potential
unbounded memory allocation, or invalid memory accesses.
Also, the allocation of the pagetable is changed from g_malloc0()
to qemu_blockalign().
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
32 bit truncation could let us access the wrong offset in the image.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This fixes two possible division by zero crashes: In bochs_open() and in
seek_to_sector().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It should neither become negative nor allow unbounded memory
allocations. This fixes aborts in g_malloc() and an s->catalog_bitmap
buffer overflow on big endian hosts.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Gets us rid of integer overflows resulting in negative sizes which
aren't correctly checked.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This is an on-disk structure, so offsets must be accurate.
Before this patch, sizeof(bochs) != sizeof(header_v1), which makes the
memcpy() between both invalid. We're lucky enough that the destination
buffer happened to be the larger one, and the memcpy size to be taken
from the smaller one, so we didn't get a buffer overflow in practice.
This patch unifies the both structures, eliminating the need to do a
memcpy in the first place. The common fields are extracted to the top
level of the struct and the actually differing part gets a union of the
two versions.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
cloop stores the number of compressed blocks in the n_blocks header
field. The file actually contains n_blocks + 1 offsets, where the extra
offset is the end-of-file offset.
The following line in cloop_read_block() results in an out-of-bounds
offsets[] access:
uint32_t bytes = s->offsets[block_num + 1] - s->offsets[block_num];
This patch allocates and loads the extra offset so that
cloop_read_block() works correctly when the last block is accessed.
Notice that we must free s->offsets[] unconditionally now since there is
always an end-of-file offset.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The offsets[] array allows efficient seeking and tells us the maximum
compressed data size. If the offsets are bogus the maximum compressed
data size will be unrealistic.
This could cause g_malloc() to abort and bogus offsets mean the image is
broken anyway. Therefore we should refuse such images.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Limit offsets_size to 512 MB so that:
1. g_malloc() does not abort due to an unreasonable size argument.
2. offsets_size does not overflow the bdrv_pread() int size argument.
This limit imposes a maximum image size of 16 TB at 256 KB block size.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The following integer overflow in offsets_size can lead to out-of-bounds
memory stores when n_blocks has a huge value:
uint32_t n_blocks, offsets_size;
[...]
ret = bdrv_pread(bs->file, 128 + 4, &s->n_blocks, 4);
[...]
s->n_blocks = be32_to_cpu(s->n_blocks);
/* read offsets */
offsets_size = s->n_blocks * sizeof(uint64_t);
s->offsets = g_malloc(offsets_size);
[...]
for(i=0;i<s->n_blocks;i++) {
s->offsets[i] = be64_to_cpu(s->offsets[i]);
offsets_size can be smaller than n_blocks due to integer overflow.
Therefore s->offsets[] is too small when the for loop byteswaps offsets.
This patch refuses to open files if offsets_size would overflow.
Note that changing the type of offsets_size is not a fix since 32-bit
hosts still only have 32-bit size_t.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Avoid unbounded s->uncompressed_block memory allocation by checking that
the block_size header field has a reasonable value. Also enforce the
assumption that the value is a non-zero multiple of 512.
These constraints conform to cloop 2.639's code so we accept existing
image files.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The mirror blockjob coroutine rate-limits itself by sleeping. The
coroutine also performs I/O asynchronously so it's important that the
aio callback doesn't wake the coroutine early as that breaks
rate-limiting.
Reported-by: Joaquim Barrera <jbarrera@ac.upc.edu>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The throttling delay calculation was using an inaccurate sector count to
calculate the time to sleep. This broke rate-limiting for the block
mirror job.
Move the delay calculation into mirror_iteration() where we know how
many sectors were transferred. This lets us calculate an accurate delay
time.
Reported-by: Joaquim Barrera <jbarrera@ac.upc.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
if an NFS operation fails we should report what libnfs knows
about the failure. It is likely more than just an error code.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If qcow2_alloc_clusters() fails, new_offset and ret will both be
negative after the fail label, thus passing the first if condition and
subsequently resulting in a call of qcow2_free_clusters() with an
invalid (negative) offset parameter. Fix this by introducing a new label
"fail_free_cluster" which is only invoked if new_offset is indeed
pointing to a newly allocated cluster that should be cleaned up by
freeing it.
While we're at it, clean up the whole fail path. qcow2_cache_put()
should (and actually can) never fail, hence the return value can safely
be ignored (aside from asserting that it indeed did not fail).
Furthermore, there is no reason to give QCOW2_DISCARD_ALWAYS to
qcow2_free_clusters(), a mere QCOW2_DISCARD_OTHER will suffice.
Ultimately, rename the "fail" label to "done", as it is invoked both on
failure and success.
Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Contrary to the comment describing this function's behavior, it does not
return 0 on success, but rather the offset of the newly allocated
cluster. This patch adjusts the comment accordingly to reflect the
actual behavior.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If it returns an error, the migrated VM will not be started, but qemu
exits with an error message.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
nbd_receive_reply() is called by the event loop whenever data is
available or the socket has been closed by the remote side.
This patch closes the socket when an error occurs to prevent the
nbd_receive_reply() handler from being called indefinitely after the
connection has failed.
Note that we were already correctly returning EIO for pending requests
but leaving the nbd_receive_reply() handler registered resulted in high
CPU consumption and a flood of error messages.
Reuse nbd_teardown_connection() to close the socket.
Reported-by: Zhifeng Cai <bluewindow@h3c.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
On 32-bit hosts, some compilers will warn on too large integer constants
for constants that are 64-bit in length. Explicitly put a 'ULL' suffix
on those defines.
Reported-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The "host_device" protocol driver should strip the "host_device:" prefix
from filenames if present.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The hdev_create() implementation in block/raw-posix.c is used by the
"host_device", "host_cdrom" and "host_floppy" protocol block drivers
together. Thus, any of the associated prefixes may occur and exactly one
should should be stripped, if it does (thus,
"host_device:host_cdrom:/dev/cdrom" is not shortened to "/dev/cdrom").
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The "host_cdrom" protocol drivers should strip the "host_cdrom:" prefix
from filenames if present.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The "host_floppy" protocol driver should strip the "host_floppy:" prefix
from filenames if present.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The "host_device" protocol driver should strip the "host_device:" prefix
from filenames if present.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
qcow2_open() causes writes when repairing an image with the dirty flag
set and when clearing autoclear flags. It shouldn't do this when another
qemu instance is still actively working on this image file.
One effect of the bug is that images may have a cleared dirty flag while
the migration source host still has it in use with lazy refcounts
enabled, so refcounts are not accurate and the dirty flag must remain
set.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Instead of manually building a list of all options from BDRVQcowState
values just reuse the options that were used to open the image.
qcow2_open() won't fully use all of the options in the QDict, but that's
okay.
This fixes all of the driver-specific options in qcow2, except for
lazy-refcounts, which was special cased before.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch keep the recursive way of doing things but simplify it by giving
two responsabilities to all block filters implementors.
They will need to do two things:
-Set the is_filter field of their block driver to true.
-Implement the bdrv_recurse_is_first_non_filter method of their block driver like
it is done on the Quorum block driver. (block/quorum.c)
[Paolo Bonzini <pbonzini@redhat.com> pointed out that this patch changes
the semantics of blkverify, which now recurses down both bs->file and
s->test_file.
-- Stefan]
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Before dereferencing bs->drv for a call to its member bdrv_co_readv(),
copy_sectors() should check whether that pointer is indeed valid, since
it may have been set to NULL by e.g. a concurrent write triggering the
corruption prevention mechanism.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
After migration has completed, we call bdrv_invalidate_cache() so that
drivers which cache some data drop their stale copy of the data and
reread it from the image file to get a new version of data that the
source modified while the migration was running.
Reloading metadata from the image file is useless, though, if the size
of the image file stays stale (this is a value that is cached for all
image formats in block.c). Reads from (meta)data after the old EOF
return only zeroes, causing image corruption.
We need to update bs->total_sectors in all layers that could potentially
have changed their size (i.e. backing files are not a concern - if they
are changed, we're in bigger trouble)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When reading the refcount table entry in get_refcount(), only bits which
are actually significant for the refcount block offset should be taken
into account.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The bdrv_create() implementation of the block/raw-win32 "file" protocol
driver should strip the "file:" prefix from filenames if present.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The "file" protocol driver should strip the "file:" prefix from
filenames if present.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The bdrv_create() implementation of the block/raw-posix "file" protocol
driver should strip the "file:" prefix from filenames if present.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The "file" protocol driver should strip the "file:" prefix from
filenames if present.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Originally, this built up the error message with the backing filename,
so that errp was set as follows:
error_set(errp, QERR_OPEN_FILE_FAILED, backing_filename);
However, we now propagate the local_error from the
bdrv_open_backing_file() call instead, making these 2 lines useless
code.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The current iscsi block driver code makes the rather arbitrary decision
that TYPE_MEDIUM_CHANGER and TYPE_TAPE devices have bs->sg = 1 and all
other device types are disks.
Instead of this, check for TYPE_DISK to expose the disk interface and
make everything else bs->sg = 1. In particular, this includes devices
with TYPE_STORAGE_ARRAY, which is what LUN 0 of an iscsi target is.
(See https://bugzilla.redhat.com/show_bug.cgi?id=1067784 for the exact
scenario.)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Remove the definitions of GLUSTER_FD_WRITE and GLUSTER_FD_READ which are
no longer used. Also sockets.h isn't needed any more.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Pipe handling mechanism in gluster driver was based on similar implementation
in RBD driver and hence had GPLv2 and associated copyright information.
After changing gluster driver to coroutine based implementation, the pipe
handling code no longer exists and hence change gluster driver's licence to
GPLv2+ and remove RBD copyrights.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit adccfbcd60 (block: gluster - add
reopen support.) did not supply the qemu_gluster_init() Error **
argument, needed since commit a7451cb850
(gluster: correctly propagate errors).
Pass through qemu_gluster_reopen_prepare()'s errp, as done in
qemu_gluster_open().
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJTENUMAAoJEJykq7OBq3PIXoQH/R7Q9CqASrBEfbT1GJDSr3E5
yq0ZZ4nJBhhn0f8tOw9NWGCmTH1UZn0u4PbYEj4jTGJyWY2POkCKLPRvurge4c+Q
L+OwzspW14O35o7WuBC8uWvSxh8ExaiwwrutCsIsz68UAcnoiwyTMSKghIy9+Osk
lJKRFz6IFbgxCocqeNQrilj84vYSv9VzXZY8/nIhDMFsbjaKW7sKwu4jnXB7+v5J
cA0+voafINNICNMeXzkqBbKwWIFtYOkPe287MZQNbpS2vCS2bsZeivDxsBJ26Cmx
Xho8Xz5utvBMBrNtidHCTsd2jrj0VBdGgksYoHPyIwK/gBr1EVTGKpebraKXBEU=
=rgw9
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
Block pull request
# gpg: Signature made Fri 28 Feb 2014 18:27:24 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha/tags/block-pull-request:
block/vmdk: do not report file offset for compressed extents
discard rbd error output when not relevant in qemu-iotests
block: use /var/tmp instead of /tmp for -snapshot
qemu-io-test: Disable Quorum test when not compiled in.
qmp: Make Quorum error events more palatable.
qmp: Fix BlockdevOptionQuorum.
block: gluster - add reopen support.
block: gluster - code movements, state storage changes
qemu-iotests: add more tests to the "quick" group
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* remotes/bonzini/scsi-next:
block/iscsi: fix segfault if writesame fails
scsi-disk: Add support for port WWN and index descriptors in VPD page 83h
block/iscsi: query for supported VPD pages
block/iscsi: fix deadlock on scsi check condition
scsi-bus: Fix transfer length for VERIFY with BYTCHK=11b
scsi: report thin provisioning errors with werror=report
scsi: Change scsi sense buf size to 252
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Insert quorum QMP events documentation alphabetically.
Also change the "ret" errno value by an optional "error" being an strerror(-ret)
in the QUORUM_REPORT_BAD qmp event.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Gluster does parse open flags in its .bdrv_open() implementation,
and the .bdrv_reopen_* implementations need to do the same.
A new gluster connection to the image file to be created is established
in the .bdrv_reopen_prepare(), and the image file opened with the new
flags.
If this is successful, then the old image file is closed, and the
old connection torn down. The relevant structure pointers in the gluster
state structure are updated to the new connection.
If it is not successful, then the new file handle and connection is
abandoned (if it exists), while the old connection is not modified at
all.
With reopen supported, block-commit (and offline commit) is now also
supported for image files whose base image uses the native gluster
protocol driver.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In preparation for supporting reopen on gluster, move flag
parsing out to a function. Also, add a NULL check in the
gconf cleanup.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTB8hAAAoJEH8JsnLIjy/WQdgP/jEu5baA1/qKanDsS9l+81u1
/sIYSWpHDEJ0uavqTMBeyMOwkzel7SZRusIwA/d5pMqxbY6/86YJumTTozFWvtqc
IABqHtRKCxjcLdZRPbkuNAOiw6p76vSZa543o2t8OAhK2DIFy530wWXeoQEYvuJX
4pOh0lTradOrF1z6uW4ozgQ1efPppwh/iqwfWWNJVTgfnWxJk6qQaATEgkuSdsUN
Wp78UzOxLGO6JKJB6kP3LfNL0ANTYHpfH2/wkE6cW6TkSUduOm6hIBY+tb9khqYt
INOKxqFADK6EOgjvJBsZuZUtOnHK5oM921LepN/mOPAs6gKcn2j+FfqJrl3I1/5M
AXM3M0FPuijEKPGWw7pCLt7j84KJkD9a/rsKO37yRzw17fOma2Rpr4TrX43BF+5t
CGqQ7PzDJ6Fng4EXjyNDzviwXIK8xmG1tfn92tq/BUd6OuM9MCyzEGvEiGOMBoXv
w4iOV7UC+1P3TjnTBhMlBVGywSfdOJoHr9k4lXGNp0h8fPhM9rfruI3BFysxaas6
GmKbd7yvKwXOTptd3I9SB8BzVUL3CcD3FK24+cWKAl8GgyiDIWRlvBYyMp3p8Z8f
NDzcxYP6aRGsoddvpIWr3Tz89hw5wTW5u3RmNgxJUguz6HYKFbl30dpGT+96q2BN
YIAANTdPxn7BP6r3glQH
=ZaDG
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block patches
# gpg: Signature made Fri 21 Feb 2014 21:42:24 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
* remotes/kevin/tags/for-upstream: (54 commits)
iotests: Mixed quorum child device specifications
quorum: Simplify quorum_open()
quorum: Add unit test.
quorum: Add quorum_open() and quorum_close().
quorum: Implement recursive .bdrv_recurse_is_first_non_filter in quorum.
quorum: Add quorum_co_flush().
quorum: Add quorum_invalidate_cache().
quorum: Add quorum_getlength().
quorum: Add quorum mechanism.
quorum: Add quorum_aio_readv.
blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify.
quorum: Add quorum_aio_writev and its dependencies.
quorum: Create BDRVQuorumState and BlkDriver and do init.
quorum: Create quorum.c, add QuorumChildRequest and QuorumAIOCB.
check-qdict: Test termination of qdict_array_split()
check-qdict: Adjust test for qdict_array_split()
qdict: Extract non-QDicts in qdict_array_split()
qemu-config: Sections must consist of keys
qemu-iotests: Check qemu-img command line parsing
qemu-img: Allow -o help with incomplete argument list
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* remotes/bonzini/configure:
build: softmmu targets do not have a "main.o" file
configure: Disable libtool if -fPIE does not work with it (bug #1257099)
block: convert block drivers linked with libs to modules
Makefile: introduce common-obj-m and block-obj-m for DSO
Makefile: install modules with "make install"
module: implement module loading
rules.mak: introduce DSO rules
darwin: do not use -mdynamic-no-pic
block: use per-object cflags and libs
rules.mak: allow per object cflags and libs
rules.mak: fix $(obj) to a real relative path
util: Split out exec_dir from os_find_datadir
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
commit fa6252b0 introduced a segfault because it tries
to read iTask.task->sense after iTask.task has been
freed.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
this patch ensures that we only query for block provisioning and
block limits vpd pages if they are advertised. It also cleans
up the inquiry code and eliminates some redundant code.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
the retry logic was broken because the complete status
of the task structure was not reset. this resulted in
an infinite loop retrying the command over and over.
CC: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Although it may not look like it, this patch simplifies quorum_open().
qdict_array_split() is now able to return QLists with different objects
than only QDicts, therefore it will now do all the work and
quorum_open() does not have to handle reference strings by itself.
This allows mixing full option dicts and reference strings for
specifying the child block devices of quorum; furthermore, it improves
handling of malformed specifications.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Example of command line:
-drive if=virtio,driver=quorum,\
children.0.file.filename=1.raw,\
children.0.node-name=1.raw,\
children.0.driver=raw,\
children.1.file.filename=2.raw,\
children.1.node-name=2.raw,\
children.1.driver=raw,\
children.2.file.filename=3.raw,\
children.2.node-name=3.raw,\
children.2.driver=raw,\
vote-threshold=2
blkverify=on with vote-threshold=2 and two files can be passed to
emulate blkverify.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is used to activate quorum snapshot.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Makes a vote to select error if any.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We really want that live migration works with quorum so implement
quorum_invalidate_cache().
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Check that every bs file returns the same length.
Otherwise, return -EIO to disable the quorum and
avoid length discrepancy.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patchset enables the core of the quorum mechanism.
The num_children reads are compared to get the majority version and if this
version exists more than threshold times the guest won't see the error at all.
If a block is corrupted or if an error occurs during an IO or if the quorum
cannot be established QMP events are used to report to the management.
Use gnutls's SHA-256 to compare versions.
--enable-quorum must be used to enable the feature.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add code to do num_children reads in parallel and cleanup the structures
afterwards.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qemu_iovec_compare() will be used to compare IOs vectors in quorum blkverify
mode. The patch extracts these functions in order to factorize the code.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Writes are mirrored num_children times on num_children devices.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Create the structure holding the quorum settings and write the minimal block
driver instanciation boilerplate.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Quorum is a block filter mirroring writes to num_children children.
For reads quorum reads each children and does a vote.
If more than vote_threshold versions are identical the quorum is reached and
this winning version is returned to the guest. So quorum prevents bit corruption.
For high availability purpose minority errors are reported via QMP but the guest
does not see them.
This patch creates the driver C source file and introduces the structures that
will be used in asynchronous reads and writes.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of just putting it in debugging output, we can now put the
value in an Error.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Returning "Wrong medium type" for an image that does not have a valid
header is a bit weird. Improve the error by mentioning what format
was trying to open it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that we can return the "right" errors, use the Error** parameter
to pass them back instead of just printing them.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This prepares for propagating errors from vmdk_open_sparse and
vmdk_open_desc_file up to the caller of vmdk_open.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently, we just try reading a VMDK file as both image and descriptor.
This makes it hard to choose which of the two attempts gave the best error.
We'll decide in advance if the file looks like an image or a descriptor,
and this patch is the first step to that end.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Before:
$ ./qemu-io-old
qemu-io-old> open -r -o driver=vvfat,fat-type=24,dir=i386-softmmu
Valid FAT types are only 12, 16 and 32
qemu-io-old: can't open device (null): Could not open image: Invalid argument
After:
$ ./qemu-io
qemu-io> open -r -o driver=vvfat,fat-type=24,dir=i386-softmmu
qemu-io: can't open device (null): Valid FAT types are only 12, 16 and 32
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently, "gluster:///volname/img" and (using file. options)
"file.driver=gluster,file.filename=foo" will segfault. Also,
"//host/volname/img" will be rejected, but it is a valid URL
that should be accepted just fine with "file.driver=gluster".
Accept all of these, by inferring missing transport and host
as TCP and localhost respectively.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>