Commit Graph

223 Commits

Author SHA1 Message Date
Eric Blake f464906951 block: Comment cleanups
It's been a while since we got rid of the sector-based bdrv_read and
bdrv_write (commit 2e11d756); let's finish the job on a few remaining
comments.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200428213807.776655-1-eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-05 13:17:36 +02:00
Kevin Wolf 7b8e485742 block: Add flags to bdrv(_co)_truncate()
Now that block drivers can support flags for .bdrv_co_truncate, expose
the parameter in the node level interfaces bdrv_co_truncate() and
bdrv_truncate().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200424125448.63318-3-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-30 17:51:07 +02:00
Kevin Wolf dea9052ef1 qcow2: update_refcount(): Reset old_table_index after qcow2_cache_put()
In the case that update_refcount() frees a refcount block, it evicts it
from the metadata cache. Before doing so, however, it returns the
currently used refcount block to the cache because it might be the same.
Returning the refcount block early means that we need to reset
old_table_index so that we reload the refcount block in the next
iteration if it is actually still in use.

Fixes: f71c08ea8e
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200211094900.17315-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18 10:53:56 +01:00
Alberto Garcia ef97d608c7 qcow2: Don't round the L1 table allocation up to the sector size
The L1 table is read from disk using the byte-based bdrv_pread() and
is never accessed beyond its last element, so there's no need to
allocate more memory than that.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: b2e27214ec7b03a585931bcf383ee1ac3a641a10.1579374329.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06 13:47:45 +01:00
Max Reitz c80d8b06cf block: Add @exact parameter to bdrv_co_truncate()
We have two drivers (iscsi and file-posix) that (in some cases) return
success from their .bdrv_co_truncate() implementation if the block
device is larger than the requested offset, but cannot be shrunk.  Some
callers do not want that behavior, so this patch adds a new parameter
that they can use to turn off that behavior.

This patch just adds the parameter and lets the block/io.c and
block/block-backend.c functions pass it around.  All other callers
always pass false and none of the implementations evaluate it, so that
this patch does not change existing behavior.  Future patches take care
of that.

Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190918095144.955-5-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28 12:00:07 +01:00
Kevin Wolf 5e97855052 qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation()
qcow2_detect_metadata_preallocation() calls qcow2_get_refcount() which
requires s->lock to be taken to protect its accesses to the refcount
table and refcount blocks. However, nothing in this code path actually
took the lock. This could cause the same cache entry to be used by two
requests at the same time, for different tables at different offsets,
resulting in image corruption.

As it would be preferable to base the detection on consistent data (even
though it's just heuristics), let's take the lock not only around the
qcow2_get_refcount() calls, but around the whole function.

This patch takes the lock in qcow2_co_block_status() earlier and asserts
in qcow2_detect_metadata_preallocation() that we hold the lock.

Fixes: 69f47505ee
Cc: qemu-stable@nongnu.org
Reported-by: Michael Weiser <michael.weiser@gmx.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Michael Weiser <michael.weiser@gmx.de>
Reviewed-by: Michael Weiser <michael.weiser@gmx.de>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2019-10-25 15:18:55 +02:00
Markus Armbruster a8d2532645 Include qemu-common.h exactly where needed
No header includes qemu-common.h after this commit, as prescribed by
qemu-common.h's file comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-5-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c
block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c
target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h
target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h
target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h
target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and
net/tap-bsd.c fixed up]
2019-06-12 13:20:20 +02:00
Vladimir Sementsov-Ogievskiy 1477b6c803 block/qcow2-refcount: add trace-point to qcow2_process_discards
Let's at least trace ignored failure.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 16:55:58 +02:00
Vladimir Sementsov-Ogievskiy 69f47505ee block: avoid recursive block_status call if possible
drv_co_block_status digs bs->file for additional, more accurate search
for hole inside region, reported as DATA by bs since 5daa74a6eb.

This accuracy is not free: assume we have qcow2 disk. Actually, qcow2
knows, where are holes and where is data. But every block_status
request calls lseek additionally. Assume a big disk, full of
data, in any iterative copying block job (or img convert) we'll call
lseek(HOLE) on every iteration, and each of these lseeks will have to
iterate through all metadata up to the end of file. It's obviously
ineffective behavior. And for many scenarios we don't need this lseek
at all.

However, lseek is needed when we have metadata-preallocated image.

So, let's detect metadata-preallocation case and don't dig qcow2's
protocol file in other cases.

The idea is to compare allocation size in POV of filesystem with
allocations size in POV of Qcow2 (by refcounts). If allocation in fs is
significantly lower, consider it as metadata-preallocation case.

102 iotest changed, as our detector can't detect shrinked file as
metadata-preallocation, which don't seem to be wrong, as with metadata
preallocation we always have valid file length.

Two other iotests have a slight change in their QMP output sequence:
Active 'block-commit' returns earlier because the job coroutine yields
earlier on a blocking operation. This operation is loading the refcount
blocks in qcow2_detect_metadata_preallocation().

Suggested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04 15:20:41 +02:00
Vladimir Sementsov-Ogievskiy 9353db47c5 qcow2.h: add missing include
qcow2.h depends on block_int.h. Compilation isn't broken currently only
due to block_int.h always included before qcow2.h. Though, it seems
better to directly include block_int.h in qcow2.h.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190506142741.41731-2-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-28 20:30:55 +02:00
Alberto Garcia b6c246942b qcow2: Define and use QCOW2_COMPRESSED_SECTOR_SIZE
When an L2 table entry points to a compressed cluster the space used
by the data is specified in 512-byte sectors. This size is independent
from BDRV_SECTOR_SIZE and is specific to the qcow2 file format.

The QCOW2_COMPRESSED_SECTOR_SIZE constant defined in this patch makes
this explicit.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20 17:08:56 +02:00
Alberto Garcia e3b4257d03 qcow2: Replace bdrv_write() with bdrv_pwrite()
There's only one bdrv_write() call left in the qcow2 code, and it can
be trivially replaced with the byte-based bdrv_pwrite().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-10 16:45:40 +02:00
Vladimir Sementsov-Ogievskiy 54b10010eb qcow2-refcount: don't mask corruptions under internal errors
No reasons for not reporting found corruptions as corruptions in case
of some internal errors, especially in case of just failed to fix l2
entry (and in this case, missed corruptions may influence comparing
logic, when we calculate difference between corruptions fields of two
results)

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190227131433.197063-6-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy cbb51e9f93 qcow2-refcount: check_refcounts_l2: don't count fixed cluster as allocated
Do not count a cluster which is fixed to be ZERO as allocated.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190227131433.197063-5-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy 1ef337b7a0 qcow2-refcount: check_refcounts_l2: reduce ignored overlaps
Reduce number of structures ignored in overlap check: when checking
active table ignore active tables, when checking inactive table ignore
inactive ones.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190227131433.197063-4-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy a5fff8d4b4 qcow2-refcount: avoid eating RAM
qcow2_inc_refcounts_imrt() (through realloc_refcount_array()) can eat
an unpredictable amount of memory on corrupted table entries, which are
referencing regions far beyond the end of file.

Prevent this, by skipping such regions from further processing.

Interesting that iotest 138 checks exactly the behavior which we fix
here. So, change the test appropriately.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190227131433.197063-3-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy 7e3e736cbd qcow2-refcount: fix check_oflag_copied
Increase corruptions_fixed only after successful fix.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190227131433.197063-2-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-05-07 17:14:21 +02:00
Kevin Wolf e9f5b6deaa qcow2: Support external data file in qemu-img check
For external data files, data clusters must be excluded from the
refcount calculations. Instead, an implicit refcount of 1 is assumed for
the COPIED flag.

Compressed clusters and internal snapshots are incompatible with
external data files, so print an error if they are in use for images
with an external data file.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:46 +01:00
Kevin Wolf 966b000f49 qcow2: External file I/O
This changes the qcow2 implementation to direct all guest data I/O to
s->data_file rather than bs->file, while metadata I/O still uses
bs->file. At the moment, this is still always the same, but soon we'll
add options to set s->data_file to an external data file.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:46 +01:00
Kevin Wolf 808c2bb4c4 qcow2: Pass bs to qcow2_get_cluster_type()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-08 12:26:45 +01:00
Alberto Garcia cdc674c736 qcow2: Assert that refcount block offsets fit in the refcount table
Refcount table entries have a field to store the offset of the
refcount block. The rest of the bits of the entry are currently
reserved.

The offset is always taken from the entry using REFT_OFFSET_MASK to
ensure that we only use the bits that belong to that field.

While that mask is used every time we read from the refcount table, it
is never used when we write to it. Due to the other constraints of the
qcow2 format QEMU can never produce refcount block offsets that don't
fit in that field so any such offset when allocating a refcount block
would indicate a bug in QEMU.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01 13:46:44 +01:00
Eric Blake 77d6a21558 qcow2: Don't allow overflow during cluster allocation
Our code was already checking that we did not attempt to
allocate more clusters than what would fit in an INT64 (the
physical maximimum if we can access a full off_t's worth of
data).  But this does not catch smaller limits enforced by
various spots in the qcow2 image description: L1 and normal
clusters of L2 are documented as having bits 63-56 reserved
for other purposes, capping our maximum offset at 64PB (bit
55 is the maximum bit set).  And for compressed images with
2M clusters, the cap drops the maximum offset to bit 48, or
a maximum offset of 512TB.  If we overflow that offset, we
would write compressed data into one place, but try to
decompress from another, which won't work.

It's actually possible to prove that overflow can cause image
corruption without this patch; I'll add the iotests separately
in the next commit.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-19 12:51:40 +01:00
Liam Merwick 7cb6d3c9be qcow2: Read outside array bounds in qcow2_pre_write_overlap_check()
The commit for 0e4e4318ea increments QCOW2_OL_MAX_BITNR but does not
add an array entry for QCOW2_OL_BITMAP_DIRECTORY_BITNR to metadata_ol_names[].
As a result, an array dereference of metadata_ol_names[8] in
qcow2_pre_write_overlap_check() could result in a read outside of the array bounds.

Fixes: 0e4e4318ea ('qcow2: add overlap check for bitmap directory')

Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1541453919-25973-6-git-send-email-Liam.Merwick@oracle.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-11-12 17:49:21 +01:00
Peter Maydell 7851f1a706 Block layer patches:
- Copy offloading fixes for when the copy increases the image size
 - Temporary revert of the removal of deprecated -drive options
 - Fix request serialisation in the image fleecing scenario
 - Fix copy-on-read crash with unaligned image size
 - Fix another drain crash
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbRNLQAAoJEH8JsnLIjy/WOaQQALlZk01JohETuwGG6HGl0LdI
 jEEm+N0J+BlGOVjoGU67OKGidUCl5WvBsQTlyYkmlaToGuk/njWxCa/GA6+iNRnt
 MDq7Ovr8uZI3D+0Fuc6xg/6NBiLkukgh0Q9gMWkzn3jaNWzO2WcTr8WXwepvP6sj
 YtPhEQOXTT3sXf/MFY8ig7qRrZ6f7LFOoKu7LMnrD+QWDo8TY3QLZaxP9OUFHH7S
 A6J0LIfuRZlq79a7SgrRkCR2ddtgYyBQ+zD7PD5kf1vLW4+dOhDOutQEsZCMCPgR
 ft99kNhrZcJGN6n2r8/oVcvRkw5c4I1JPgakm/GoW/NllfPMebuPospKaS4wiJnB
 zI4YOtmco4Mfxkw/wK+Ep/bPCpxEF43uDcpPiEjsNADrdLq0eKnPn5ctwSyWlGvn
 ayQWxDoKoYckn/ccjtLxJ2xPws8433cTXrBdIKnJadWxi3iRNzlIKHRuEfXf9zQt
 G+Nq7ruysT9TPf9ifuCHcZnTsi3SLYLsjCj7pAgBkazBYE2cCI3eKN8kxsDJi7qv
 cWzFCpwE28pHRJ6FwtdzBVkNcfTlC/XopR1M66OzYZlLqR/4hbNhyHL3hBV+yfrM
 fC7mPi81ttI6e+JAgC6K8t3Ey242MjSzUYa7pJUNws7RpqUhfhr6EXXbBceJKsVW
 F8qKZoiIEK7wDacUiEiE
 =FXOo
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- Copy offloading fixes for when the copy increases the image size
- Temporary revert of the removal of deprecated -drive options
- Fix request serialisation in the image fleecing scenario
- Fix copy-on-read crash with unaligned image size
- Fix another drain crash

# gpg: Signature made Tue 10 Jul 2018 16:37:52 BST
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (24 commits)
  block: Use common write req handling in truncate
  block: Fix bdrv_co_truncate overlap check
  block: Use common req handling in copy offloading
  block: Use common req handling for discard
  block: Fix handling of image enlarging write
  block: Extract common write req handling
  block: Use uint64_t for BdrvTrackedRequest byte fields
  block: Use BdrvChild to discard
  block: Add copy offloading trace points
  block: Prefix file driver trace points with "file_"
  Revert "block: Remove deprecated -drive geometry options"
  Revert "block: Remove deprecated -drive option addr"
  Revert "block: Remove deprecated -drive option serial"
  Revert "block: Remove dead deprecation warning code"
  block/blklogwrites: Make sure the log sector size is not too small
  qapi/block-core.json: Add missing documentation for blklogwrites log-append option
  block/backup: fix fleecing scheme: use serialized writes
  block: add BDRV_REQ_SERIALISING flag
  block: split flags in copy_range
  block/io: fix copy_range
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-07-10 17:28:29 +01:00
Fam Zheng 0b9fd3f467 block: Use BdrvChild to discard
Other I/O functions are already using a BdrvChild pointer in the API, so
make discard do the same. It makes it possible to initiate the same
permission checks before doing I/O, and much easier to share the
helper functions for this, which will be added and used by write,
truncate and copy range paths.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10 16:01:52 +02:00
Vladimir Sementsov-Ogievskiy 0e4e4318ea qcow2: add overlap check for bitmap directory
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20180705151515.779173-1-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-07-09 19:43:24 +02:00
Max Reitz 3cce51c919 qcow2: Repair OFLAG_COPIED when fixing leaks
Repairing OFLAG_COPIED is usually safe because it is done after the
refcounts have been repaired.  Therefore, it we did not find anyone else
referencing a data or L2 cluster, it makes no sense to not set
OFLAG_COPIED -- and the other direction (clearing OFLAG_COPIED) is
always safe, anyway, it may just induce leaks.

Furthermore, if OFLAG_COPIED is actually consistent with a wrong (leaky)
refcount, we will decrement the refcount with -r leaks, but OFLAG_COPIED
will then be wrong.  qemu-img check should not produce images that are
more corrupted afterwards then they were before.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1527085
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20180509200059.31125-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11 16:18:45 +02:00
Michael S. Tsirkin 0d8c41dae5 block: use local path for local headers
When pulling in headers that are in the same directory as the C file (as
opposed to one in include/), we should use its relative path, without a
directory.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2018-05-31 04:16:06 +03:00
Alberto Garcia 74c44a5934 Fix error message about compressed clusters with OFLAG_COPIED
Compressed clusters are not supposed to have the COPIED bit set.
"qemu-img check" detects that and prints an error message reporting
the number of the affected host cluster. This doesn't make much sense
because compressed clusters are not aligned to host clusters, so it
would be better to report the offset instead. Plus, the calculation is
wrong and it uses the raw L2 entry as if it was simply an offset.

This patch fixes the error message and reports the offset of the
compressed cluster.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 0f687957feb72e80c740403191a47e607c2463fe.1523376013.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15 16:15:21 +02:00
Alberto Garcia abf754fe40 qcow2: Reset free_cluster_index when allocating a new refcount block
When we try to allocate new clusters we first look for available ones
starting from s->free_cluster_index and once we find them we increase
their reference counts. Before we get to call update_refcount() to do
this last step s->free_cluster_index is already pointing to the next
cluster after the ones we are trying to allocate.

During update_refcount() it may happen however that we also need to
allocate a new refcount block in order to store the refcounts of these
new clusters (and to complicate things further that may also require
us to grow the refcount table). After all this we don't know if the
clusters that we originally tried to allocate are still available, so
we return -EAGAIN to ask the caller to restart the search for free
clusters.

This is what can happen in a common scenario:

  1) We want to allocate a new cluster and we see that cluster N is
     free.

  2) We try to increase N's refcount but all refcount blocks are full,
     so we allocate a new one at N+1 (where s->free_cluster_index was
     pointing at).

  3) Once we're done we return -EAGAIN to look again for a free
     cluster, but now s->free_cluster_index points at N+2, so that's
     the one we allocate. Cluster N remains unallocated and we have a
     hole in the qcow2 file.

This can be reproduced easily:

     qemu-img create -f qcow2 -o cluster_size=512 hd.qcow2 1M
     qemu-io -c 'write 0 124k' hd.qcow2

After this the image has 132608 bytes (256 clusters), and the refcount
block is full. If we write 512 more bytes it should allocate two new
clusters: the data cluster itself and a new refcount block.

     qemu-io -c 'write 124k 512' hd.qcow2

However the image has now three new clusters (259 in total), and the
first one of them is empty (and unallocated):

     dd if=hd.qcow2 bs=512c skip=256 count=1 | hexdump -C

If we write larger amounts of data in the last step instead of the 512
bytes used in this example we can create larger holes in the qcow2
file.

What this patch does is reset s->free_cluster_index to its previous
value when alloc_refcount_block() returns -EAGAIN. This way the caller
will try to allocate again the original clusters if they are still
free.

The output of iotest 026 also needs to be updated because now that
images have no holes some tests fail at a different point and the
number of leaked clusters is different.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-26 12:16:00 +02:00
Alberto Garcia 0c2ada8136 qcow2: Make qemu-img check detect corrupted L1 tables in snapshots
'qemu-img check' cannot detect if a snapshot's L1 table is corrupted.
This patch checks the table's offset and size and reports corruption
if the values are not valid.

This patch doesn't add code to fix that corruption yet, only to detect
and report it.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09 15:17:47 +01:00
Alberto Garcia c7a9d81d70 qcow2: Check snapshot L1 tables in qcow2_check_metadata_overlap()
The inactive-l2 overlap check iterates uses the L1 tables from all
snapshots, but it does not validate them first.

We now have a function to take care of this, so let's use it.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09 15:17:47 +01:00
Paolo Bonzini 8b220eb7c8 qcow2: introduce qcow2_write_caches and qcow2_flush_caches
They will be used to avoid recursively taking s->lock during
bdrv_open or bdrv_check.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1516279431-30424-7-git-send-email-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09 15:17:47 +01:00
Alberto Garcia 9e029689e1 qcow2: Replace align_offset() with ROUND_UP()
The align_offset() function is equivalent to the ROUND_UP() macro so
there's no need to use the former. The ROUND_UP() name is also a bit
more explicit.

This patch uses ROUND_UP() instead of the slower QEMU_ALIGN_UP()
because align_offset() already requires that the second parameter is a
power of two.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180215131008.5153-1-berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-03-02 18:39:56 +01:00
Alberto Garcia 83ad165be7 qcow2: Update qcow2_update_snapshot_refcount() to support L2 slices
qcow2_update_snapshot_refcount() increases the refcount of all
clusters of a given snapshot. In order to do that it needs to load all
its L2 tables and iterate over their entries. Since we'll be loading
L2 slices instead of full tables we need to add an extra loop that
iterates over all slices of each L2 table.

This function doesn't need any additional changes so apart from that
this patch simply updates the variable name from l2_table to l2_slice.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 5f4db199b9637f4833b58487135124d70add8cf0.1517840877.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 17:00:00 +01:00
Alberto Garcia ca62dd5c2b qcow2: Prepare qcow2_update_snapshot_refcount() for adding L2 slice support
Adding support for L2 slices to qcow2_update_snapshot_refcount() needs
(among other things) an extra loop that iterates over all slices of
each L2 table.

Putting all changes in one patch would make it hard to read because
all semantic changes would be mixed with pure indentation changes.

To make things easier this patch simply creates a new block and
changes the indentation of all lines of code inside it. Thus, all
modifications in this patch are cosmetic. There are no semantic
changes and no variables are renamed yet. The next patch will take
care of that.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 8ffaa5e55bd51121f80e498f4045b64902a94293.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 17:00:00 +01:00
Alberto Garcia 6e6fa7605e qcow2: Remove BDS parameter from qcow2_cache_is_table_offset()
This function was only using the BlockDriverState parameter to pass it
to qcow2_cache_get_table_addr(). This is no longer necessary so this
parameter can be removed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: eb0ed90affcf302e5a954bafb5931b5215483d3a.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia 77aadd7bed qcow2: Remove BDS parameter from qcow2_cache_discard()
This function was only using the BlockDriverState parameter to pass it
to qcow2_cache_get_table_idx() and qcow2_cache_table_release(). This
is no longer necessary so this parameter can be removed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 9724f7e38e763ad3be32627c6b7fe8df9edb1476.1517840877.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia 2013c3d44d qcow2: Remove BDS parameter from qcow2_cache_put()
This function was only using the BlockDriverState parameter to pass it
to qcow2_cache_get_table_idx(). This is no longer necessary so this
parameter can be removed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 6f98155489054a457563da77cdad1a66ebb3e896.1517840876.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Alberto Garcia 2d135ee92d qcow2: Remove BDS parameter from qcow2_cache_entry_mark_dirty()
This function was only using the BlockDriverState parameter to pass it
to qcow2_cache_get_table_idx(). This is no longer necessary so this
parameter can be removed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 5c40516a91782b083c1428b7b6a41bb9e2679bfb.1517840876.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-13 16:59:59 +01:00
Max Reitz ac5b787a6e qcow2: Repair unaligned preallocated zero clusters
We can easily repair unaligned preallocated zero clusters by discarding
them, so why not do it?

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171110203759.14018-2-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-01-23 12:34:42 +01:00
Max Reitz 23482f8a60 qcow2: Add bounds check to get_refblock_offset()
Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1728661
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20171110203111.7666-5-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-17 18:21:31 +01:00
Alberto Garcia 8aa34834d5 qcow2: Prevent allocating compressed clusters at offset 0
If the refcount data is corrupted then we can end up trying to
allocate a new compressed cluster at offset 0 in the image, triggering
an assertion in qcow2_alloc_bytes() that would crash QEMU:

  qcow2_alloc_bytes: Assertion `offset' failed.

This patch adds an explicit check for this scenario and a new test
case.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: fb53467cf48e95ff3330def1cf1003a5b862b7d9.1509718618.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-14 18:06:25 +01:00
Alberto Garcia 6bf45d59f9 qcow2: Prevent allocating refcount blocks at offset 0
Each entry in the qcow2 cache contains an offset field indicating the
location of the data in the qcow2 image. If the offset is 0 then it
means that the entry contains no data and is available to be used when
needed.

Because of that it is not possible to store in the cache the first
cluster of the qcow2 image (offset = 0). This is not a problem because
that cluster always contains the qcow2 header and we're not using this
cache for that.

However, if the qcow2 image is corrupted it can happen that we try to
allocate a new refcount block at offset 0, triggering this assertion
and crashing QEMU:

  qcow2_cache_entry_mark_dirty: Assertion `c->entries[i].offset != 0' failed

This patch adds an explicit check for this scenario and a new test
case.

This problem was originally reported here:

   https://bugs.launchpad.net/qemu/+bug/1728615

Reported-by: R.Nageswara Sastry <nasastry@in.ibm.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 92a2fadd10d58b423f269c1d1a309af161cdc73f.1509718618.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-14 18:06:25 +01:00
Pavel Butsykin 163bc39d2c qcow2: truncate the tail of the image file after shrinking the image
Now after shrinking the image, at the end of the image file, there might be a
tail that probably will never be used. So we can find the last used cluster and
cut the tail.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170929121613.25997-3-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:48 +02:00
Pavel Butsykin 46b732cdf3 qcow2: add shrink image support
This patch add shrinking of the image file for qcow2. As a result, this allows
us to reduce the virtual image size and free up space on the disk without
copying the image. Image can be fragmented and shrink is done by punching holes
in the image file.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170918124230.8152-4-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-09-26 15:00:32 +02:00
Pavel Butsykin f71c08ea8e qcow2: add qcow2_cache_discard
Whenever l2/refcount table clusters are discarded from the file we can
automatically drop unnecessary content of the cache tables. This reduces
the chance of eviction useful cache data and eliminates inconsistent data
in the cache with the data in the file.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170918124230.8152-3-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-09-26 15:00:32 +02:00
Philippe Mathieu-Daudé f80ac75d0e qcow2: fix null pointer dereference
It seems this assert() was somehow misplaced.

block/qcow2-refcount.c:2193:42: warning: Array access (from variable 'on_disk_reftable') results in a null pointer dereference
        on_disk_reftable[refblock_index] = refblock_offset;
        ~~~~~~~~~~~~~~~~                 ^

Reported-by: Clang Static Analyzer
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-07-31 13:06:38 +03:00
Max Reitz 772d1f973f block/qcow2: falloc/full preallocating growth
Implement the preallocation modes falloc and full for growing qcow2
images.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170613202107.10125-15-mreitz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-07-11 17:45:02 +02:00
Max Reitz 60c48a29b7 block/qcow2: Rename "fail_block" to just "fail"
Now alloc_refcount_block() only contains a single fail label, so it
makes more sense to just name it "fail" instead of "fail_block".

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170613202107.10125-14-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-07-11 17:45:02 +02:00