Commit Graph

20191 Commits

Author SHA1 Message Date
Paolo Bonzini dc5a137125 qemu-img: make "info" backing file output correct and easier to use
qemu-img info should use the same logic as qemu when printing the
backing file path, or debugging becomes quite tricky.  We can also
simplify the output in case the backing file has an absolute path
or a protocol.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini 6405875cdd block: move field reset from bdrv_open_common to bdrv_close
bdrv_close should leave fields in the same state as bdrv_new.  It is
not up to bdrv_open_common to fix the mess.

Also, backing_format was not being re-initialized.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini 947995c09e block: protect path_has_protocol from filenames with colons
path_has_protocol will erroneously return "true" if the colon is part
of a filename.  These names are common with stable device names produced
by udev.  We cannot fully protect against this in case the filename
does not have a path component (e.g. if the current directory is
/dev/disk/by-path), but in the common case there will be a slash before
and path_has_protocol can easily detect that and return false.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini f53f4da9c6 block: simplify path_is_absolute
On Windows, all the logic is already in is_windows_drive and
is_windows_drive_prefix.  On POSIX, there is no need to look
out for colons.

The win32 code changes the behaviour in some cases, we could have
something like "d:foo.img". The old code would treat it as relative
path, the new one as absolute. Now the path is absolute, because to
go from c:/program files/blah to d:foo.img you cannot say c:/program
files/blah/d:foo.img.  You have to say d:foo.img.  But you could also
say it's relative because (I think, at least it was like that in DOS
15 years ago) d:foo.img is relative to the current path of drive D.
Considering how path_is_absolute is used by path_combine, I think it's
better to treat it as absolute.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini fa4478d5c8 block: wait for job callback in block_job_cancel_sync
The limitation on not having I/O after cancellation cannot really be
kept.  Even streaming has a very small race window where you could
cancel a job and have it report completion.  If this window is hit,
bdrv_change_backing_file() will yield and possibly cause accesses to
dangling pointers etc.

So, let's just assume that we cannot know exactly what will happen
after the coroutine has set busy to false.  We can set a very lax
condition:

- if we cancel the job, the coroutine won't set it to false again
(and hence will not call co_sleep_ns again).

- block_job_cancel_sync will wait for the coroutine to exit, which
pretty much ensures no race.

Instead, we track the coroutine that executes the job and put very
strict conditions on what to do while it is quiescent (busy = false).
First of all, the coroutine must never set busy = false while the job
has been cancelled.  Second, the coroutine can be reentered arbitrarily
while it is quiescent, so you cannot really do anything but co_sleep_ns at
that time.  This condition is obeyed by the block_job_sleep_ns function.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini 4513eafe92 block: add block_job_sleep_ns
This function abstracts the pretty complex semantics of the "busy"
member of BlockJob.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini 0ac9377d04 block: fully delete bs->file when closing
We are reusing bs->file across close/open, which may not cause any
known bugs but is a recipe for trouble.  Prefer bdrv_delete, and
enjoy the new invariant in the implementation of bdrv_delete.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini a275fa42fa block: do not reuse the backing file across bdrv_close/bdrv_open
This is another bug caused by not doing a full cleanup of the BDS
across close/open.  This was found with mirroring by Shaolong Hu,
but it can probably be reproduced also with eject or change.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini 3a389e7926 block: another bdrv_append fix
bdrv_append must also copy open_flags to the top, because the snapshot
has BDRV_O_NO_BACKING set.  This causes interesting results if you
later use drive-reopen (not upstream) to reopen the image, and lose
the backing file in the process.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini e023b2e244 block: fix snapshot on QED
QED's opaque data includes a pointer back to the BlockDriverState.
This breaks when bdrv_append shuffles data between bs_new and bs_top.
To avoid this, add a "rebind" function that tells the driver about
the new relationship between the BlockDriverState and its opaque.

The patch also adds rebind to VVFAT for completeness, even though
it is not used with live snapshots.

Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Kevin Wolf 93e9eb6808 qtest: Add floppy test
Let's start with testing media change.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2012-05-10 10:32:12 +02:00
Kevin Wolf a3ca163cb5 qtest: Add function to send QMP commands
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2012-05-10 10:32:11 +02:00
Paolo Bonzini c68b039aa9 qemu-iotests: strip spaces from qemu-img/qemu-io/qemu command lines
A trailing space is left when qemu-img has no arguments, for example if
-nocache is not used.  This becomes an empty argument after split()
and causes qemu-io to fail.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Paolo Bonzini 71df14fcbe block: fix allocation size for dirty bitmap
Also reuse elsewhere the new constant for sizeof(unsigned long) * 8.

The dirty bitmap is allocated in bits but declared as unsigned long.
Thus, its memory block is accessed beyond its end unless the image
is a multiple of 64 chunks (i.e. a multiple of 64 MB).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Paolo Bonzini 63090dac3a block: open backing file as read-only when probing for size
bdrv_img_create will temporarily open the backing file to probe its size.
However, this could be done with a read-write open if the wrong flags are
passed to bdrv_img_create.  Since there is really no documentation on
what flags can be passed, assume that bdrv_img_create receives the flags
with which the new image will be opened; sanitize them when opening
the backing file.

Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Paolo Bonzini 469ef350e1 block: update in-memory backing file and format
These are needed to print "info block" output correctly.  QCOW2 does this
because it needs it to write the header, but QED does not, and common code
is the right place to do it.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Paolo Bonzini 5f3777945d block: push bdrv_change_backing_file error checking up from drivers
This check applies to all drivers, but QED lacks it.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Paolo Bonzini e86fe18ac9 block: fail live snapshot if disk has no medium
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Paolo Bonzini 31155b9b3c block: add mode argument to blockdev-snapshot-sync
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Zhi Yong Wu 4c355d53c6 block: add the support to drain throttled requests
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
[ Iterate until all block devices have processed all requests,
  add comments. - Paolo ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Anthony Liguori 9f34841a81 Update version for 1.1.0-rc0 release
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-09 16:39:57 -05:00
Andreas Färber 1b3e76ebd1 tcg/ppc: Fix CONFIG_TCG_PASS_AREG0 mode
Adjust the tcg_out_qemu_{ld,st}() slow paths to pass AREG0 in r3,
based on patches by malc.

Also adjust the registers clobbered, based on patch by Alex.

Signed-off-by: Andreas Färber <afaerber@suse.de>
Acked-by: Alexander Graf <agraf@suse.de>
[AF: Do not hardcode r3 for AREG0, requested by Alex]
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-09 13:59:19 -05:00
Andreas Färber a082615b07 tcg/ppc: Clobber r5 for 64-bit qemu_ld
This accounts for the additional addr_reg2 register.

Signed-off-by: Andreas Färber <afaerber@suse.de>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-09 13:59:19 -05:00
Andreas Färber d831fdb051 tcg/ppc: Don't hardcode register numbers
Also assure i64 alignment where necessary.

Alignment code optimization suggested by malc.

Signed-off-by: Andreas Färber <afaerber@suse.de>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-09 13:59:19 -05:00
Andreas Färber c1696d94c1 tcg/ppc: Do not overwrite lower address word on Darwin and AIX
For targets where TARGET_LONG_BITS != 32, i.e. 64-bit guests,
addr_reg is moved to r4. For hosts without TCG_TARGET_CALL_ALIGN_ARGS
either data_reg2 or data_reg or a masked version thereof would overwrite
r4. Place it in r5 instead, matching TCG_TARGET_CALL_ALIGN_ARGS hosts.

This fixes immediate crashes of 64-bit guests observed on Darwin/ppc but
not on Darwin/ppc64.

Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Acked-by: malc <av1474@comtv.ru>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-09 13:59:18 -05:00
Anthony Liguori c438b1970b Merge remote-tracking branch 'qmp/queue/qmp' into staging
* qmp/queue/qmp:
  hmp: fix bad value conversion for M type
  hmp: expr_unary(): check for overflow in strtoul()/strtoull()
  vl: drop is_suspended variable
  runstate: introduce suspended state
  qapi-schema.json: fix RunState enums alphabetical order
  wakeup on migration
2012-05-08 13:07:41 -05:00
Luiz Capitulino 911628498c hmp: fix bad value conversion for M type
The M type converts from megabytes to bytes. However, the value can be
negative before the conversion, which will lead to a flawed conversion.

For example, this:

 (qemu) balloon -1000000000000011
 (qemu)

Just "works", but the value passed by the balloon command will be
something else.

This patch fixes this problem by requering a positive value before
converting. There's really no reason to accept a negative value for
the M type.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2012-05-08 14:30:22 -03:00
Luiz Capitulino 6b0e33be88 hmp: expr_unary(): check for overflow in strtoul()/strtoull()
It's not checked currently, so something like:

  (qemu) balloon -100000000000001111114334234
  (qemu)

Will just "work" (in this case the balloon command will get a random
value).

Fix it by checking if strtoul()/strtoull() overflowed.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2012-05-08 14:30:22 -03:00
Luiz Capitulino 9abc62f644 vl: drop is_suspended variable
Check for the RUN_STATE_SUSPENDED state instead.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-05-08 14:30:22 -03:00
Luiz Capitulino ad02b96ad8 runstate: introduce suspended state
QEMU enters in this state when the guest suspends to ram (S3).

This is important so that HMP users and QMP clients can know that
the guest is suspended. QMP also has an event for this, but events
are not reliable and are limited (ie. a client can connect to QEMU
after the event has been emitted).

Having a different state for S3 brings a new issue, though. Every
device that doesn't run when the VM is stopped but wants to run
when the VM is suspended has to check for RUN_STATE_SUSPENDED
explicitly. This is the case for the keyboard and mouse devices,
for example.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
2012-05-08 14:30:09 -03:00
Luiz Capitulino 0a24c7b18e qapi-schema.json: fix RunState enums alphabetical order
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-05-08 14:29:33 -03:00
Gerd Hoffmann 7b5d3aa2cc wakeup on migration
Wakeup the guest when the live part of the migation is finished.
This avoids being in suspended state on migration, so we don't
have to save the is_suspended bit.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-05-08 14:29:14 -03:00
Peter Maydell c5954819b6 user-exec.c: Don't assert on segfaults for non-valid addresses
h2g() will assert if passed an address that's not a valid guest address,
so handle_cpu_signal() needs to check before passing "data address
which caused a segfault" to it, since for a misbehaving guest
that could be anything. If the address isn't a valid guest address
then we can simply skip the attempt to unprotect a guest page
which was made read-only to catch self-modifying code.

This assertion probably fires more readily now than it used to
do because of recent changes to default to reserving guest address
space.

Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-08 11:15:18 -05:00
Andreas Färber 90f2cefb17 scripts/qemu-binfmt-conf.sh: Fix shell syntax
The script is organized as a sequence of binfmt registrations, with a
check whether the to be registered architecture matches the host.

Add a missing fi for the SuperH section.

Reported-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-08 11:15:18 -05:00
Andreas Färber f5df5baf11 cpu: Update documentation and comment
State struct CPU had been renamed to CPUState, former CPUState to
CPUArchState.

Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-08 11:15:18 -05:00
Andreas Färber 175003702a configure: Assure printing "yes" or "no" for VirtFS support
When auto-detecting VirtFS support, virtfs="". Set it to "no" after
checking whether it was explicitly requested through --enable-virtfs.

Signed-off-by: Andreas Färber <afaerber@suse.de>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-08 11:15:18 -05:00
Andreas Färber aabfd88d5e configure: Reindent VirtFS check
Avoid tab-indention and fit in with the surrounding code.

Signed-off-by: Andreas Färber <afaerber@suse.de>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-08 11:15:18 -05:00
Stefan Weil 17904bcfb7 tci: Fix wrong macro name for debug code
Code which is compiled with CONFIG_TCG_DEBUG (set by configure option
--enable-debug-tcg) should not disable the assert macro by
defining NDEBUG.

With the wrong macro name CONFIG_TCG_DEBUG, all assertions in tci.c
were completely useless because NDEBUG was always defined.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-08 11:15:18 -05:00
Andreas Faerber 165ceac095 qemu-timer: Move include for __FreeBSD_version to header
sys/param.h is needed for __FreeBSD_version.
Pointed out by Juergen, thanks.

Signed-off-by: Andreas Faerber <andreas.faerber@web.de>
Cc: Juergen Lock <nox@jelal.kn-bremen.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-08 11:14:56 -05:00
Anthony Liguori 4f08129eeb Merge remote-tracking branch 'mst/tags/for_anthony' into staging
* mst/tags/for_anthony:
  rtl8139: fix regression in TxStatus/TxAddr read
2012-05-08 09:41:10 -05:00
Anthony Liguori 7c652c1eaf Merge remote-tracking branch 'kwolf/for-anthony' into staging
* kwolf/for-anthony:
  fdc: simplify media change handling
  qcow2: lock on prealloc
  block: make bdrv_create adopt coroutine
  qcow2: Limit COW to where it's needed
  sheepdog: switch to writethrough mode if cluster doesn't support flush
2012-05-08 09:38:41 -05:00
Anthony Liguori e45bca682c Merge remote-tracking branch 'bonzini/scsi-next' into staging
* bonzini/scsi-next:
  scsi: Add assertion for use-after-free errors
  scsi: remove useless debug messages
  scsi: set VALID bit to 0 in fixed format sense data
  scsi: do not require a minimum allocation length for REQUEST SENSE
  scsi: do not require a minimum allocation length for INQUIRY
  scsi: parse 16-byte tape CDBs
  scsi: do not report bogus overruns for commands in the 0x00-0x1F range
  scsi-disk: add dpofua property
  scsi: change "removable" field to host many features
  scsi: Specify the xfer direction for UNMAP and ATA_PASSTHROUGH commands
  scsi: fix WRITE SAME transfer length and direction
  scsi: fix refcounting for reads
  scsi: prevent data transfer overflow
  ISCSI: Add support for thin-provisioning via discard/UNMAP and bigger LUNs
2012-05-08 09:37:12 -05:00
Anthony Liguori 233ffa1653 Merge remote-tracking branch 'riku/linux-user-for-upstream' into staging
* riku/linux-user-for-upstream:
  linux-user: fix emulation of /proc/self/maps
  linux-user: Clean up interim solution for exit syscall
2012-05-08 09:37:00 -05:00
Anthony Liguori acde8376ef Merge remote-tracking branch 'spice/spice.v54' into staging
* spice/spice.v54:
  qxl: don't assert on guest create_guest_primary
  qxl: ioport_write: remove guest trigerrable abort
  qxl: qxl_add_memslot: remove guest trigerrable panics
  qxl: interface_notify_update: remove guest trigerrable abort
  qxl: cleanup s/__FUNCTION__/__func__/
  qxl: don't abort on guest trigerrable ring indices mismatch
  qxl: fix > 80 chars line
  qxl: replace panic with guest bug in qxl_track_command
  qxl: check for NULL return from qxl_phys2virt
  hw/qxl.c: qxl_phys2virt: replace panics with guest_bug
  spice_info: add mouse_mode
  spice: require spice-protocol >= 0.8.1
2012-05-08 09:36:37 -05:00
Anthony Liguori 4b5463bfdf Merge remote-tracking branch 'sweil/fixes' into staging
* sweil/fixes:
  qemu-timer: Fix limits for w32 mmtimer
  qom: Fix memory leak in function container_get
  hw/pc_sysfw: Fix memory leak
  qdev: Fix memory leak in function set_pci_devfn
  arm-semi: Rename SYS_XXX macros to TARGET_SYS_XXX (fixes compiler warning)
  target-mips: Remove unused inline function
2012-05-08 09:36:18 -05:00
Avi Kivity bdc62e62ea rtl8139: fix regression in TxStatus/TxAddr read
Commit afe0a59535 added byte reads for TxStatus/TxAddr, but
broke 32-bit reads; the mask generation

   (1 << (8 * size)) - 1

is unspecified in C for size >= sizeof(int), and in fact returns 0
on x86.

Fix by using a larger type.

Fixes (at least) Fedora 9 i386 with -machine kernel_irqchip=on.  I
didn't see it with the qemu APIC implementation; may be due to timing
or (more likely) a tester error.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-05-08 17:09:53 +03:00
Hervé Poussineau 21fcf36095 fdc: simplify media change handling
This also (partly) fixes IBM OS/2 Warp 4.0 floppy installation, where
not all floppies have the same format (2x80x18 for the first ones,
2x80x23 for the next ones).

Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-07 19:33:18 +02:00
Zhi Yong Wu 15552c4ad3 qcow2: lock on prealloc
preallocate() will be locked. This is required because
qcow2_alloc_cluster_link_l2() assumes that it runs under a lock that it
can drop while COW is being performed.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-07 19:33:18 +02:00
Zhi Yong Wu 5b7e1542cf block: make bdrv_create adopt coroutine
The current qemu.git introduces failure with preallocation and some
sizes:

qemu-img create -f qcow2 new.img 976563K -o preallocation=metadata
qemu-img: qemu-coroutine-lock.c:111: qemu_co_mutex_unlock: Assertion
`mutex->locked == 1' failed.

And lock needs to work in coroutine context. So to fix this issue, we
need to make bdrv_create adopt coroutine at first.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-07 19:33:18 +02:00
Kevin Wolf 54e6814360 qcow2: Limit COW to where it's needed
This fixes a regression introduced in commit 250196f1. The bug leads to
data corruption, found during an Autotest run with a Fedora 8 guest.

Consider a write request whose first part is covered by an already
allocated cluster, but additional clusters need to be newly allocated.
When counting the number of clusters to allocate, the qcow2 code would
decide to do COW for all remaining clusters of the write request, even
if some of them are already allocated.

If during this COW operation another write request is issued that touches
the same cluster, it will still refer to the old cluster. When the COW
completes, the first request will update the L2 table and the second
write request will be lost. Note that the requests need not overlap, it's
enough for them to touch the same cluster.

This patch ensures that only clusters that really require COW are
considered for allocation. In this case any other request writing to the
same cluster will be an allocating write and gets serialised.

Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Tested-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-07 19:33:18 +02:00