Commit Graph

84 Commits

Author SHA1 Message Date
Kevin Wolf 40ff6d7e8d Don't leak file descriptors
We're leaking file descriptors to child processes. Set FD_CLOEXEC on file
descriptors that don't need to be passed to children to stop this misbehaviour.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-03 11:45:50 -06:00
Kevin Wolf 12c09b8ce2 qemu-img: There is more than one host device driver
I haven't heard yet of anyone using qemu-img to copy an image to a real floppy,
but it's a valid use case.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-03 11:45:50 -06:00
Kevin Wolf 702ef63f3e qcow2: Fix some more qemu_malloc fallout
Oh joy...

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-03 11:45:50 -06:00
Kevin Wolf e1c7f0e3f9 qcow2: Store exact backing format length
Currently qcow2 unnecessarily rounds up the length of the backing format string
to the next multiple of 8. At the same time, the array in BlockDriverState can
only hold 15 characters, so in effect backing formats with 9 characters or more
don't work (e.g. host_device).

Save the real string length and things start to work for all valid image format
names.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-03 11:45:49 -06:00
Aurelien Jarno a167ba5085 Add support for GNU/kFreeBSD
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2009-11-29 18:00:41 +01:00
David Woodhouse c34d2451ed Fix 32-bit overflow in parallels image support
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-11-12 11:23:56 -06:00
Stefan Weil d191d12d5f qcow2: Allow qcow2 disk images with size zero
Images with disk size 0 may be used for
VM snapshots, but not to save normal block data.

It is possible to create such images using
qemu-img, but opening them later fails.

So even "qemu-img info image.qcow2" is not
possible for an image created with
"qemu-img create -f qcow2 image.qcow2 0".

This is fixed here.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-11-09 08:43:01 -06:00
Kevin Wolf 1e5b9d2fcc Remove aio_ctx from paio_* interface
The context parameter in paio_submit isn't used anyway, so there is no reason
why block drivers should need to remember it. This also avoids passing a Linux
AIO context to paio_submit (which doesn't do any harm as long as the parameter
is unused, but it is highly confusing).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-30 08:39:34 -05:00
Kevin Wolf 72ecf02d7d Revert "qcow2: Bring synchronous read/write back to life"
It was merely a workaround and the real fix is done now.
This reverts commit ef845c3bf4.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-27 12:28:59 -05:00
Kevin Wolf 8febfa2684 Add qemu_aio_process_queue()
We'll leave some AIO completions unhandled when we can't call the callback.
qemu_aio_process_queue() is used later to run any callbacks that are left and
can be run then.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-27 12:28:59 -05:00
Kevin Wolf d2e4634504 raw/linux-aio: Also initialize POSIX AIO
When using Linux AIO raw still falls back to POSIX AIO sometimes, so we should
initialize it.

Not initializing it happens to work if POSIX AIO is used by another drive, or
if the format is not specified (probing the format uses POSIX AIO) or by pure
luck (e.g. it doesn't seem to happen any more with qcow2 since we have re-added
synchronous qcow2 functions).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-27 12:28:35 -05:00
Kevin Wolf c5baaa489f qcow2: Fix grow_refcount_table error handling
In case of failure, we haven't increased the refcount for the newly allocated
cluster yet. Therefore we must not free the cluster or its refcount will become
negative (and endless recursion is possible).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-27 12:28:35 -05:00
Kevin Wolf ef845c3bf4 qcow2: Bring synchronous read/write back to life
When the synchronous read and write functions were dropped, they were replaced
by generic emulation functions. Unfortunately, these emulation functions don't
provide the same semantics as the original functions did.

The original bdrv_read would mean that we read some data synchronously and that
we won't be interrupted during this read. The latter assumption is no longer
true with the emulation function which needs to use qemu_aio_poll and therefore
allows the callback of any other concurrent AIO request to be run during the
read. Which in turn means that (meta)data read earlier could have changed and
be invalid now. qcow2 is not prepared to work in this way and it's just scary
how many places there are where other requests could run.

I'm not sure yet where exactly it breaks, but you'll see breakage with virtio
on qcow2 with a backing file. Providing synchronous functions again fixes the
problem for me.

Patchworks-ID: 35437
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-15 09:32:04 -05:00
Kevin Wolf 0b4ce02eb2 block/raw: Add create_options for host_device
Today host_devices have a create function, so they also need a create_options
field to prevent qemu-img from complaining.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-05 14:20:34 -05:00
Kevin Wolf 80ee15a6b2 qcow2: Increase maximum cluster size to 2 MB
This patch increases the maximum qcow2 cluster size to 2 MB. Starting with 128k
clusters, L2 tables span 2 GB or more of virtual disk space, causing 32 bit
truncation and wraparound of signed integers. Therefore some variables need to
use a larger data type.

While being at reviewing data types, change some integers that are used for
array indices to unsigned. In some places they were checked against some upper
limit but not for negative values. This could avoid potential segfaults with
corrupted qcow2 images.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-05 09:32:52 -05:00
Stefan Weil ee682d27a5 Check availability of uuid header / library
If available, the Universally Unique Identifier library
is used by the vdi block driver.

Other parts of QEMU (vl.c) could also use it.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2009-10-04 13:24:45 +02:00
Anthony Liguori c227f0995e Revert "Get rid of _t suffix"
In the very least, a change like this requires discussion on the list.

The naming convention is goofy and it causes a massive merge problem.  Something
like this _must_ be presented on the list first so people can provide input
and cope with it.

This reverts commit 99a0949b72.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-01 16:12:16 -05:00
malc 99a0949b72 Get rid of _t suffix
Some not so obvious bits, slirp and Xen were left alone for the time
being.

Signed-off-by: malc <av1474@comtv.ru>
2009-10-01 22:45:02 +04:00
Michael S. Tsirkin 6ab00cee70 vvfat: fix coding style nit
Put space between = and & when taking a pointer,
to avoid confusion with old-style "&=".

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-09-30 18:45:50 +00:00
Blue Swirl a2a45a26c9 Fix signedness warnings on OpenSolaris
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-09-12 12:36:09 +00:00
Blue Swirl 72cf2d4f0e Fix sys-queue.h conflict for good
Problem: Our file sys-queue.h is a copy of the BSD file, but there are
some additions and it's not entirely compatible. Because of that, there have
been conflicts with system headers on BSD systems. Some hacks have been
introduced in the commits 15cc923584,
f40d753718,
96555a96d7 and
3990d09adf but the fixes were fragile.

Solution: Avoid the conflict entirely by renaming the functions and the
file. Revert the previous hacks.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-09-12 07:36:22 +00:00
Christoph Hellwig b2e12bc6e3 block: add aio_flush operation
Instead stalling the VCPU while serving a cache flush try to do it
asynchronously.  Use our good old helper thread pool to issue an
asynchronous fdatasync for raw-posix.  Note that while Linux AIO
implements a fdatasync operation it is not useful for us because
it isn't actually implement in asynchronous fashion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-11 10:19:46 -05:00
Christoph Hellwig 6f1953c4c1 block: use fdatasync instead of fsync if possible
If we are flushing the caches for our image files we only care about the
data (including the metadata required for accessing it) but not things
like timestamp updates.  So try to use fdatasync instead of fsync to
implement the flush operations.

Unfortunately many operating systems still do not support fdatasync,
so we add a qemu_fdatasync wrapper that uses fdatasync if available
as per the _POSIX_SYNCHRONIZED_IO feature macro or fsync otherwise.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-11 10:19:46 -05:00
Kevin Wolf f214978a42 qcow2: Order concurrent AIO requests on the same unallocated cluster
When two AIO requests write to the same cluster, and this cluster is
unallocated, currently both requests allocate a new cluster and the second one
merges the first one when it is completed. This means an cluster allocation, a
read and a cluster deallocation which cause some overhead. If we simply let the
second request wait until the first one is done, we improve overall performance
with AIO requests (specifially, qcow2/virtio combinations).

This patch maintains a list of in-flight requests that have allocated new
clusters. A second request touching the same cluster is limited so that it
either doesn't touch the allocation of the first request (so it can have a
non-overlapping allocation) or it waits for the first request to complete.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-09 17:31:26 -05:00
Kevin Wolf ea80b906f4 qcow2: Fix metadata preallocation
The wrong version of the preallocation patch has been applied, so this is the
remaining diff.

We can't use truncate to grow the image file to the right size because we don't
know if metadata has been written after the last data cluster. In this case
truncate would shrink the file and destroy its metadata. Write a zero sector at
the end of the virtual disk instead to ensure that the file is big enough.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-09 17:31:26 -05:00
Stefan Weil cc2040f8c2 Fix spelling in comment.
The company which made Virtual PC was Connectix.
They use the magic string "conectix" in their disk images.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-09 14:57:20 -05:00
Blue Swirl 2000cbc50d Fix gcc 3 warning about uninitialized variable
If nb_sectors is 0, cluster_offset will not be initialized.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-08-29 16:37:26 +03:00
Stefan Weil e44bd6fc15 Don't compile aio code if CONFIG_LINUX_AIO is undefined
This patch fixes linker errors when building QEMU without Linux AIO support.

It is based on suggestions from malc and Kevin Wolf.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-28 08:57:49 -05:00
Christoph Hellwig 5c6c3a6c54 raw-posix: add Linux native AIO support
Now that do have a nicer interface to work against we can add Linux native
AIO support.  It's an extremly thing layer just setting up an iocb for
the io_submit system call in the submission path, and registering an
eventfd with the qemu poll handler to do complete the iocbs directly
from there.

This started out based on Anthony's earlier AIO patch, but after
estimated 42,000 rewrites and just as many build system changes
there's not much left of it.

To enable native kernel aio use the aio=native sub-command on the
drive command line.  I have also added an option to qemu-io to
test the aio support without needing a guest.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-27 20:30:22 -05:00
Christoph Hellwig 9ef91a6771 raw-posix: refactor AIO support
Currently the raw-posix.c code contains a lot of knowledge about the
asynchronous I/O scheme that is mostly implemented in posix-aio-compat.c.
All this code does not really belong here and is getting a bit in the
way of implementing native AIO on Linux.

So instead move all the guts of the AIO implementation into
posix-aio-compat.c (which might need a better name, btw).

There's now a very small interface between the AIO providers and raw-posix.c:

 - an init routine is called from raw_open_common to return an AIO context
   for this drive.  An AIO implementation may either re-use one context
   for all drives, or use a different one for each as the Linux native
   AIO support will do.
 - an submit routine is called from the aio_reav/writev methods to submit
   an AIO request

There are no indirect calls involved in this interface as we need to
decide which one to call manually.  We will only call the Linux AIO native
init function if we were requested to by vl.c, and we will only call
the native submit function if we are asked to and the request is properly
aligned.  That's also the reason why the alignment check actually does
the inverse move and now goes into raw-posix.c.

The old posix-aio-compat.h headers is removed now that most of it's
content is private to posix-aio-compat.c, and instead we add a new
block/raw-posix-aio.h headers is created containing only the tiny interface
between raw-posix.c and the AIO implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-27 20:30:22 -05:00
Kevin Wolf a35e1c177d qcow2: Metadata preallocation
This introduces a qemu-img create option for qcow2 which allows the metadata to
be preallocated, i.e. clusters are reserved in the refcount table and L1/L2
tables, but no data is written to them. Metadata is quite small, so this
happens in almost no time.

Especially with qcow2 on virtio this helps to gain a bit of performance during
the initial writes. However, as soon as create a snapshot, we're back to the
normal slow speed, obviously. So this isn't the real fix, but kind of a cheat
while we're still having trouble with qcow2 on virtio.

Note that the option is disabled by default and needs to be specified
explicitly using qemu-img create -f qcow2 -o preallocation=metadata.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-27 20:30:20 -05:00
Stefan Weil 6eea90eba4 block/vdi.c: Fix several bugs
* The code for option '-static' was wrong, so image creation
  always created static images.

* Static images created with qemu-img did not set header entry
  blocks_allocated.

* The size of the block map must be rounded to the next multiple
  of SECTOR_SIZE, otherwise the block map is only read partially
  for block map sizes which are not a multiple of SECTOR_SIZE.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-27 19:33:15 -05:00
Nathan Froyd 5ec4d682d2 eliminate errors about unused results in block/vpc.c
These errors come up when compiling with gcc-4.3.3 and some older headers:

/scratch/froydnj/qemu.git/block/vpc.c: In function 'vpc_create':
/scratch/froydnj/qemu.git/block/vpc.c:514: error: value computed is not used
/scratch/froydnj/qemu.git/block/vpc.c:516: error: value computed is not used
/scratch/froydnj/qemu.git/block/vpc.c:517: error: value computed is not used
/scratch/froydnj/qemu.git/block/vpc.c:566: error: value computed is not used

Use memcpy to copy the strings instead of strncpy.

Signed-off-by: Nathan Froyd <froydnj@codesourcery.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-24 08:46:48 -05:00
Christoph Hellwig 4dd75c702c make pthreads mandatory
As requested by Anthony make pthreads mandatory.  This means we will always
have AIO available on posix hosts, and it will also allow enabling the I/O
thread unconditionally once it's ready.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-24 08:46:47 -05:00
Blue Swirl 1786dc15ee Use pstrcpy to avoid OpenBSD linker warnings
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-08-15 11:33:58 +00:00
Stefan Weil 9aebd98aab Add new block driver for the VDI format (only aio supported)
This is a new block driver written from scratch
to support the VDI format in QEMU.

VDI is the native format used by Innotek / SUN VirtualBox.

Latest changes:

* stripped down version
  (code for synchronous operations and experimental code removed)

* don't open VDI snapshot images (with uuid_link or uuid_parent)

* modified vdi_aio_cancel

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Message-Id:
2009-08-10 13:05:30 -05:00
Blue Swirl df3cee1a3a Fix Sparse warning about "expression using sizeof on a function"
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-08-01 10:13:44 +00:00
Juan Quintela 71e72a19ba rename HOST_BSD to CONFIG_BSD
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-27 14:09:20 -05:00
Kevin Wolf b171271a50 vmdk: Fix backing file handling
Instead of storing the backing file in its own BlockDriverState, VMDK uses the
BlockDriverState of the raw image file it opened. This is wrong and breaks
functions that access the backing file or protocols. This fix replaces all
occurrences of s->hd->backing_* with bs->backing_*.

This fixes qemu-iotests failure in 020 (Commit changes to backing file).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-22 10:58:47 -05:00
Blue Swirl 0bf9e31af1 Fix most warnings (errors with -Werror) when debugging is enabled
I used the following command to enable debugging:
perl -p -i -e 's/^\/\/#define DEBUG/#define DEBUG/g' * */* */*/*

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-07-20 17:19:25 +00:00
Stefan Weil 1e37d05904 raw-posix: Handle errors in raw_create
In qemu-iotests, some large images are created using qemu-img.

Without checks for errors, qemu-img will just create an
empty image, and later read / write tests will fail.

With the patch, failures during image creation are detected
and reported.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-16 17:28:49 -05:00
Christoph Hellwig 45566e9c99 replace bdrv_{get, put}_buffer with bdrv_{load, save}_vmstate
The VM state offset is a concept internal to the image format.  Replace
the old bdrv_{get,put}_buffer method that require an index into the
image file that is constructed from the VM state offset and an offset
into the vmstate with the bdrv_{load,save}_vmstate that just take an
offset into the VM state.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-16 08:28:13 -05:00
Kevin Wolf 3f6a3ee51e qcow2: Fix L1 table memory allocation
Contrary to what one could expect, the size of L1 tables is not cluster
aligned. So as we're writing whole sectors now instead of single entries,
we need to ensure that the L1 table in memory is large enough; otherwise
write would access memory after the end of the L1 table.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-10 13:44:29 -05:00
Kevin Wolf c53ffce91b qcow1: Fix qcow_aio_writev
Pass is_write = 1 to qcow_aio_setup when writing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-10 13:44:29 -05:00
G 3 1c27a8b35e Substitute O_DSYNC with O_SYNC or O_FSYNC when needed.
Signed-off-by: John Arbuckle <programmingkidx@gmail.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-09 16:58:07 -05:00
Nolan c76f4952bb Allow adjustment of http block device's readahead size, via a new
":readahead=###:" suffix.

Signed-off-by: Nolan Leake <nolan <at> sigbus.net>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-09 16:06:40 -05:00
Anthony Liguori 1cec71e359 Revert "support colon in filenames"
This reverts commit 707c0dbc97.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-09 16:06:38 -05:00
Kevin Wolf 0aa217e461 qcow2: Make cache=writethrough default
The performance of qcow2 has improved meanwhile, so we don't need to
special-case it any more. Switch the default to write-through caching
like all other block drivers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-09 16:06:37 -05:00
Kevin Wolf 3b88e52b41 qcow2: Cache refcount blocks during snapshot creation
The really time consuming part of snapshotting is to adjust the reference count
of all clusters. Currently after each adjusted cluster the refcount block is
written to disk.

Don't write each single byte immediately to disk but cache all writes to the
refcount block and write them out once we're done with the block.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-29 14:18:07 -05:00
Kevin Wolf 22afa7b5b6 block-raw: Allow pread beyond the end of growable images
When using O_DIRECT, qcow2 snapshots didn't work any more for me. In the
process of creating the snapshot, qcow2 tries to pwrite some new information
(e.g. new L1 table) which will often end up being after the old end of the
image file. Now pwrite tries to align things and reads the old contents of the
file, read returns 0 because there is nothing to read after the end of file and
pwrite is stuck in an endless loop.

This patch allows to pread beyond the end of an image file. Whenever the
given offset is after the end of the image file, the read succeeds and fills
the buffer with zeros.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-29 14:18:07 -05:00