Commit Graph

121 Commits

Author SHA1 Message Date
Paolo Bonzini dc534f8fc0 block: document job API
I am not sure that these are really proper GtkDoc, but they follow
the existing documentation in block_int.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Paolo Bonzini 3e914655f2 block: fix streaming/closing race
Streaming can issue I/O while qcow2_close is running.  This causes the
L2 caches to become very confused or, alternatively, could cause a
segfault when the streaming coroutine is reentered after closing its
block device.  The fix is to cancel streaming jobs when closing their
underlying device.

The cancellation must be synchronous, on the other hand qemu_aio_wait
will not restart a coroutine that is sleeping in co_sleep.  So add
a flag saying whether streaming has in-flight I/O.  If the busy flag
is false, the coroutine is quiescent and, when cancelled, will not
issue any new I/O.

This protects streaming against closing, but not against deleting.
We have a reference count protecting us against concurrent deletion,
but I still added an assertion to ensure nothing bad happens.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Paolo Bonzini 85e8dab1ef aio: move BlockDriverAIOCB to qemu-aio.h
And remove several block_int.h inclusions that should not be there.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:39 +02:00
Jeff Cody 8802d1fdd4 qapi: Introduce blockdev-group-snapshot-sync command
This is a QAPI/QMP only command to take a snapshot of a group of
devices. This is similar to the blockdev-snapshot-sync command, except
blockdev-group-snapshot-sync accepts a list devices, filenames, and
formats.

It is attempted to keep the snapshot of the group atomic; if the
creation or open of any of the new snapshots fails, then all of
the new snapshots are abandoned, and the name of the snapshot image
that failed is returned.  The failure case should not interrupt
any operations.

Rather than use bdrv_close() along with a subsequent bdrv_open() to
perform the pivot, the original image is never closed and the new
image is placed 'in front' of the original image via manipulation
of the BlockDriverState fields.  Thus, once the new snapshot image
has been successfully created, there are no more failure points
before pivoting to the new snapshot.

This allows the group of disks to remain consistent with each other,
even across snapshot failures.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Acked-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-29 15:48:33 +01:00
Paolo Bonzini b6a127a156 block: drop aio_multiwrite in BlockDriver
These were never used.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-29 12:48:47 +01:00
Paolo Bonzini 56116a1469 block: remove unused fields in BlockDriverState
sync_aiocb is unused since commit ce1a14d (Dynamically allocate AIO
Completion Blocks., 2006-08-07).

private is unused since commit 56a1493 (drive cleanup fixes., 2009-09-25).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-29 12:48:47 +01:00
Luiz Capitulino f36f394952 block: bdrv_eject(): Make eject_flag a real bool
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
2012-02-22 17:23:05 -02:00
Stefan Hajnoczi f08f2ddae0 block: add .bdrv_co_write_zeroes() interface
The ability to zero regions of an image file is a useful primitive for
higher-level features such as image streaming or zero write detection.

Image formats may support an optimized metadata representation instead
of writing zeroes into the image file.  This allows zero writes to be
potentially faster than regular write operations and also preserve
sparseness of the image file.

The .bdrv_co_write_zeroes() interface should be implemented by block
drivers that wish to provide efficient zeroing.

Note that this operation is different from the discard operation, which
may leave the contents of the region indeterminate.  That means
discarded blocks are not guaranteed to contain zeroes and may contain
junk data instead.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09 16:17:50 +01:00
Marcelo Tosatti c8c3080f4a block: add support for partial streaming
Add support for streaming data from an intermediate section of the
image chain (see patch and documentation for details).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 14:49:18 +01:00
Stefan Hajnoczi 4f1043b4ff block: add image streaming block job
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 11:45:26 +01:00
Stefan Hajnoczi eeec61f291 block: add BlockJob interface for long-running operations
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 11:45:26 +01:00
Stefan Hajnoczi 470c05047a block: make copy-on-read a per-request flag
Previously copy-on-read could only be enabled for all requests to a
block device.  This means requests coming from the guest as well as
QEMU's internal requests would perform copy-on-read when enabled.

For image streaming we want to support finer-grained behavior than just
populating the image file from its backing image.  Image streaming
supports partial streaming where a common backing image is preserved.
In this case guest requests should not perform copy-on-read because they
would indiscriminately copy data which should be left in a backing image
from the backing chain.

Introduce a per-request flag for copy-on-read so that a block device can
process both regular and copy-on-read requests.  Overlapping reads and
writes still need to be serialized for correctness when copy-on-read is
happening, so add an in-flight reference count to track this.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 11:45:26 +01:00
Stefan Hajnoczi 53fec9d3fd block: add interface to toggle copy-on-read
The bdrv_enable_copy_on_read()/bdrv_disable_copy_on_read() functions can
be used to programmatically enable or disable copy-on-read for a block
device.  Later patches add the actual copy-on-read logic.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi dbffbdcfff block: add request tracking
The block layer does not know about pending requests.  This information
is necessary for copy-on-read since overlapping requests must be
serialized to prevent races that corrupt the image.

The BlockDriverState gets a new tracked_request list field which
contains all pending requests.  Each request is a BdrvTrackedRequest
record with sector_num, nb_sectors, and is_write fields.

Note that request tracking is always enabled but hopefully this extra
work is so small that it doesn't justify adding an enable/disable flag.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi 6aebab140d block: drop .bdrv_is_allocated() interface
Now that all block drivers have been converted to
.bdrv_co_is_allocated() we can drop .bdrv_is_allocated().

Note that the public bdrv_is_allocated() interface is still available
but is in fact a synchronous wrapper around .bdrv_co_is_allocated().

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:37 +01:00
Stefan Hajnoczi 376ae3f1cb block: add .bdrv_co_is_allocated()
This patch adds the .bdrv_co_is_allocated() interface which is identical
to .bdrv_is_allocated() but runs in coroutine context.  Running in
coroutine context implies that other coroutines might be performing I/O
at the same time.   Therefore it must be safe to run while the following
BlockDriver functions are in-flight:

    .bdrv_co_readv()
    .bdrv_co_writev()
    .bdrv_co_flush()
    .bdrv_co_is_allocated()

The new .bdrv_co_is_allocated() interface is useful because it can be
used when a VM is running, whereas .bdrv_is_allocated() is a synchronous
interface that does not cope with parallel requests.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:36 +01:00
Zhi Yong Wu 98f90dba5e block: add I/O throttling algorithm
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:35 +01:00
Zhi Yong Wu 0563e19151 block: add the blockio limits command line support
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:35 +01:00
Anthony Liguori 0f15423c32 block: allow migration to work with image files (v3)
Image files have two types of data: immutable data that describes things like
image size, backing files, etc. and mutable data that includes offset and
reference count tables.

Today, image formats aggressively cache mutable data to improve performance.  In
some cases, this happens before a guest even starts.  When dealing with live
migration, since a file is open on two machines, the caching of meta data can
lead to data corruption.

This patch addresses this by introducing a mechanism to invalidate any cached
mutable data a block driver may have which is then used by the live migration
code.

NB, this still requires coherent shared storage.  Addressing migration without
coherent shared storage (i.e. NFS) requires additional work.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-21 14:58:48 -06:00
Kevin Wolf eb489bb1ec block: Introduce bdrv_co_flush_to_os
qcow2 has a writeback metadata cache, so flushing a qcow2 image actually
consists of writing back that cache to the protocol and only then flushes the
protocol in order to get everything stable on disk.

This introduces a separate bdrv_co_flush_to_os to reflect the split.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-11-11 14:02:59 +01:00
Kevin Wolf c68b89acd6 block: Rename bdrv_co_flush to bdrv_co_flush_to_disk
There are two different types of flush that you can do: Flushing one level up
to the OS (i.e. writing data to the host page cache) or flushing it all the way
down to the disk. The existing functions flush to the disk, reflect this in the
function name.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-11-11 14:02:59 +01:00
Luiz Capitulino b202381800 qapi: Convert query-block
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2011-10-27 11:48:47 -02:00
Luiz Capitulino d6bf279e7a block: iostatus: Drop BDRV_IOS_INVAL
A future commit will convert bdrv_info() to the QAPI and it won't
provide IOS_INVAL.

Luckily all we have to do is to add a new 'iostatus_enabled'
member to BlockDriverState and use it instead.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2011-10-27 11:48:47 -02:00
Paolo Bonzini 6db39ae2e2 block: change discard to co_discard
Since coroutine operation is now mandatory, convert both bdrv_discard
implementations to coroutines.  For qcow2, this means taking the lock
around the operation.  raw-posix remains synchronous.

The bdrv_discard callback is then unused and can be eliminated.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-21 17:34:14 +02:00
Paolo Bonzini 8b94ff8573 block: change flush to co_flush
Since coroutine operation is now mandatory, convert all bdrv_flush
implementations to coroutines.  For qcow2, this means taking the lock.
Other implementations are simpler and just forward bdrv_flush to the
underlying protocol, so they can avoid the lock.

The bdrv_flush callback is then unused and can be eliminated.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-21 17:34:14 +02:00
Paolo Bonzini 4265d620c5 block: add bdrv_co_discard and bdrv_aio_discard support
This similarly adds support for coroutine and asynchronous discard.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-21 17:34:13 +02:00
Paolo Bonzini 07f0761574 block: unify flush implementations
Add coroutine support for flush and apply the same emulation that
we already do for read/write.  bdrv_aio_flush is simplified to always
go through a coroutine.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-21 17:34:13 +02:00
Luiz Capitulino 28a7282a5d block: Keep track of devices' I/O status
This commit adds support to the BlockDriverState type to keep track
of devices' I/O status.

There are three possible status: BDRV_IOS_OK (no error), BDRV_IOS_ENOSPC
(no space error) and BDRV_IOS_FAILED (any other error). The distinction
between no space and other errors is important because a management
application may want to watch for no space in order to extend the
space assigned to the VM and put it to run again.

Qemu devices supporting the I/O status feature have to enable it
explicitly by calling bdrv_iostatus_enable() _and_ have to be
configured to stop the VM on errors (ie. werror=stop|enospc or
rerror=stop).

In case of multiple errors being triggered in sequence only the first
one is stored. The I/O status is always reset to BDRV_IOS_OK when the
'cont' command is issued.

Next commits will add support to some devices and extend the
query-block/info block commands to return the I/O status information.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-11 09:41:47 +02:00
Markus Armbruster d1a0739de5 block: Move BlockConf & friends from block_int.h to block.h
It's convenience stuff for block device models, so block.h isn't the
ideal home either, but better than block_int.h.

Permits moving some #include "block_int.h" from device model .h into
.c.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:21 +02:00
Markus Armbruster 9e6a4c9177 block: Drop BlockDriverState member removable
It's a confused mess (see previous commit).  No users remain.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:21 +02:00
Markus Armbruster 025e849a50 block: Rename bdrv_set_locked() to bdrv_lock_medium()
While there, make the locked parameter bool.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:20 +02:00
Markus Armbruster f107639a6f block: Drop medium lock tracking, ask device models instead
Requires new BlockDevOps member is_medium_locked().  Implement for IDE
and SCSI CD-ROMs.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:20 +02:00
Markus Armbruster 22cf56c4d8 block: Drop tray status tracking, no longer used
Commit 4be9762a is now completely redone.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:20 +02:00
Markus Armbruster ba5b7ad449 block: Declare qemu_blockalign() in block.h, not block_int.h
Device models should be able to use it without an unclean include of
block_int.h.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:24:07 +02:00
Markus Armbruster 8e49ca4624 block: Leave tracking media change to device models
hw/fdc.c is the only one that cares.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:24:06 +02:00
Markus Armbruster 145feb176f block: Split change_cb() into change_media_cb(), resize_cb()
Multiplexing callbacks complicates matters needlessly.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:23:51 +02:00
Markus Armbruster 0e49de5232 block: Generalize change_cb() to BlockDevOps
So we can more easily add device model callbacks.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:23:51 +02:00
Markus Armbruster fa879d62eb block: Attach non-qdev devices as well
For now, this just protects against programming errors like having the
same drive back multiple non-qdev devices, or untimely bdrv_delete().
Later commits will add other interesting uses.

While there, rename BlockDriverState member peer to dev, bdrv_attach()
to bdrv_attach_dev(), bdrv_detach() to bdrv_detach_dev(), and
bdrv_get_attached() to bdrv_get_attached_dev().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:23:51 +02:00
Christoph Hellwig c488c7f649 block: latency accounting
Account the total latency for read/write/flush requests.  This allows
management tools to average it based on a snapshot of the nr ops
counters and allow checking for SLAs or provide statistics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-26 18:18:38 +02:00
Christoph Hellwig a597e79ce1 block: explicit I/O accounting
Decouple the I/O accounting from bdrv_aio_readv/writev/flush and
make the hardware models call directly into the accounting helpers.

This means:
 - we do not count internal requests from image formats in addition
   to guest originating I/O
 - we do not double count I/O ops if the device model handles it
   chunk wise
 - we only account I/O once it actuall is done
 - can extent I/O accounting to synchronous or coroutine I/O easily
 - implement I/O latency tracking easily (see the next patch)

I've conveted the existing device model callers to the new model,
device models that are using synchronous I/O and weren't accounted
before haven't been updated yet.  Also scsi hasn't been converted
to the end-to-end accounting as I want to defer that after the pending
scsi layer overhaul.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-25 18:18:42 +02:00
Christoph Hellwig e8045d6726 block: include flush requests in info blockstats
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
Kevin Wolf da1fa91d6c block: Add bdrv_co_readv/writev
Add new block driver callbacks bdrv_co_readv/writev, which work on a
QEMUIOVector like bdrv_aio_*, but don't need a callback. The function may only
be called inside a coroutine, so a block driver implementing this interface can
yield instead of blocking during I/O.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-02 15:53:40 +02:00
Markus Armbruster 822e1cd17e block: Make BlockDriver method bdrv_eject() return void
Callees always return 0, except for FreeBSD's cdrom_eject(), which
returns -ENOTSUP when the device is in a terminally wedged state.

The only caller is bdrv_eject(), and it maps -ENOTSUP to 0 since
commit 4be9762a.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-01 12:10:28 +02:00
Markus Armbruster 7bf37feddc block: Make BlockDriver method bdrv_set_locked() return void
The only caller is bdrv_set_locked(), and it ignores the value.

Callees always return 0, except for FreeBSD's cdrom_set_locked(),
which returns -ENOTSUP when the device is in a terminally wedged
state.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-01 12:10:28 +02:00
Fam Zheng 4a1d5e1fde block: add bdrv_get_allocated_file_size() operation
qemu-img.c wants to count allocated file size of image. Previously it
counts a single bs->file by 'stat' or Window API. As VMDK introduces
multiple file support, the operation becomes format specific with
platform specific meanwhile.

The functions are moved to block/raw-{posix,win32}.c and qemu-img.c calls
bdrv_get_allocated_file_size to count the bs. And also added VMDK code
to count his own extents.

Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-07-19 15:39:08 +02:00
Fam Zheng f66fd6c383 VMDK: create different subformats
Add create option 'format', with enums:
    monolithicSparse
    monolithicFlat
    twoGbMaxExtentSparse
    twoGbMaxExtentFlat
Each creates a subformat image file. The default is monolithicSparse.

Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-07-19 15:39:07 +02:00
Devin Nakamura 39aa9a12cc Replaced tabs with spaces in block.h and block_int.h
Signed-off-by: Devin Nakamura <devin122@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-06-15 14:36:15 +02:00
Markus Armbruster 8d278467ff block: Remove type hint, it's guest matter, doesn't belong here
No users of bdrv_get_type_hint() left.  bdrv_set_type_hint() can make
the media removable by side effect.  Make that explicit.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-05-19 10:26:23 +02:00
Marcelo Tosatti db593f2565 Add flag to indicate external users to block device
Certain operations such as drive_del or resize cannot be performed
while external users (eg. block migration) reference the block device.

Add a flag to indicate that.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-02-07 12:51:19 +01:00
Christoph Hellwig db97ee6a97 block: tell drivers about an image resize
Extend the change_cb callback with a reason argument, and use it
to tell drivers about size changes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-31 10:03:00 +01:00