The builtin NBD server uses its own BlockBackend now instead of reusing
the monitor/guest device one.
This means that it has its own writethrough setting now. The builtin
NBD server always uses writeback caching now regardless of whether the
guest device has WCE enabled. qemu-nbd respects the cache mode given on
the command line.
We still need to keep a reference to the monitor BB because we put an
eject notifier on it, but we don't use it for any I/O.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
The following commit
commit 3ff2f67a7c
Author: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Date: Mon Jul 18 22:39:52 2016 +0300
block: ignore flush requests when storage is clean
has introduced a regression.
There is a problem that it is still possible for 2 requests to execute
in non sequential fashion and sometimes this results in a deadlock
when bdrv_drain_one/all are called for BDS with such stalled requests.
1. Current flushed_gen and flush_started_gen is 1.
2. Request 1 enters bdrv_co_flush to with write_gen 1 (i.e. the same
as flushed_gen). It gets past flushed_gen != flush_started_gen and
sets flush_started_gen to 1 (again, the same it was before).
3. Request 1 yields somewhere before exiting bdrv_co_flush
4. Request 2 enters bdrv_co_flush with write_gen 2. It gets past
flushed_gen != flush_started_gen and sets flush_started_gen to 2.
5. Request 2 runs to completion and sets flushed_gen to 2
6. Request 1 is resumed, runs to completion and sets flushed_gen to 1.
However flush_started_gen is now 2.
From here on out flushed_gen is always != to flush_started_gen and all
further requests will wait on flush_queue. This change replaces
flush_started_gen with an explicitly tracked active flush request.
Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Message-id: 1471457214-3994-2-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Dell Equallogic iSCSI SANs have a very unusual advertised geometry:
$ iscsi-inq -e 1 -c $((0xb0)) iscsi://XXX/0
wsnz:0
maximum compare and write length:1
optimal transfer length granularity:0
maximum transfer length:0
optimal transfer length:0
maximum prefetch xdread xdwrite transfer length:0
maximum unmap lba count:30720
maximum unmap block descriptor count:2
optimal unmap granularity:30720
ugavalid:1
unmap granularity alignment:0
maximum write same length:30720
which says that both the maximum and the optimal discard size
is 15M. It is not immediately apparent if the device allows
discard requests not aligned to the optimal size, nor if it
allows discards at a finer granularity than the optimal size.
I tried to find details in the SCSI Commands Reference Manual
Rev. A on what valid values of maximum and optimal sizes are
permitted, but while that document mentions a "Block Limits
VPD Page", I couldn't actually find documentation of that page
or what values it would have, or if a SCSI device has an
advertisement of its minimal unmap granularity. So it is not
obvious to me whether the Dell Equallogic device is compliance
with the SCSI specification.
Fortunately, it is easy enough to support non-power-of-2 sizing,
even if it means we are less efficient than truly possible when
targetting that device (for example, it means that we refuse to
unmap anything that is not a multiple of 15M and aligned to a
15M boundary, even if the device truly does support a smaller
granularity where unmapping actually works).
Reported-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1469129688-22848-5-git-send-email-eblake@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rather than asserting that nbdflags is within range, just give
it the correct type to begin with :) nbdflags corresponds to
the per-export portion of NBD Protocol "transmission flags", which
is 16 bits in response to NBD_OPT_EXPORT_NAME and NBD_OPT_GO.
Furthermore, upstream NBD has never passed the global flags to
the kernel via ioctl(NBD_SET_FLAGS) (the ioctl was first
introduced in NBD 2.9.22; then a latent bug in NBD 3.1 actually
tried to OR the global flags with the transmission flags, with
the disaster that the addition of NBD_FLAG_NO_ZEROES in 3.9
caused all earlier NBD 3.x clients to treat every export as
read-only; NBD 3.10 and later intentionally clip things to 16
bits to pass only transmission flags). Qemu should follow suit,
since the current two global flags (NBD_FLAG_FIXED_NEWSTYLE
and NBD_FLAG_NO_ZEROES) have no impact on the kernel's behavior
during transmission.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1469129688-22848-3-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Correct comments of field notify_me
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Message-id: 1468575858-22975-1-git-send-email-caoj.fnst@cn.fujitsu.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
The NBD protocol doesn't have any notion of sectors, so it is
a fairly easy conversion to use byte-based read and write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468624988-423-19-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Now that all drivers have a byte-based .bdrv_co_pdiscard(), we
no longer need to worry about the sector-based version. We can
also relax our minimum alignment to 1 for drivers that support it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-18-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
There's enough drivers with a sector-based callback that it will
be easier to switch one at a time. This patch adds a byte-based
callback, and then after all drivers are swapped, we'll drop the
sector-based callback.
[checkpatch doesn't like the space after coroutine_fn in
block_int.h, but it's consistent with the rest of the file]
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-10-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Another step towards byte-based interfaces everywhere. Replace
the sector-based driver callback .bdrv_aio_discard() with a new
byte-based .bdrv_aio_pdiscard(). Only raw-posix and RBD drivers
are affected, so it was not worth splitting into multiple patches.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-9-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Another step towards byte-based interfaces everywhere. Replace
the sector-based bdrv_aio_discard() with a new byte-based
bdrv_aio_pdiscard(), which silently ignores any unaligned head
or tail. Driver callbacks will be converted in followup patches.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1468624988-423-5-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Another step towards byte-based interfaces everywhere. Replace
the sector-based bdrv_discard() with a new byte-based
bdrv_pdiscard(), which silently ignores any unaligned head
or tail.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-3-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Another step towards byte-based interfaces everywhere. Replace
the sector-based bdrv_co_discard() with a new byte-based
bdrv_co_pdiscard(), which silently ignores any unaligned head
or tail. Driver callbacks will be converted in followup patches.
By calculating the alignment outside of the loop, and clamping
the max discard to an aligned value, we can simplify the actions
done within the loop.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-2-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Now that NBD relies on the block layer to fragment things, we no
longer need to track an offset argument for which fragment of
a request we are actually servicing.
While at it, use true and false instead of 0 and 1 for a bool
parameter.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468607524-19021-6-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Underlying HBitmap operates even with uint64_t. Thus this change is safe.
This would be useful f.e. to mark entire bitmap dirty in one call.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-2-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Some guests (win2008 server for example) do a lot of unnecessary
flushing when underlying media has not changed. This adds additional
overhead on host when calling fsync/fdatasync.
This change introduces a write generation scheme in BlockDriverState.
Current write generation is checked against last flushed generation to
avoid unnessesary flushes.
The problem with excessive flushing was found by a performance test
which does parallel directory tree creation (from 2 processes).
Results improved from 0.424 loops/sec to 0.432 loops/sec.
Each loop creates 10^3 directories with 10 files in each.
This affected some blkdebug testcases that were expecting error logs from
failure-injected flushes which are now skipped entirely
(tests 026 071 089).
This also affects the performance of block jobs and thus BLOCK_JOB_READY
events for driver-mirror and active block-commit commands now arrives
faster, before QMP send successfully returns to caller (tests 141 144).
Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-5-git-send-email-den@openvz.org
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Parameter **errp of aio_context_setup() is useless, remove it
and clean up the related code.
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Fam Zheng <famz@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1468578524-23433-1-git-send-email-caoj.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This has better performance because it executes fewer system calls
and does not use a bottom half per disk.
Originally proposed by Ming Lei.
[Changed #include "raw-aio.h" to "block/raw-aio.h" in win32-aio.c to fix
build error as reported by Peter Maydell <peter.maydell@linaro.org>.
--Stefan]
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1467650000-51385-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
squash! linux-aio: share one LinuxAioState within an AioContext
* fixes to qemu-char and net exit
* FreeBSD fixes
* Other small bugfixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJXhiZDAAoJEL/70l94x66DrGAH/10ZlIYugx6Ijn12qy3irmIC
hbMY6HWjvPlk8ZpAcPa3UXNQvqhTwqhSMXRiwp9aNPlRUqrXnDXZapQunJveKSAn
luLE8ISRKODz0W39qg6znyb4R1ipCGJWwjBCQmLWZuD7883JJ2DsykTATRx7yKQF
qsq9r/DPBTfD3vnOCTbqp0GeB80UFleTNm+K7cct8M1+WzfiwKeVHk9CAKy0fkTH
hS+YnV9UWYL6PR/w+uZ+2MfgH5er4X794+HaNbio0QJJbEZ2bsL4A3Prh7pUonN7
qJoCbT4W79scrnWQ40RbWRXOMfUk4J7gIMEZYar8z6NmqnamNZgxbWj3dv6pO+k=
=sz/L
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* SCSI scanner support
* fixes to qemu-char and net exit
* FreeBSD fixes
* Other small bugfixes
# gpg: Signature made Wed 13 Jul 2016 12:30:11 BST
# gpg: using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream:
hostmem: detect host backend memory is being used properly
hostmem: fix QEMU crash by 'info memdev'
char: do not use atexit cleanup handler
net: do not use atexit for cleanup
slirp: use exit notifier for slirp_smb_cleanup
tap: use an exit notifier to call down_script
util: Fix MIN_NON_ZERO
qemu-sockets: use qapi_free_SocketAddress in cleanup
disas: avoid including everything in headers compiled from C++
json-streamer: fix double-free on exiting during a parse
main-loop: check return value before using pointer
Use "-s" instead of "--quiet" to resolve non-fatal build error on FreeBSD.
scsi-bus: Use longer sense buffer with scanners
scsi-bus: Add SCSI scanner support
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This patch adds a new optional 'job-id' parameter to 'block-commit',
allowing the user to specify the ID of the block job to be created.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds a new optional 'job-id' parameter to 'block-stream',
allowing the user to specify the ID of the block job to be created.
The HMP 'block_stream' command remains unchanged.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds a new optional 'job-id' parameter to 'blockdev-backup'
and 'drive-backup', allowing the user to specify the ID of the block
job to be created.
The HMP 'drive_backup' command remains unchanged.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds a new optional 'job-id' parameter to 'blockdev-mirror'
and 'drive-mirror', allowing the user to specify the ID of the block
job to be created.
The HMP 'drive_mirror' command remains unchanged.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When a new job is created, the job ID is taken from the device name of
the BDS. This patch adds a new 'job_id' parameter to let the caller
provide one instead.
This patch also verifies that the ID is always unique and well-formed.
This causes problems in a couple of places where no ID is being set,
because the BDS does not have a device name.
In the case of test_block_job_start() (from test-blockjob-txn.c) we
can simply use this new 'job_id' parameter to set the missing ID.
In the case of img_commit() (from qemu-img.c) we still don't have the
API to make commit_active_start() set the job ID, so we solve it by
setting a default value. We'll get rid of this as soon as we extend
the API.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently the way to look for a specific block job is to iterate the
list manually using block_job_next().
Since we want to be able to identify a job primarily by its ID it
makes sense to have a function that does just that.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The 'id' field of the BlockJob structure will be able to hold any ID,
not only a device name. This patch updates the description of that
field and the error messages where it is being used.
Soon we'll add the ability to set an arbitrary ID when creating a
block job.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
'stream-start' has a parameter called 'backing-file', which is the
string to be written to bs->backing when the job finishes.
In the stream_start() implementation it is called 'backing_file_str',
but it the prototype in the header file it is called 'base_id'.
This patch fixes it so the name is the same in both cases and is
consistent with other cases (like commit_start()).
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add support for missing scanner specific SCSI commands and their xfer
lenghts as per ANSI spec section 15.
Signed-off-by: Jarkko Lavinen <jarkko.lavinen@iki.fi>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Header guard symbols should match their file name to make guard
collisions less likely. Offenders found with
scripts/clean-header-guards.pl -vn.
Cleaned up with scripts/clean-header-guards.pl, followed by some
renaming of new guard symbols picked by the script to better ones.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
This is the final patch for converting the common I/O path to take
a BdrvChild parameter instead of BlockDriverState.
The completion of this conversion means that all users that perform I/O
on an image need to actually hold a reference (in the form of BdrvChild,
possible as part of a BlockBackend) to that image. This also protects
against inconsistent use of BlockBackend vs. BlockDriverState functions
because direct use of a BlockDriverState isn't possible any more and
blk->root is private for block-backends.c.
In addition, we can now distinguish different users in the I/O path,
and the future op blockers work is going to add assertions based on
permissions stored in BdrvChild.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Using int for values that are only used as booleans is confusing.
While at it, rearrange a couple of members so that all the bools
are contiguous.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It makes more sense to have ALL block size limit constraints
in the same struct. Improve the documentation while at it.
Simplify a couple of conditionals, now that we have audited and
documented that request_alignment is always non-zero.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_discard and
discard_alignment. Rename them, using 'pdiscard' as an aid to
track which remaining discard interfaces need conversion, and so
that the compiler will help us catch the change in semantics
across any rebased code. The BlockLimits type is now completely
byte-based; and in iscsi.c, sector_limits_lun2qemu() is no
longer needed.
pdiscard_alignment is made unsigned (we use power-of-2 alignments
as bitmasks, where unsigned is easier to think about) while
leaving max_pdiscard signed (since we still have an 'int'
interface); this is comparable to what commit cf081fc did for
write zeroes limits. We may later want to make everything an
unsigned 64-bit limit - but that requires a bigger code audit.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Improve the documentation of the write zeroes limits, to mention
additional constraints that drivers should observe. Worth squashing
into commit cf081fca, if that hadn't been pushed already :)
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_transfer_length
and opt_transfer_length. Rename them (dropping the _length suffix)
so that the compiler will help us catch the change in semantics
across any rebased code, and improve the documentation. Use unsigned
values, so that we don't have to worry about negative values and
so that bit-twiddling is easier; however, we are still constrained
by 2^31 of signed int in most APIs.
When a value comes from an external source (iscsi and raw-posix),
sanitize the results to ensure that opt_transfer is a power of 2.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The NBD layer was breaking up request at a limit of 2040 sectors
(just under 1M) to cater to old qemu-nbd. But the server limit
was raised to 32M in commit 2d8214885 to match the kernel, more
than three years ago; and the upstream NBD Protocol is proposing
documentation that without any explicit communication to state
otherwise, a client should be able to safely assume that a 32M
transaction will work. It is time to rely on the larger sizing,
and any downstream distro that cares about maximum
interoperability to older qemu-nbd servers can just tweak the
value of #define NBD_MAX_SECTORS.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Block jobs that use additional BDSes or event loop resources need a
callback to get their affairs in order when the AioContext is switched.
Simple block jobs don't need an attach callback, they automatically work
thanks to the generic attach/detach notifiers that this patch adds.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1466096189-6477-7-git-send-email-stefanha@redhat.com
It's possible that an AioContext notifier user was close to finishing
when .detach_aio_context() or .attached_aio_context() is called. In
that case they may call bdrv_remove_aio_context_notifier() during the
callback.
Use safe iteration to avoid crashing when the notifier list is modified
during iteration. We must not only handle the case where the current
aio notifier is removed during a callback but also the one where any
other aio notifier is removed.
The next patch adds an AioContext notifier for block jobs and they
really could be terminating just as .detach_aio_context() is invoked.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1466096189-6477-6-git-send-email-stefanha@redhat.com
Block jobs are coroutines that usually perform I/O but sometimes also
sleep or yield. Currently only sleeping or yielded block jobs can be
paused. This means jobs that do not sleep or yield (using
block_job_yield()) are unaffected by block_job_pause().
Add block_job_pause_point() so that block jobs can mark quiescent points
that are suitable for pausing. This solves the problem that it can take
a block job a long time to pause if it is performing a long series of
I/O operations.
Transitioning to paused state involves a .pause()/.resume() callback.
These callbacks are used to ensure that I/O and event loop activity has
ceased while the job is at a pause point.
Note that this patch introduces a stricter pause state than previously.
The job->busy flag was incorrectly documented as a quiescent state
without I/O pending. This is violated by any job that has I/O pending
across sleep or block_job_yield(), like the mirror block job.
[Add missing block_job_should_pause() check to avoid deadlock after
job->driver->pause() in block_job_pause_point().
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1466096189-6477-4-git-send-email-stefanha@redhat.com