Commit Graph

108 Commits

Author SHA1 Message Date
Stefan Hajnoczi b1c7c07f2d virtio: use unsigned int for virtqueue_get_avail_bytes() index
The virtio code uses int, unsigned int, and uint16_t for virtqueue
indices.  The uint16_t is used for the low-level descriptor layout in
virtio_ring.h while code that isn't concerned with descriptor layout can
use unsigned int.

Use of int is problematic because it can result in signed/unsigned
comparison and incompatible int*/unsigned int* pointer types.

Make the virtqueue_get_avail_bytes() 'i' variable unsigned int.  This
eliminates the need to introduce casts and modify code further in the
patches that follow.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-09-23 19:03:56 +03:00
Stefan Hajnoczi d65abf85e7 virtio: handle virtqueue_get_avail_bytes() errors
If the vring is invalid, tell the caller no bytes are available and mark
the device broken.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-09-23 19:03:56 +03:00
Stefan Hajnoczi ec55da1924 virtio: handle virtqueue_map_desc() errors
Errors can occur during virtqueue_pop(), especially in
virtqueue_map_desc().  In order to handle this we must unmap iov[]
before returning NULL.  The caller will consider the virtqueue empty and
the virtio_error() call will have marked the device broken.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-23 19:03:55 +03:00
Stefan Hajnoczi 791b1daf72 virtio: migrate vdev->broken flag
Send a subsection if the vdev->broken flag is set.  This allows live
migration of broken virtio devices.

The subsection is only sent if vdev->broken has been set.  In most cases
the flag will be clear and no subsection will be sent.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-09-23 19:03:55 +03:00
Stefan Hajnoczi f5ed36635d virtio: stop virtqueue processing if device is broken
QEMU prints an error message and exits when the device enters an invalid
state.  Terminating the process is heavy-handed.  The guest may still be
able to function even if there is a bug in a virtio guest driver.

Moreover, exiting is a bug in nested virtualization where a nested guest
could DoS other nested guests by killing a pass-through virtio device.
I don't think this configuration is possible today but it is likely in
the future.

If the broken flag is set, do not process virtqueues or write back used
descriptors.  The broken flag can be cleared again by resetting the
device.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-09-23 19:03:55 +03:00
Stefan Hajnoczi 8275e2f6be virtio: fix stray tab character
Fix a single occurrence of a tab character in a file that otherwise uses
spaces for indentation.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-09-23 19:03:55 +03:00
Prasad J Pandit 973e7170dd virtio: add check for descriptor's mapped address
virtio back end uses set of buffers to facilitate I/O operations.
If its size is too large, 'cpu_physical_memory_map' could return
a null address. This would result in a null dereference while
un-mapping descriptors. Add check to avoid it.

Reported-by: Qinghao Tang <luodalongde@gmail.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
2016-09-23 18:51:40 +03:00
Stefan Hajnoczi 297a75e6c5 virtio: add virtqueue_rewind()
virtqueue_discard() requires a VirtQueueElement but virtio-balloon does
not migrate its in-use element.  Introduce a new function that is
similar to virtqueue_discard() but doesn't require a VirtQueueElement.

This will allow virtio-balloon to access element again after migration
with the usual proviso that the guest may have modified the vring since
last time.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09 20:58:34 +03:00
Stefan Hajnoczi 4b7f91ed02 virtio: zero vq->inuse in virtio_reset()
vq->inuse must be zeroed upon device reset like most other virtqueue
fields.

In theory, virtio_reset() just needs assert(vq->inuse == 0) since
devices must clean up in-flight requests during reset (requests cannot
not be leaked!).

In practice, it is difficult to achieve vq->inuse == 0 across reset
because balloon, blk, 9p, etc implement various different strategies for
cleaning up requests.  Most devices call g_free(elem) directly without
telling virtio.c that the VirtQueueElement is cleaned up.  Therefore
vq->inuse is not decremented during reset.

This patch zeroes vq->inuse and trusts that devices are not leaking
VirtQueueElements across reset.

I will send a follow-up series that refactors request life-cycle across
all devices and converts vq->inuse = 0 into assert(vq->inuse == 0) but
this more invasive approach is not appropriate for stable trees.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-stable <qemu-stable@nongnu.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Ladi Prosek <lprosek@redhat.com>
2016-09-09 20:58:34 +03:00
Stefan Hajnoczi 58a83c6149 virtio: decrement vq->inuse in virtqueue_discard()
virtqueue_discard() moves vq->last_avail_idx back so the element can be
popped again.  It's necessary to decrement vq->inuse to avoid "leaking"
the element count.

Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-23 19:20:24 +03:00
Stefan Hajnoczi bccdef6b1a virtio: recalculate vq->inuse after migration
The vq->inuse field is not migrated.  Many devices don't hold
VirtQueueElements across migration so it doesn't matter that vq->inuse
starts at 0 on the destination QEMU.

At least virtio-serial, virtio-blk, and virtio-balloon migrate while
holding VirtQueueElements.  For these devices we need to recalculate
vq->inuse upon load so the value is correct.

Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-08-23 19:20:10 +03:00
Peter Maydell cbe81c6331 pc, pci, virtio: cleanups, fixes
a bunch of bugfixes and a couple of cleanups
 making these easier and/or making debugging easier
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXmslFAAoJECgfDbjSjVRpev0IAMZghEuSeKMB2JR88dErS8P5
 J6y/1W2VFuRa1YBkTz/ecr5r2RwIO5teZUZpUkibM65Zo6bu1liMo6gbzeCg/xOi
 k437pNRl6W9RVWuXQM9VOegNoGYhX3Hrnu3iQeiT8KRY3OMCwG52umUXYVodJh1R
 mlozlEcSyUEDZVdNjhRECuUiw8RRcErEtiKda+zjkf4tPAGkyCItVpLYshE6A2/I
 lfQLkv+EWOyuD4cfEHl+4F9K9wegothFTSd/xBmcqqaWRc+pboMVF2A2yga+GjKm
 Xgb8SzQYkt9Q1nFr9fz89q89CsjhmfrD/ct/vJAcCFnw/dNXnC6mYjr6MDX0Gd0=
 =26Uu
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc, pci, virtio: cleanups, fixes

a bunch of bugfixes and a couple of cleanups
making these easier and/or making debugging easier

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Fri 29 Jul 2016 04:11:01 BST
# gpg:                using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (41 commits)
  mptsas: Fix a migration compatible issue
  vhost: do not update last avail idx on get_vring_base() failure
  vhost: add vhost_net_set_backend()
  vhost-user: add error report in vhost_user_write()
  tests: fix vhost-user-test leak
  tests: plug some leaks in virtio-net-test
  vhost-user: wait until backend init is completed
  char: add and use tcp_chr_wait_connected
  char: add chr_wait_connected callback
  vhost: add assert() to check runtime behaviour
  vhost-net: vhost_migration_done is vhost-user specific
  Revert "vhost-net: do not crash if backend is not present"
  vhost-user: add get_vhost_net() assertions
  vhost-user: keep vhost_net after a disconnection
  vhost-user: check vhost_user_{read,write}() return value
  vhost-user: check qemu_chr_fe_set_msgfds() return value
  vhost-user: call set_msgfds unconditionally
  qemu-char: fix qemu_chr_fe_set_msgfds() crash when disconnected
  vhost: use error_report() instead of fprintf(stderr,...)
  vhost: add missing VHOST_OPS_DEBUG
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-29 11:57:01 +01:00
Prasad J Pandit 1e7aed7014 virtio: check vring descriptor buffer length
virtio back end uses set of buffers to facilitate I/O operations.
An infinite loop unfolds in virtqueue_pop() if a buffer was
of zero size. Add check to avoid it.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-29 00:07:10 +03:00
Stefan Hajnoczi afd9096eb1 virtio: error out if guest exceeds virtqueue size
A broken or malicious guest can submit more requests than the virtqueue
size permits, causing unbounded memory allocation in QEMU.

The guest can submit requests without bothering to wait for completion
and is therefore not bound by virtqueue size.  This requires reusing
vring descriptors in more than one request, which is not allowed by the
VIRTIO 1.0 specification.

In "3.2.1 Supplying Buffers to The Device", the VIRTIO 1.0 specification
says:

  1. The driver places the buffer into free descriptor(s) in the
     descriptor table, chaining as necessary

and

  Note that the above code does not take precautions against the
  available ring buffer wrapping around: this is not possible since the
  ring buffer is the same size as the descriptor table, so step (1) will
  prevent such a condition.

This implies that placing more buffers into the virtqueue than the
descriptor table size is not allowed.

QEMU is missing the check to prevent this case.  Processing a request
allocates a VirtQueueElement leading to unbounded memory allocation
controlled by the guest.

Exit with an error if the guest provides more requests than the
virtqueue size permits.  This bounds memory allocation and makes the
buggy guest visible to the user.

This patch fixes CVE-2016-5403 and was reported by Zhenhao Hong from 360
Marvel Team, China.

Reported-by: Zhenhao Hong <hongzhenhao@360.cn>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-27 14:04:40 +01:00
Dr. David Alan Gilbert 5943124cc0 virtio: Migration helper function and macro
To make conversion of virtio devices to VMState simple
at first add a helper function for the simple virtio_save
case and a helper macro that defines the VMState structure.
These will probably go away or change as more of the virtio
code gets converted.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-21 20:44:19 +03:00
Fam Zheng 872dd82c83 virtio: Introduce virtio_add_queue_aio
Using this function instead of virtio_add_queue marks the vq as aio
based. This differentiation will be useful in later patches.

Distinguish between virtqueue processing in the iohandler context and main loop
AioContext.  iohandler context is isolated from AioContexts and therefore does
not run during aio_poll().

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-21 20:44:19 +03:00
Fam Zheng bf1780b0d5 virtio: Add typedef for handle_output
The function pointer signature has been repeated a few times, using a
typedef may make coding easier.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-21 20:44:19 +03:00
Michael S. Tsirkin 62cee1a28a virtio: set low features early on load
virtio migrates the low 32 feature bits twice, the first copy is there
for compatibility but ever since
019a3edbb25f1571e876f8af1ce4c55412939e5d: ("virtio: make features 64bit
wide") it's ignored on load. This is wrong since virtio_net_load tests
self announcement and guest offloads before the second copy including
high feature bits is loaded.  This means that self announcement, control
vq and guest offloads are all broken after migration.

Fix it up by loading low feature bits: somewhat ugly since high and low
bits become out of sync temporarily, but seems unavoidable for
compatibility.  The right thing to do for new features is probably to
test the host features, anyway.

Fixes: 019a3edbb2
    ("virtio: make features 64bit wide")
Cc: qemu-stable@nongnu.org
Reported-by: Robin Geuze <robing@transip.nl>
Tested-by: Robin Geuze <robing@transip.nl>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-04 14:52:10 +03:00
Stefan Hajnoczi 3a90c4ace2 virtio: drop duplicate virtio_queue_get_id() function
The virtio_queue_get_id() function is the lesser used duplicate of
virtio_get_queue_index().  Use the latter instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1463767461-17922-1-git-send-email-stefanha@redhat.com
2016-06-07 14:40:51 +01:00
Fam Zheng 14560d69e7 virtio: Mark host notifiers as external
The effect of this change is the block layer drained section can work,
for example when mirror job is being completed.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-22 16:43:58 +02:00
Fam Zheng 54e18d35e4 event-notifier: Add "is_external" parameter
All callers pass "false" keeping the old semantics. The windows
implementation doesn't distinguish the flag yet. On posix, it is passed
down to the underlying aio context.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-22 16:43:56 +02:00
Paolo Bonzini a378b49a43 virtio: merge virtio_queue_aio_set_host_notifier_handler with virtio_queue_set_aio
Eliminating the reentrancy is actually a nice thing that we can do
with the API that Michael proposed, so let's make it first class.
This also hides the complex assign/set_handler conventions from
callers of virtio_queue_aio_set_host_notifier_handler, which in
fact was always called with assign=true.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07 19:57:33 +03:00
Michael S. Tsirkin 344dc16fae virtio: add aio handler
In addition to handling IO in vcpu thread and in io thread, blk dataplane
introduces yet another mode: handling it by AioContext.

Currently, this reuses the same handler as previous modes,
which triggers races as these were not designed to be reentrant.
Add instead a separate handler just for aio; this will make
it possible to disable regular handlers when dataplane is active.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07 19:57:33 +03:00
Paolo Bonzini 2b2cbcadc1 virtio: make virtio_queue_notify_vq static
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07 19:57:33 +03:00
Paolo Bonzini 4771d756f4 hw: explicitly include qemu-common.h and cpu.h
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:17 +01:00
Markus Armbruster da34e65cb4 include/qemu/osdep.h: Don't include qapi/error.h
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef.  Since then, we've moved to include qemu/osdep.h
everywhere.  Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h.  That's in excess of
100KiB of crap most .c files don't actually need.

Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h.  Include qapi/error.h in .c files that need it and don't
get it now.  Include qapi-types.h in qom/object.h for uint16List.

Update scripts/clean-includes accordingly.  Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h.  Update the list of includes in the qemu/osdep.h
comment quoted above similarly.

This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third.  Unfortunately, the number depending on
qapi-types.h shrinks only a little.  More work is needed for that one.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:15 +01:00
Paolo Bonzini adb3feda8d virtio: export vring_notify as virtio_should_notify
Virtio dataplane needs to trigger the irq manually through the
guest notifier.  Export virtio_should_notify so that it can be
used around event_notifier_set.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-02-25 13:14:18 +02:00
Paolo Bonzini a1afb6062e virtio: add AioContext-specific function for host notifiers
This is used to register ioeventfd with a dataplane thread.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-02-25 13:14:18 +02:00
Victor Kaplansky 5669655aaf vhost-user interrupt management fixes
Since guest_mask_notifier can not be used in vhost-user mode due
to buffering implied by unix control socket, force
use_mask_notifier on virtio devices of vhost-user interfaces, and
send correct callfd to the guest at vhost start.

Using guest_notifier_mask function in vhost-user case may
break interrupt mask paradigm, because mask/unmask is not
really done when returning from guest_notifier_mask call, instead
message is posted in a unix socket, and processed later.

Add an option boolean flag 'use_mask_notifier' to disable the use
of guest_notifier_mask in virtio pci.

Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-18 16:13:56 +02:00
Vincenzo Maffione 1cdd2ee54a virtio: combine write of an entry into used ring
Fill in an element of the used ring with a single combined access to the
guest physical memory, rather than using two separated accesses.
This reduces the overhead due to expensive address translation.

Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Message-Id: <e4a89a767a4a92cbb6bcc551e151487eb36e1722.1450218353.git.v.maffione@gmail.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-06 20:44:08 +02:00
Vincenzo Maffione be1fea9bc2 virtio: read avail_idx from VQ only when necessary
The virtqueue_pop() implementation needs to check if the avail ring
contains some pending buffers. To perform this check, it is not
always necessary to fetch the avail_idx in the VQ memory, which is
expensive. This patch introduces a shadow variable tracking avail_idx
and modifies virtio_queue_empty() to access avail_idx in physical
memory only when necessary.

Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Message-Id: <b617d6459902773d9f4ab843bfaca764f5af8eda.1450218353.git.v.maffione@gmail.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-06 20:44:08 +02:00
Vincenzo Maffione b796fcd1bf virtio: cache used_idx in a VirtQueue field
Accessing used_idx in the VQ requires an expensive access to
guest physical memory. Before this patch, 3 accesses are normally
done for each pop/push/notify call. However, since the used_idx is
only written by us, we can track it in our internal data structure.

Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Message-Id: <3d062ec54e9a7bf9fb325c1fd693564951f2b319.1450218353.git.v.maffione@gmail.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-06 20:44:08 +02:00
Paolo Bonzini aa570d6fb6 virtio: combine the read of a descriptor
Compared to vring, virtio has a performance penalty of 10%.  Fix it
by combining all the reads for a descriptor in a single address_space_read
call.  This also simplifies the code nicely.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-06 20:44:08 +02:00
Paolo Bonzini 3b3b062821 virtio: slim down allocation of VirtQueueElements
Build the addresses and s/g lists on the stack, and then copy them
to a VirtQueueElement that is just as big as required to contain this
particular s/g list.  The cost of the copy is minimal compared to that
of a large malloc.

When virtqueue_map is used on the destination side of migration or on
loadvm, the iovecs have already been split at memory region boundary,
so we can just reuse the out_num/in_num we find in the file.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-06 20:44:08 +02:00
Paolo Bonzini 3724650db0 virtio: introduce virtqueue_alloc_element
Allocate the arrays for in_addr/out_addr/in_sg/out_sg outside the
VirtQueueElement.  For now, virtqueue_pop and vring_pop keep
allocating a very large VirtQueueElement.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-06 20:44:08 +02:00
Paolo Bonzini ab281c1781 virtio: introduce qemu_get/put_virtqueue_element
Move allocation to virtio functions also when loading/saving a
VirtQueueElement.  This will also let the load/save functions
keep backwards compatibility when the VirtQueueElement layout
is changed.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-06 20:44:08 +02:00
Paolo Bonzini 51b19ebe43 virtio: move allocation to virtqueue_pop/vring_pop
The return code of virtqueue_pop/vring_pop is unused except to check for
errors or 0.  We can thus easily move allocation inside the functions
and just return a pointer to the VirtQueueElement.

The advantage is that we will be able to allocate only the space that
is needed for the actual size of the s/g list instead of the full
VIRTQUEUE_MAX_SIZE items.  Currently VirtQueueElement takes about 48K
of memory, and this kind of allocation puts a lot of stress on malloc.
By cutting the size by two or three orders of magnitude, malloc can
use much more efficient algorithms.

The patch is pretty large, but changes to each device are testable
more or less independently.  Splitting it would mostly add churn.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-02-06 20:39:07 +02:00
Dr. David Alan Gilbert 3e996cc583 Fix virtio migration
I misunderstood the vmstate macro definition when I reworked the
virtio .get/.put.
The VMSTATE_STRUCT_VARRAY_KNOWN, was described as being for "a
variable length array (i.e. _type *_field) but we know the
length".  However it actually specified operation for arrays embedded in
the struct (i.e. _type _field[]) since it lacked the VMS_POINTER
flag. This caused offset calculation to be completely off, examining and
potentially sending random data instead of the VirtQueue content.

Replace the otherwise unused VMSTATE_STRUCT_VARRAY_KNOWN with a
VMSTATE_STRUCT_VARRAY_POINTER_KNOWN that includes the VMS_POINTER flag
(so now actually doing what it advertises) and use it in the virtio
migration code.

Fixes and description as per Sascha's suggestions/debug.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reported-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Tested-By: Sascha Silbe <silbe@linux.vnet.ibm.com>
Reviewed-By: Sascha Silbe <silbe@linux.vnet.ibm.com>

Fixes: 50e5ae4dc3
Fixes: 2cf0148674
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-02-04 19:53:02 +02:00
Peter Maydell 9b8bfe21be virtio: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-15-git-send-email-peter.maydell@linaro.org
2016-01-29 15:07:23 +00:00
Cornelia Huck 8a1be662a6 virtio: fix error message for number of queues
There's no such thing as "PCI queues" in the virtio core.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-01-09 23:20:20 +02:00
Dr. David Alan Gilbert 50e5ae4dc3 migration/virtio: Remove simple .get/.put use
The 'virtqueue_state' and 'ringsize' can be saved using VMSTATE
macros rather than hand coded .get/.put

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
2016-01-09 23:20:20 +02:00
Jason Wang a6df8adf3e virtio-pci: fix 1.0 virtqueue migration
We don't migrate the followings fields for virtio-pci:

uint32_t dfselect;
uint32_t gfselect;
uint32_t guest_features[2];
struct {
    uint16_t num;
    bool enabled;
    uint32_t desc[2];
    uint32_t avail[2];
    uint32_t used[2];
} vqs[VIRTIO_QUEUE_MAX];

This will confuse driver if migrating during initialization. Solves
this issue by:

- introduce transport specific callbacks to load and store extra
  virtqueue states.
- add a new subsection for virtio to migrate transport specific modern
  device state.
- implement pci specific callbacks.
- add a new property for virtio-pci for whether or not to migrate
  extra state.
- compat the migration for 2.4 and elder machine types

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-11-12 15:49:32 +02:00
Michael S. Tsirkin 3945ecf1ec virtio: drop virtqueue_map_sg
Deprecated in favor of virtqueue_map.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2015-10-29 11:05:24 +02:00
Michael S. Tsirkin 13972ac5e2 virtio: switch to virtio_map
Drop use of the deprecated virtio_map_sg in virtio core.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2015-10-29 11:05:24 +02:00
Michael S. Tsirkin 8059feee00 virtio: introduce virtio_map
virtio_map_sg currently fails if one of the entries it's mapping is
contigious in GPA but not HVA address space.  Introduce virtio_map which
handles this by splitting sg entries.

This new API generally turns out to be a good idea since it's harder to
misuse: at least in one case the existing one was used incorrectly.

This will still fail if there's no space left in the sg, but luckily max
queue size in use is currently 256, while max sg size is 1024, so we
should be OK even is all entries happen to cross a single DIMM boundary.

Won't work well with very small DIMM sizes, unfortunately:
e.g. this will fail with 4K DIMMs where a single
request might span a large number of DIMMs.

Let's hope these are uncommon - at least we are not breaking things.

Note: virtio-scsi calls virtio_map_sg on data loaded from network, and
validates input, asserting on failure.  Copy the validating code here -
it will be dropped from virtio-scsi in a follow-up patch.

Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2015-10-29 11:05:24 +02:00
Jason Wang 29b9f5efd7 virtio: introduce virtqueue_discard()
This patch introduces virtqueue_discard() to discard a descriptor and
unmap the sgs. This will be used by the patch that will discard
descriptor when packet is truncated.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-01 16:16:52 +03:00
Jason Wang ce31746157 virtio: introduce virtqueue_unmap_sg()
Factor out sg unmapping logic. This will be reused by the patch that
can discard descriptor.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Andrew James <andrew.james@hpe.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-01 16:16:52 +03:00
Cornelia Huck 46c5d0823d virtio: ring sizes vs. reset
We allow guests to change the size of the virtqueue rings by supplying
a number of buffers that is different from the number of buffers the
device was initialized with. Current code has some problems, however,
since reset does not reset the ringsizes to the default values (as this
is not saved anywhere).

Let's extend the core code to keep track of the default ringsizes and
migrate them once the guest changed them for any of the virtqueues
for a device.

Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-09-24 13:42:17 +03:00
Pierre Morel 50764fc8a3 virtio: right size for virtio_queue_get_avail_size
Being working on dataplane I notice something strange:

virtio_queue_get_avail_size() used a 64bit size index
for the calculation of the available ring size.

It is quite strange but it did work with the old calculation
of the avail ring, at most with performance penalty,
and I wonder where I missed something.

This patch let use a 16bit size as defined in virtio_ring.h

Signed-off-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-09-24 13:39:46 +03:00
Cornelia Huck 95129d6fc9 virtio: avoid leading underscores for helpers
Commit ef546f1275 ("virtio: add
feature checking helpers") introduced a helper __virtio_has_feature.
We don't want to use reserved identifiers, though, so let's
rename __virtio_has_feature to virtio_has_feature and virtio_has_feature
to virtio_vdev_has_feature.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-09-10 11:06:05 +03:00