Add trace calls. Convert some #ifdef DEBUG printfs to trace.
Signed-off-by: Don Koch <dkoch@verizon.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
this patch finally introduces multiread support to virtio-blk. While
multiwrite support was there for a long time, read support was missing.
The complete merge logic is moved into virtio-blk.c which has
been the only user of request merging ever since. This is required
to be able to merge chunks of requests and immediately invoke callbacks
for those requests. Secondly, this is required to switch to
direct invocation of coroutines which is planned at a later stage.
The following benchmarks show the performance of running fio with
4 worker threads on a local ram disk. The numbers show the average
of 10 test runs after 1 run as warmup phase.
| 4k | 64k | 4k
MB/s | rd seq | rd rand | rd seq | rd rand | wr seq | wr rand
--------------+--------+---------+--------+---------+--------+--------
master | 1221 | 1187 | 4178 | 4114 | 1745 | 1213
multiread | 1829 | 1189 | 4639 | 4110 | 1894 | 1216
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Turn all the D/DD/DDDPRINTFs into trace events
Turn most of the fprintf(stderr, into error_report
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Mostly on the load side, so that when we get a complaint about
a migration failure we can figure out what it didn't like.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
A bunch of fixes all over the place. Also, beginning to generalize acpi build
code for reuse by ARM.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJUx465AAoJECgfDbjSjVRpzewIAI/tzV1oCR1D/YDYBYpiK68W
85JJbyR90DpS9unjrkeUHEnJgkegCk8dMXlWJOlshpwxDw2khC2ol0yS6siwC6Z/
1peL9E5zHz2H8KWfH6JlhqLETovZxjd5Uv3q1mWULvK+zZcPzeQDCky5I8mbEw4b
0LGDGX8mcLlDnit9mnAbgHu7cbqGa0jtXoJTFveKdxQtHdyj4cAg0wCjOLhnEo6s
fJP7K1TJ2Ptiwwlk2cnj8T4Z9AoJkWjpFfr94dST2KqR3z5j8OUZYYhifrZa3e8t
qxO/UatY4IwSnsmWCn/hvzlHZFa03sc9nPIkAlj96j78sHqPafDJxCdnwlX8pF8=
=Kfwr
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pci, pc, virtio fixes and cleanups
A bunch of fixes all over the place. Also, beginning to generalize acpi build
code for reuse by ARM.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Tue 27 Jan 2015 13:12:25 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
* remotes/mst/tags/for_upstream:
pc-dimm: Add Error argument to pc_existing_dimms_capacity
pc-dimm: Make pc_existing_dimms_capacity global
pc: Fix DIMMs capacity calculation
smbios: Don't report unknown CPU speed (fix SVVP regression)
smbios: Fix dimm size calculation when RAM is multiple of 16GB
bios-linker-loader: move source to common location
bios-linker-loader: move header to common location
virtio: fix feature bit checks
bios-tables-test: split piix4 and q35 tests
acpi: build_append_nameseg(): add padding if necessary
acpi: update generated hex files
acpi-test: update expected DSDT
pc: acpi: fix WindowsXP BSOD when memory hotplug is enabled
pci: Split pcie_host_mmcfg_map()
Add some trace calls to pci.c.
ich9: add disable_s3, disable_s4, s4_val properties
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The ioreq-server API added to Xen 4.5 offers better security than
the existing Xen/QEMU interface because the shared pages that are
used to pass emulation request/results back and forth are removed
from the guest's memory space before any requests are serviced.
This prevents the guest from mapping these pages (they are in a
well known location) and attempting to attack QEMU by synthesizing
its own request structures. Hence, this patch modifies configure
to detect whether the API is available, and adds the necessary
code to use the API if it is.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
A new common module is created. It implements all functions
that have no device specificity (PCI, Platform).
This patch only consists in move (no functional changes)
Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
vfio_get_device now takes a VFIODevice as argument. The function is split
into 2 parts: vfio_get_device which is generic and vfio_populate_device
which is bus specific.
3 new fields are introduced in VFIODevice to store dev_info.
vfio_put_base_device is created.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This structure is going to be shared by VFIOPCIDevice and
VFIOPlatformDevice. VFIOBAR includes it.
vfio_eoi becomes an ops of VFIODevice specialized by parent device.
This makes possible to transform vfio_bar_write/read into generic
vfio_region_write/read that will be used by VFIOPlatformDevice too.
vfio_mmap_bar becomes vfio_map_region
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This patch removes all DPRINTF and replace them by trace points.
A few DPRINTF used in error cases were transformed into error_report.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
MSI-X works slightly different than INTx; the doorbell
registers are not necessarily used as MSI-X interrupts
are directed anyway. So the head pointer on the
reply queue needs to be updated as soon as a frame
is completed, and we can set the doorbell only
when in INTx mode.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Windows requires the frames to be unmapped, otherwise we run
into a race condition where the updated frame data is not
visible to the guest.
With that we can simplify the queue algorithm and use a bitmap
for tracking free frames.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Improve queue logging by displaying head and tail pointer
of the completion queue.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The windows driver is sending several init_firmware commands
when in MSI-X mode. It is, however, using only the first
queue. So disregard any additional init_firmware commands
until the HBA is reset.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The EFI firmware doesn't handle unit attentions properly,
so we need to clear the Power On/Reset unit attention upon
initial reset.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To ease debugging we should be decoding
the register names.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The trace events already contain the function name, so the actual
message doesn't need to contain any of these informations.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Device models should access their block backends only through the
block-backend.h API. Convert them, and drop direct includes of
inappropriate headers.
Just four uses of BlockDriverState are left:
* The Xen paravirtual block device backend (xen_disk.c) opens images
itself when set up via xenbus, bypassing blockdev.c. I figure it
should go through qmp_blockdev_add() instead.
* Device model "usb-storage" prompts for keys. No other device model
does, and this one probably shouldn't do it, either.
* ide_issue_trim_cb() uses bdrv_aio_discard() instead of
blk_aio_discard() because it fishes its backend out of a BlockAIOCB,
which has only the BlockDriverState.
* PC87312State has an unused BlockDriverState[] member.
The next two commits take care of the latter two.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Let QEMU propagate the cpu state to kvm. If kvm doesn't yet support it, it is
silently ignored as kvm will still handle the cpu state itself in that case.
The state is not synced back, thus kvm won't have a chance to actively modify
the cpu state. To do so, control has to be given back to QEMU (which is already
done so in all relevant cases).
Setting of the cpu state can fail either because kvm doesn't support the
interface yet, or because the state is invalid/not supported. Failed attempts
will be traced
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
CC: Andreas Faerber <afaerber@suse.de>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
This patch makes sure that halting a cpu and stopping a cpu are two different
things. Stopping a cpu will also set the cpu halted - this is needed for common
infrastructure to work (note that the stop and stopped flag cannot be used for
our purpose because they are already used by other mechanisms).
A cpu can be halted ("waiting") when it is operating. If interrupts are
disabled, this is called a "disabled wait", as it can't be woken up anymore. A
stopped cpu is treated like a "disabled wait" cpu, but in order to prepare for a
proper cpu state synchronization with the kvm part, we need to track the real
logical state of a cpu.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: Andreas Faerber <afaerber@suse.de>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
kvmclock fix. This was reverted last minute for 2.1, but it is now back
with the problematic case fixed.
Note: I will soon switch to a subkey for signing purposes. To verify
future signed pull requests from me, please update my key with
"gpg --recv-keys 9B4D86F2". You should see 3 new subkeys---the
one for signing will be a 2048-bit RSA key, 4E6B09D7.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJUJXmEAAoJEBvWZb6bTYbyuZoP+gJScjcoHVL27YL6GqCsLzQ+
swk/6TDaboR5+t3Tt1vHOWOvtXmoZlxp+sQU50ltiKS1mwRTsMAhEeTInTNPLJ6z
aKX46QdvNib2bYdUgnl8nqOiKQAZiA6rR8os2lWJ3E8n1n6qBbeWwneQK2YvRaVf
RMjmDm7sjhZt5h868O7SPNDSjk8k5TgYj5MOsDugFxQr6JXGxt8xmB+Ml6v6Ne+y
ufDchXtBhANIkr1aNMeOBYdX+/wuY+fltjnG11xpuAL1a63F/b3EGvPoqUHQCZSd
PzHmkSZV0z3rmgVTpcAHxSw2MxUEgmXLvxWAQTahrzGUi8mH3hF7z7zNqfULZWa1
3hctUdlf9aWbIgdfEdBZqOjol58TYvgCjIa7W2upsvEf7sjctpCZnMmBi8vMM3FQ
qv7S4139jN70+7zNY8CCnXPuOY63rsFTAs8XCPF7DL3HESzm7x8MEFgW7/7iJtwQ
PNJVbJX8+FpDmdAy/fwkMKA8BvjmNBuIZrr9/IfFJ6uM70//brq3OXq25agToBpZ
QYWW5CXYeQXurPBK1Q6XkGx61mNuc5zOveVS+8MhJKw9SiL/cIhQzrh3fgFtIP+e
63LlufFPMA2+fhBJctptIoU9+XeUwalQic92ZqepuXB/OZV6vhhV9/K2zZMUMpl9
B/Gg3cd1w5rMHg6O2kov
=nP/e
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
Usual mix of patches, the most important being Alex and Marcelo's
kvmclock fix. This was reverted last minute for 2.1, but it is now back
with the problematic case fixed.
Note: I will soon switch to a subkey for signing purposes. To verify
future signed pull requests from me, please update my key with
"gpg --recv-keys 9B4D86F2". You should see 3 new subkeys---the
one for signing will be a 2048-bit RSA key, 4E6B09D7.
# gpg: Signature made Fri 26 Sep 2014 15:34:44 BST using RSA key ID 9B4D86F2
# gpg: Good signature from "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: aka "Paolo Bonzini <bonzini@gnu.org>"
* remotes/bonzini/tags/for-upstream:
kvm/valgrind: don't mark memory as initialized
po: fix conflict with %.mo rule in rules.mak
kvmvapic: fix migration when VM paused and when not running Windows
serial: check if backed by a physical serial port at realize time
serial: reset state at startup
target-i386: update fp status fix
hw/dma/i8257: Silence phony error message
kvmclock: Ensure time in migration never goes backward
kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation
Introduce cpu_clean_all_dirty
pit: fix pit interrupt can't inject into vm after migration
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This exceeded the trace argument limit for LTTNG UST and wasn't really
needed as the flags value is stored anyway. Dropping this fixes the
compile failure for UST. It can probably be merged with the previous
trace shortening patch.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Recent traces rework introduced 2 tracepoints with 13 and 20
arguments. When dtrace backend is selected
(--enable-trace-backend=dtrace), compile fails as
sys/sdt.h defines DTRACE_PROBE up to DTRACE_PROBE12 only.
This splits long tracepoints.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
A few files have been renamed without updating their comment here. A
few events have been added in the wrong place. Clean that up.
Comments with no space after the '#' look ugly and confuse
cleanup-trace-events.pl. Insert a space.
scripts/cleanup-trace-events.pl is now happy again.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1411476811-24251-5-git-send-email-armbru@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Event monitor_protocol_event is unused since commit 7517517. Drop it.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1411476811-24251-4-git-send-email-armbru@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Event megasas_io_read was added in commit e8f943c, but never used.
Drop it.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1411476811-24251-3-git-send-email-armbru@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
iscsi_aio_write16_cb, iscsi_aio_writev, iscsi_aio_read16_cb, and
iscsi_aio_readv have not not been in use since commit
063c3378a9 ("block/iscsi: introduce
bdrv_co_{readv, writev, flush_to_disk}").
These were the only trace events in block/iscsi.c so drop the the
trace.h include.
Cc: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1411394595-15300-4-git-send-email-stefanha@redhat.com
This trace event was added in commit
840a178c94 ("usb: mtp filesharing") but
never used.
Cc: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1411394595-15300-3-git-send-email-stefanha@redhat.com
This trace event has not been in use since commit
b002254dbd ("virtio-blk: Unify
{non-,}dataplane's request handlings").
Cc: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1411394595-15300-2-git-send-email-stefanha@redhat.com
This converts many kinds of debug prints to traces.
This implements packets logging to avoid unnecessary calculations if
usb_ohci_td_pkt_short/usb_ohci_td_pkt_long is not enabled.
This makes OHCI errors (such as "DMA error") invisible by default.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Convert into trace event. Otherwise the message
dma: unregistered DMA channel used nchan=0 dma_pos=0 dma_len=1
gets printed every time and fills up the log-file with 50 MiB / minute.
Signed-off-by: Philipp Hahn <hahn@univention.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
With this patch the qemu console core stops using PixelFormat and pixman
format codes side-by-side, pixman format code is the primary way to
specify the DisplaySurface format:
* DisplaySurface stops carrying a PixelFormat field.
* qemu_create_displaysurface_from() expects a pixman format now.
Functions to convert PixelFormat to pixman_format_code_t (and back)
exist for those who still use PixelFormat. As PixelFormat allows
easy access to masks and shifts it will probably continue to exist.
[ xenfb added by Benjamin Herrenschmidt ]
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Add some trace events to virtio-rng for easier debugging
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Amos Kong <akong@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This adds a couple of tcg specific trace-events which are useful for
tracing execution though tcg generated blocks. It's been tested with
lttng user space tracing but is generic enough for all systems. The tcg
events are:
* translate_block - when a subject block is translated
* exec_tb - when a translated block is entered
* exec_tb_exit - when we exit the translated code
* exec_tb_nocache - special case translations
Of course we can only trace the entrance to the first block of a chain
as each block will jump directly to the next when it can. See the -d
nochain patch to allow more complete tracing at the expense of
performance.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
We have the experience that the guest doesn't stop successfully
though it was instructed to shut down.
The root cause may be not in QEMU mostly. However, QEMU is often
suspected at the beginning just because the issue occurred in
virtualization environment.
Therefore, we need to affirm that QEMU received the shutdown
request and raised ACPI irq from "virsh shutdown" command,
virt-manger or stopping QEMU process to the VM .
So that we can affirm the problems was belonged to the Guset OS
rather than the QEMU itself.
When we stop guests by "virsh shutdown" command or virt-manger,
or stopping QEMU process, qemu_system_powerdown_request() or
qemu_system_shutdown_request() is called. Then the below functions
in main_loop_should_exit() of Vl.c are called roughly in the
following order.
if (qemu_powerdown_requested())
qemu_system_powerdown()
monitor_protocol_event(QEVENT_POWERDOWN, NULL)
OR
if(qemu_shutdown_requested()}
monitor_protocol_event(QEVENT_SHUTDOWN, NULL);
The tracepoint of monitor_protocol_event() already exists, but no
tracepoints are defined for qemu_system_powerdown_request() and
qemu_system_shutdown_request(). So this patch adds two tracepoints for
the two functions. We believe that it will become much easier to
isolate the problem mentioned above by these tracepoints.
Signed-off-by: Yang Zhiyong <yangzy.fnst@cn.fujitsu.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Currently SPAPR PHB keeps track of all allocated MSI (here and below
MSI stands for both MSI and MSIX) interrupt because
XICS used to be unable to reuse interrupts. This is a problem for
dynamic MSI reconfiguration which happens when guest reloads a driver
or performs PCI hotplug. Another problem is that the existing
implementation can enable MSI on 32 devices maximum
(SPAPR_MSIX_MAX_DEVS=32) and there is no good reason for that.
This makes use of new XICS ability to reuse interrupts.
This reorganizes MSI information storage in sPAPRPHBState. Instead of
static array of 32 descriptors (one per a PCI function), this patch adds
a GHashTable when @config_addr is a key and (first_irq, num) pair is
a value. GHashTable can dynamically grow and shrink so the initial limit
of 32 devices is gone.
This changes migration stream as @msi_table was a static array while new
@msi_devs is a dynamic hash table. This adds temporary array which is
used for migration, it is populated in "spapr_pci"::pre_save() callback
and expanded into the hash table in post_load() callback. Since
the destination side does not know the number of MSI-enabled devices
in advance and cannot pre-allocate the temporary array to receive
migration state, this makes use of new VMSTATE_STRUCT_VARRAY_ALLOC macro
which allocates the array automatically.
This resets the MSI configuration space when interrupts are released by
the ibm,change-msi RTAS call.
This fixed traces to be more informative.
This changes vmstate_spapr_pci_msi name from "...lsi" to "...msi" which
was incorrect by accident. As the internal representation changed,
thus bumps migration version number.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[agraf: drop g_malloc_n usage]
Signed-off-by: Alexander Graf <agraf@suse.de>
This implements interrupt release function so IRQs can be returned back
to the pool for reuse in cases such as PCI hot plug.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
The current allocator returns IRQ numbers from a pool and does not
support IRQs reuse in any form as it did not keep track of what it
previously returned, it only keeps the last returned IRQ. Some use
cases such as PCI hot(un)plug may require IRQ release and reallocation.
This moves an allocator from SPAPR to XICS.
This switches IRQ users to use new API.
This uses LSI/MSI flags to know if interrupt is allocated.
The interrupt release function will be posted as a separate patch.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Add mhp_pc_dimm_assigned_slot & mhp_pc_dimm_assigned_address
events to trace which address and slot where assigned to
plugged in PC_DIMM device on target-i386 machine.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add events for tracing accesses to memory hotplug IO ports.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently only single TCE entry per request is supported (H_PUT_TCE).
However PAPR+ specification allows multiple entry requests such as
H_PUT_TCE_INDIRECT and H_STUFF_TCE. Having less transitions to the host
kernel via ioctls, support of these calls can accelerate IOMMU operations.
This implements H_STUFF_TCE and H_PUT_TCE_INDIRECT.
This advertises "multi-tce" capability to the guest if the host kernel
supports it (KVM_CAP_SPAPR_MULTITCE) or guest is running in TCG mode.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Modern Linux kernels support last POWERPC CPUs so when a kernel boots,
in most cases it can find a matching cpu_spec in the kernel's cpu_specs
list. However if the kernel is quite old, it may be missing a definition
of the actual CPU. To provide an ability for old kernels to work on modern
hardware, a Processor Compatibility Mode has been introduced
by the PowerISA specification.
>From the hardware prospective, it is supported by the Processor
Compatibility Register (PCR) which is defined in PowerISA. The register
enables one of the compatibility modes (2.05/2.06/2.07).
Since PCR is a hypervisor privileged register and cannot be
directly accessed from the guest, the mode selection is done via
ibm,client-architecture-support (CAS) RTAS call using which the guest
specifies what "raw" and "architected" CPU versions it supports.
QEMU works out the best match, changes a "cpu-version" property of
every CPU and notifies the guest about the change by setting these
properties in the buffer passed as a response on a custom H_CAS hypercall.
This implements ibm,client-architecture-support parameters parsing
(now only for PVRs) and cooks the device tree diff with new values for
"cpu-version", "ibm,ppc-interrupt-server#s" and
"ibm,ppc-interrupt-server#s" properties.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
The PAPR+ specification defines a ibm,client-architecture-support (CAS)
RTAS call which purpose is to provide a negotiation mechanism for
the guest and the hypervisor to work out the best compatibility parameters.
During the negotiation process, the guest provides an array of various
options and capabilities which it supports, the hypervisor adjusts
the device tree and (optionally) reboots the guest.
At the moment the Linux guest calls CAS method at early boot so SLOF
gets called. SLOF allocates a memory buffer for the device tree changes
and calls a custom KVMPPC_H_CAS hypercall. QEMU parses the options,
composes a diff for the device tree, copies it to the buffer provided
by SLOF and returns to SLOF. SLOF updates the device tree and returns
control to the guest kernel. Only then the Linux guest parses the device
tree so it is possible to avoid unnecessary reboot in most cases.
The device tree diff is a header with an update format version
(defined as 1 in this patch) followed by a device tree with the properties
which require update.
If QEMU detects that it has to reboot the guest, it silently does so
as the guest expects reboot to happen because this is usual pHyp firmware
behavior.
This defines custom KVMPPC_H_CAS hypercall. The current SLOF already
has support for it.
This implements stub which returns very basic tree (root node,
no properties) to the guest.
As the return buffer does not contain any change, no change in behavior is
expected.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
This allows guests to have a different timebase origin from the host.
This is needed for migration, where a guest can migrate from one host
to another and the two hosts might have a different timebase origin.
However, the timebase seen by the guest must not go backwards, and
should go forwards only by a small amount corresponding to the time
taken for the migration.
This is only supported for recent POWER hardware which has the TBU40
(timebase upper 40 bits) register. That includes POWER6, 7, 8 but not
970.
This adds kvm_access_one_reg() to access a special register which is not
in env->spr. This requires kvm_set_one_reg/kvm_get_one_reg patch.
The feature must be present in the host kernel.
This bumps vmstate_spapr::version_id and enables new vmstate_ppc_timebase
only for it. Since the vmstate_spapr::minimum_version_id remains
unchanged, migration from older QEMU is supported but without
vmstate_ppc_timebase.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Exploit the new api for userspace-controlled cmma. If supported, enable
cmma during kvm initialization and register a reset handler for cmma,
which is also called directly from the load IPL code.
The reset functionality is needed to reset the cmma state of the guest
pages, e.g. if a system reset is triggered via qemu monitor; otherwise
this could result in data corruption.
A guest triggered reboot may now lead to multiple cmma resets; this is
OK, however, as this is slowpath anyway and the simplest way to achieve
the intended effects.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>