When ppc_store_slb() is called from kvm_arch_get_registers(), it stores
a SLB in CPUPPCState::slb[slot]. However it drops the slot number from
ESID so when kvm_arch_put_registers() puts SLBs back to KVM, they do not
have correct "index" field anymore. This broke migration with LPCR_AIR
enabled as now the guest is handling interrupts in virtual mode and unable
to reconstruct correct SLBs anymore.
This adds "index" field for valid SLBs when putting them to KVM.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Alexander Graf <agraf@suse.de>
We now have to pass an address space to our _phys helpers. During the
transition apparently the EPR exit path missed out, so let's put it there.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
IBM POWERPC processors encode PVR as a CPU family in higher 16 bits and
a CPU version in lower 16 bits. Since there is no significant change
in behavior between versions, there is no point to add every single CPU
version in QEMU's CPU list. Also, new CPU versions of already supported
CPU won't break the existing code.
This adds PVR value/mask support for KVM, i.e. for -cpu host option.
As CPU family class name for POWER7 is "POWER7-family", there is no need
to touch aliases.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
The latest update to v3.13-rc3 (bf63839f) breaks the
ppc build with KVM:
kvm-all.o: In function `kvm_update_guest_debug':
kvm-all.c:1910: undefined reference to `kvm_arch_update_guest_debug'
kvm-all.o: In function `kvm_insert_breakpoint':
kvm-all.c:1937: undefined reference to `kvm_arch_insert_sw_breakpoint'
kvm-all.c:1945: undefined reference to `kvm_arch_insert_hw_breakpoint'
kvm-all.o: In function `kvm_remove_breakpoint':
kvm-all.c:1977: undefined reference to `kvm_arch_remove_sw_breakpoint'
kvm-all.c:1985: undefined reference to `kvm_arch_remove_hw_breakpoint'
kvm-all.o: In function `kvm_remove_all_breakpoints':
kvm-all.c:2009: undefined reference to `kvm_arch_remove_sw_breakpoint'
kvm-all.c:2006: undefined reference to `kvm_arch_remove_sw_breakpoint'
kvm-all.c:2017: undefined reference to `kvm_arch_remove_all_hw_breakpoints'
We need stubs until something gets implemented.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Instead of opencoding 64 use MAX_SLB_ENTRIES. We don't update the kernel
header here.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Without this, a value of rb=0 and rs=0 results in replacing the 0th
index. This can be observed when using gdb remote debugging support.
(gdb) x/10i do_fork
0xc000000000085330 <do_fork>: Cannot access memory at address 0xc000000000085330
(gdb)
This is because when we do the slb sync via kvm_cpu_synchronize_state,
we overwrite the slb entry (0th entry) for 0xc000000000085330
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Recent PowerKVM allows the kernel to intercept some RTAS calls from the
guest directly. This is used to implement the more efficient in-kernel
XICS for example. qemu is still responsible for assigning the RTAS token
numbers however, and needs to tell the kernel which RTAS function name is
assigned to a given token value. This patch adds a convenience wrapper for
the KVM_PPC_RTAS_DEFINE_TOKEN ioctl() which is used for this purpose.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This is an autogenerated patch using scripts/switch-timer-api.
Switch the entire code base to using the new timer API.
Note this patch may introduce some line length issues.
Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
'dprintf' is the name of a POSIX standard function so we should not be
stealing it for our debug macro. Rename to 'DPRINTF' (in line with
a number of other source files.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Acked-by: Richard Henderson <rth@twiddle.net>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1375100199-13934-4-git-send-email-peter.maydell@linaro.org
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
At present, the savevm / migration support for the pseries machine will not
work when KVM is enabled. That's because KVM manages the guest's hash page
table in the host kernel, so qemu has no visibility of it. This patch
fixes this by using new kernel interfaces to extract and reinsert the
guest's hash table during the migration process.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-id: 1374175984-8930-11-git-send-email-aliguori@us.ibm.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Model TCE tables as a device that's hooked up as a child object to
the owner. Besides the code cleanup, we get a few nice benefits:
1) free actually works now (it was dead code before)
2) the TCE information is visible in the device tree
3) we can expose table information as properties such that if we
change the window_size, we can use globals to keep migration
working.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-id: 1374175984-8930-6-git-send-email-aliguori@us.ibm.com
[dwg: pseries: savevm support for PAPR TCE tables]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[alexey: ppc kvm: fix to compile]
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Move next_cpu from CPU_COMMON to CPUState.
Move first_cpu variable to qom/cpu.h.
gdbstub needs to use CPUState::env_ptr for now.
cpu_copy() no longer needs to save and restore cpu_next.
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
[AF: Rebased, simplified cpu_copy()]
Signed-off-by: Andreas Färber <afaerber@suse.de>
This adds a missing code to save CR (condition register) via
kvm_arch_put_registers(). kvm_arch_get_registers() already has it.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The common KVM code insists on calling kvm_arch_init_irq_routing()
as soon as it sees kernel header support for it (regardless of whether
QEMU supports it). Provide a dummy function to satisfy this.
Unlike x86, PPC does not have one default irqchip, so there's no common
code that we'd stick here. Even if you ignore the routes themselves,
which even on x86 are not set up in this function, the initial XICS
kernel implementation will not support IRQ routing, so it's best to
leave even the general feature flags up to the specific irqchip code.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Some source files #include the same header more than
once for no good reason. Remove second #includes in
such cases.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
For PAPR guests, KVM tracks the various areas registered with the
H_REGISTER_VPA hypercall. For full emulation, of course, these are tracked
within qemu. At present these values are not synchronized. This is a
problem for reset (qemu's reset of the VPA address is not pushed to KVM)
and will also be a problem for savevm / migration.
The kernel now supports accessing the VPA state via the ONE_REG interface,
this patch adds code to qemu to use that interface to keep the qemu and
KVM ideas of the VPA state synchronized.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
PAPR requires that the device tree's CPU nodes have several properties
with information about the L1 cache. We already create two of these
properties, but with incorrect names - "[id]cache-block-size" instead
of "[id]-cache-block-size" (note the extra hyphen).
We were also missing some of the required cache properties. This
patch adds the [id]-cache-line-size properties (which have the same
values as the block size properties in all current cases). We also
add the [id]-cache-size properties.
Adding the cache sizes requires some extra infrastructure in the
general target-ppc code to (optionally) set the cache sizes for
various CPUs. The CPU family descriptions in translate_init.c can set
these sizes - this patch adds correct information for POWER7, I'm
leaving other CPU types to people who have a physical example to
verify against. In addition, for -cpu host we take the values
advertised by the host (if available) and use those to override the
information based on PVR.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
For the pseries machine, we need to advertise to the guest the size of its
RMA - that is the amount of memory it can access with the MMU off. For HV
KVM, this is constrained by the hardware limitations on the virtual RMA of
one hash PTE per PTE group in the hash page table. We already had code to
calculate this, but it was assuming the VRMA page size was the same as the
(host) backing page size for guest RAM.
In the case of a host kernel configured for 64k base page size, but running
on hardware (or firmware) which only allows 4k pages, the hose will do all
its allocations with a 64k page size, but still use 4k hardware pages for
actual mappings. Usually that's transparent to things running under the
host, but in the case of the maximum VRMA size it's not.
This patch refines the RMA size calculation to instead use the largest
available hardware page size (as reported by the SMMU_INFO call) which is
less than or equal to the backing page size. This now gives the correct
RMA size in all cases I've tested.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Enable the KVM emulated watchdog if KVM supports (use the
capability enablement in watchdog handler). Also watchdog exit
(KVM_EXIT_WATCHDOG) handling is added.
Watchdog state machine is cleared whenever VM state changes to running.
This is to handle the cases like return from debug halt etc.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
[agraf: rebase to current code base, fix non-kvm cases]
Signed-off-by: Alexander Graf <agraf@suse.de>
Older KVM versions don't support EPR which breaks guests when we announce
MPIC variants that support EPR.
Catch that case and expose only MPIC version 2.0 which tells the guest that
we don't support the EPR capability yet.
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
[agraf: Add comment, route cap check through kvm_ppc.c]
Signed-off-by: Alexander Graf <agraf@suse.de>
Many of these should be cleaned up with proper qdev-/QOM-ification.
Right now there are many catch-all headers in include/hw/ARCH depending
on cpu.h, and this makes it necessary to compile these files per-target.
However, fixing this does not belong in these patches.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently cpu.h contains a number of definitions relating to the 64-bit
hash MMU. Some are used in the MMU emulation code, but some are only used
in the spapr MMU management hcall implementations.
This patch moves these definitions (except for a few that are needed
more widely) into mmu-hash64.h header, shared between the MMU emulation
code and the spapr hcall code. The MMU emulation code is also updated to
actually use a number of those definitions in place of hard coded
constants.
Similarly, we add new analogous definitions to mmu-hash32.h and use those
in place of many hard-coded constants in mmu-hash32.c
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[agraf: fix 32-bit hosts]
Signed-off-by: Alexander Graf <agraf@suse.de>
target-ppc/kvm.c has an #ifdef on CONFIG_PSERIES, for the handling of
KVM exits due to a PAPR hypercall from the guest. However, since commit
e4c8b28cde "ppc: express FDT dependency of
pSeries and e500 boards via default-configs/", this hasn't worked properly.
That patch altered the configuration setup so that although CONFIG_PSERIES
is visible from the Makefiles, it is not visible from C files. This broke
the pseries machine when KVM is in use.
This patch makes a quick and dirty fix, by removing the CONFIG_PSERIES
dependency, replacing it with TARGET_PPC64 (since removing it entirely
leads to type mismatch errors). Technically this breaks the build when
configured with --disable-fdt, since that disables CONFIG_PSERIES on
TARGET_PPC64. However, it turns out the build was already broken in that
case, so this fixes pseries kvm without breaking anything extra. I'm
looking into how to fix that build breakage, but I don't think that need
delay applying this patch.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Both fields are used in VMState, thus need to be moved together.
Explicitly zero them on reset since they were located before
breakpoints.
Pass PowerPCCPU to kvmppc_handle_halt().
Signed-off-by: Andreas Färber <afaerber@suse.de>
This avoids assigning individual class fields and contributors
forgetting to add field assignments in KVM-only code.
ppc_cpu_class_find_by_pvr() requires the CPU model classes to be
registered, so defer host CPU type registration to kvm_arch_init().
Only register the host CPU type if there is a class with matching PVR.
This lets us drop error handling from instance_init.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently qemu does not get and put the state of the floating point and
vector registers to KVM. This is obviously a problem for savevm, as well
as possibly being problematic for debugging of FP-using guests.
This patch fixes this by using new extensions to the ONE_REG interface to
synchronize the qemu floating point state with KVM.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently when runing under KVM on ppc, we synchronize a certain number of
vital SPRs to KVM through the SET_SREGS call. This leaves out quite a lot
of important SPRs which are maintained in KVM. It would be helpful to
have their contents in qemu for debugging purposes, and when we implement
migration it will be vital, since they include important guest state that
will need to be restored on the target.
This patch sets up for synchronization of any registers supported by the
KVM ONE_REG calls. A new variant on spr_register() allows a ONE_REG id to
be stored with the SPR information. When we set/get information to KVM
we also synchronize any SPRs so registered.
For now we set this mechanism up to synchronize a handful of important
registers that already have ONE_REG IDs, notably the DAR and DSISR.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Turn the array of model definitions into a set of self-registering QOM
types with their own class_init. Unique identifiers are obtained from
the combination of PVR, SVR and family identifiers; this requires all
alias #defines to be removed from the list. Possibly there are some more
left after this commit that are not currently being compiled.
Prepares for introducing abstract intermediate CPU types for families.
Keep the right-aligned macro line breaks within 78 chars to aid
three-way merges.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
In preparation for more efficient setting of these fields.
Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
This will allow each architecture to define how the VCPU ID is set on
the KVM_CREATE_VCPU ioctl call.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Note that target-alpha accesses this field from TCG, now using a
negative offset. Therefore the field is placed last in CPUState.
Pass PowerPCCPU to [kvm]ppc_fixup_cpu() to facilitate this change.
Move common parts of mips cpu_state_reset() to mips_cpu_reset().
Acked-by: Richard Henderson <rth@twiddle.net> (for alpha)
[AF: Rebased onto ppc CPU subclasses and openpic changes]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Previously we silently exited, with subclasses we got an opcode warning.
Instead, explicitly tell the user what's wrong.
An indication for this is -cpu ? showing "host" with an all-zero PVR.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Since the model list is highly macrofied, keep ppc_def_t for now and
save a pointer to it in PowerPCCPUClass. This results in a flat list of
subclasses including aliases, to be refined later.
Move cpu_ppc_init() to translate_init.c and drop helper.c.
Long-term the idea is to turn translate_init.c into a standalone cpu.c.
Inline cpu_ppc_usable() into type registration.
Split cpu_ppc_register() in two by code movement into the initfn and
by turning the remaining part into a realizefn.
Move qemu_init_vcpu() call into the new realizefn and adapt
create_ppc_opcodes() to return an Error.
Change ppc_find_by_pvr() -> ppc_cpu_class_by_pvr().
Change ppc_find_by_name() -> ppc_cpu_class_by_name().
Turn -cpu host into its own subclass. This requires to move the
kvm_enabled() check in ppc_cpu_class_by_name() to avoid the class being
found via the normal name lookup in the !kvm_enabled() case.
Turn kvmppc_host_cpu_def() into the class_init and add an initfn that
asserts KVM is in fact enabled.
Implement -cpu ? and the QMP equivalent in terms of subclasses.
This newly exposes -cpu host to the user, ordered last for -cpu ?.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
On e500mc, the platform doesn't provide a way for the CPU to go idle.
To still not uselessly burn CPU time, expose an idle hypercall to the guest
if kvm supports it.
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
[agraf: adjust for current code base, add patch description, fix non-kvm case]
Signed-off-by: Alexander Graf <agraf@suse.de>
* 'ppc-for-upstream' of git://repo.or.cz/qemu/agraf: (35 commits)
PPC: KVM: Fix BAT put
PPC: e500: Only expose even TLB sizes in initial TLB
ppc/pseries: Reset VPA registration on CPU reset
pseries: Don't test for MSR_PR for hypercalls under KVM
PPC: e500: calculate initrd_base like dt_base
PPC: e500: increase DTC_LOAD_PAD
device tree: simplify dumpdtb code
fdt: move dumpdtb interpretation code to device_tree.c
target-ppc: Remove unused power_mode field from cpu state
pseries: Set hash table size based on RAM size
pseries: Remove unnecessary locking from PAPR hash table hcalls
ppc405_uc: Fix buffer overflow
target-ppc: KVM: Fix some kernel version edge cases for kvmppc_reset_htab()
pseries: Fix semantics of RTAS int-on, int-off and set-xive functions
pseries: Rework implementation of TCE bypass
pseries: Remove never used flags field from spapr vio devices
pseries: Remove XICS irq type enum type
pseries: Remove C bitfields from xics code
pseries: Small cleanup to H_CEDE implementation
pseries: Fix XICS reset
...
A terminal NUL is required by caller's use of strchr.
It's better not to use strncpy at all, since there is no need
to zero out hundreds of trailing bytes for each iteration.
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
In the sregs API, upper and lower 32bit segments of the BAT registers
are swapped when doing a set. Since we need to support old kernels out
there, don't bother to fix it in the kernel, but instead work around
the problem in QEMU by swapping on put.
Signed-off-by: Alexander Graf <agraf@suse.de>
The kvmppc_reset_htab() function invokes the KVM_PPC_ALLOCATE_HTAB vm ioctl
to request KVM to allocate and reset a hash page table for the guest - it
returns the size of hash table allocated, or 0 to indicate that qemu needs
to allocate the hash table itself. In practice qemu needs to allocate the
htab for full emulation and with Book3sPR KVM, but the kernel has to
allocate it for Book3sHV KVM (the hash table needs to be physically
contiguous in that case).
Unfortunately, the logic in this function is incorrect for some existing
kernels. Specifically:
* at least some PR KVM versions advertise the relevant capability but
don't actually implement the ioctl(), returning ENOTTY.
* For old kernels which don't have the capability, we currently return 0.
This is correct for PV KVM, where we need to allocate the htab, but not for
HV KVM - kernels of this era always allocate a 16MB hash table per guest.
This patch corrects both of these edge cases.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds support for then new "reset htab" ioctl which allows qemu
to properly cleanup the MMU hash table when the guest is reset. With
the corresponding kernel support, reset of a guest now works properly.
This also paves the way for indicating a different size hash table
to the kernel and for the kernel to be able to impose limits on
the requested size.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
At least when invoked with high enough 'level' arguments,
kvm_arch_put_registers() is supposed to copy essentially all the cpu state
as encoded in qemu's internal structures into the kvm state. Currently
the ppc version does not do this - it never calls KVM_SET_SREGS, for
example, and therefore never sets the SDR1 and various other important
though rarely changed registers.
Instead, the code paths which need to set these registers need to
explicitly make (conditional) kvm calls which transfer the changes to kvm.
This breaks the usual model of handling state updates in qemu, where code
just changes the internal model and has it flushed out to kvm automatically
at some later point.
This patch fixes this for Book S ppc CPUs by adding a suitable call to
KVM_SET_SREGS and als to KVM_SET_ONE_REG to set the HIOR (the only register
that is set with that call so far). This lets us remove the hacks to
explicitly set these registers from the kvmppc_set_papr() function.
The problem still exists for Book E CPUs (which use a different version of
the kvm_sregs structure). But fixing that has some complications of its
own so can be left to another day.
Lkewise, there is still some ugly code for setting the PVR through special
calls to SET_SREGS which is left in for now. The PVR needs to be set
especially early because it can affect what other features are available
on the CPU, so I need to do more thinking to see if it can be integrated
into the normal paths or not.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently for powerpc, kvm_arch_handle_exit() always returns 1, meaning
that its caller - kvm_cpu_exec() - will always exit immediately afterwards
to the loop in qemu_kvm_cpu_thread_fn().
There's no need to do this. Once we've handled the hypercall there's no
reason we can't go straight around and KVM_RUN again, which is what ret = 0
will signal. The only exception might be for hypercalls which affect the
state of cpu_can_run(), however the only one that might do this is H_CEDE
and for kvm that is always handled in the kernel, not qemu.
Furtherm setting ret = 0 means that when exit_requested is set from a
hypercall, we will enter KVM_RUN once more with a signal which lets the
the kernel do its internal logic to complete the hypercall with out
actually executing any more guest code. This is important if our hypercall
also triggered a reset, which previously would re-initialize everything
without completing the hypercall. This caused the kernel to get confused
because it thought the guest was still in the middle of a hypercall when
it has actually been reset.
This patch therefore changes to ret = 0, which is both a bugfix and a small
optimization.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The pseries platform already contains an IOMMU implementation, since it is
essential for the platform's paravirtualized VIO devices. This IOMMU
support is currently built into the implementation of the VIO "bus" and
the various VIO devices.
This patch converts this code to make use of the new common IOMMU
infrastructure.
We don't yet handle synchronization of map/unmap callbacks vs. invalidations,
this will require some complex interaction with the kernel and is not a
major concern at this stage.
Cc: Alex Graf <agraf@suse.de>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
More recent Power server chips (i.e. based on the 64 bit hash MMU)
support more than just the traditional 4k and 16M page sizes. This
can get quite complicated, because which page sizes are supported,
which combinations are supported within an MMU segment and how these
page sizes are encoded both in the SLB entry and the hash PTE can vary
depending on the CPU model (they are not specified by the
architecture). In addition the firmware or hypervisor may not permit
use of certain page sizes, for various reasons. Whether various page
sizes are supported on KVM, for example, depends on whether the PR or
HV variant of KVM is in use, and on the page size of the memory
backing the guest's RAM.
This patch adds information to the CPUState and cpu defs to describe
the supported page sizes and encodings. Since TCG does not yet
support any extended page sizes, we just set this to NULL in the
static CPU definitions, expanding this to the default 4k and 16M page
sizes when we initialize the cpu state. When using KVM, however, we
instead determine available page sizes using the new
KVM_PPC_GET_SMMU_INFO call. For old kernels without that call, we use
some defaults, with some guesswork which should do the right thing for
existing HV and PR implementations. The fallback might not be correct
for future versions, but that's ok, because they'll have
KVM_PPC_GET_SMMU_INFO.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
On target-ppc, our table of CPU types and features encodes the features as
found on the hardware, regardless of whether these features are actually
usable under TCG or KVM. We already have cases where the information from
the cpu table must be fixed up to account for limitations in the emulation
method we're using. e.g. TCG does not support the DFP and VSX instructions
and KVM needs different numbering of the CPUs in order to tell it the
correct thread to core mappings.
This patch cleans up these hacks to handle emulation limitations by
consolidating them into a pair of functions specifically for the purpose.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[AF: Style and typo fixes, rename new functions and drop ppc_def_t arg]
Signed-off-by: Andreas Färber <afaerber@suse.de>
The official spelling is QEMU.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Andreas Färber <afaerber@suse.de>
[blauwirbel@gmail.com: fixed comment style in hw/sun4m.c]
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
For the pseries machine, TCE (IOMMU) tables can either be directly
malloc()ed in qemu or, when running on a KVM which supports it, mmap()ed
from a KVM ioctl. The latter option is used when available, because it
allows the (frequent bottlenext) H_PUT_TCE hypercall to be KVM accelerated.
However, even when KVM is persent, TCE acceleration is not always possible.
Only KVM HV supports this ioctl(), not KVM PR, or the kernel could run out
of contiguous memory to allocate the new table. In this case we need to
fall back on the malloc()ed table.
When a device is removed, and we need to remove the TCE table, we need to
either munmap() or free() the table as appropriate for how it was
allocated. The code is supposed to do that, but we buggily fail to
initialize the tcet->fd variable in the malloc() case, which is used as a
flag to determine which is the right choice.
This patch fixes the bug, and cleans up error messages relating to this
path while we're at it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Scripted conversion:
sed -i "s/CPUState/CPUPPCState/g" target-ppc/*.[hc]
sed -i "s/#define CPUPPCState/#define CPUState/" target-ppc/cpu.h
Signed-off-by: Andreas Färber <afaerber@suse.de>
Acked-by: Anthony Liguori <aliguori@us.ibm.com>
Unfortunately the HIOR setting code slipped into upstream QEMU
before it was pulled into upstream KVM. And since Murphy is always
right, comments on the patches only emerged on the pull request
leading to changes in the interface.
So here's an update to the HIOR setting. While at it, I also relaxed
it a bit since for HV KVM we can already run fine without and 3.2
works just fine with HV KVM but when not setting HIOR. We will only
need this when running PAPR in PR KVM.
Since we accidently changed the ABI and API along the way, we have
to update the underlying kernel headers together with the code that
uses it to not break bisectability.
Signed-off-by: Alexander Graf <agraf@suse.de>
Commit c5705a772 ("vmstate, memory: decouple vmstate from memory API") changed
the signature of memory_region_init_ram_ptr() but did not update a caller in
the ppc kvm module. Fix.
Signed-off-by: Avi Kivity <avi@redhat.com>
When guest reset, we need to halt secondary cpus until guest kick them.
This already works for tcg. The patch add the support for kvm.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
[agraf: remove in-kernel irqchip code]
Sufficiently recent kernels include a KVM call to accelerate use of
PAPR TCE tables (IOMMU), which are used by PAPR virtual IO devices.
This involves qemu mapping the TCE table in from a kernel obtained fd,
which currently we do with PROT_READ only. This is a hangover from
early (never released) versions of this kernel interface which only
permitted read-only mappings and required us to destroy and recreate
the table when we needed to clear it from qemu.
Now, the kernel permits read-write mappings, and we rely on this to
clear the table in spapr_vio_quiesce_one(). However, due to
insufficient testing, I forgot to update the actual mapping of the
table in kvmppc_create_spapr_tce() to add PROT_WRITE to the mmap().
This patch corrects the oversight.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The -cpu host feature tries to find out the host capabilities based
on device tree information. However, we don't always have that available
because it's an optional property in dt.
So instead of force unsetting values depending on an unreliable source
of information, let's just try to be clever about it and not override
capabilities when we don't know the device tree pieces.
This fixes altivec with -cpu host on YDL PowerStations.
Reported-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently, when KVM is enabled, the pseries machine checks if the host
CPU supports VMX, VSX and/or DFP instructions and advertises
accordingly in the guest device tree. It does this regardless of what
CPU is selected on the command line. On the other hand, when in TCG
mode, it never advertises any of these facilities, even basic VMX
(Altivec) which is supported in TCG.
Now that we have a -cpu host option for ppc, it is fairly
straightforward to fix both problems. This patch changes the -cpu
host code to override the basic cpu spec derived from the PVR with
information queried from the host avout VMX, VSX and DFP capability.
The pseries code then uses the instruction availability advertised in
the cpu state to set the guest device tree correctly for both the KVM
and TCG cases.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
For convenience with kvm, x86 allows the user to specify -cpu host on the
qemu command line, which means make the guest cpu the same as the host
cpu. This patch implements the same option for ppc targets.
For now, this just read the host PVR (Processor Version Register) and
selects one of our existing CPU specs based on it. This means that the
option will not work if the host cpu is not supported by TCG, even if that
wouldn't matter for use under kvm.
In future, we can extend this in future to override parts of the cpu spec
based on information obtained from the host (via /proc/cpuinfo, the host
device tree, or explicit KVM calls). That will let us handle cases where
the real kvm-virtualized CPU doesn't behave exactly like the TCG-emulated
CPU. With appropriate annotation of the CPU specs we'll also then be able
to use host cpus under kvm even when there isn't a matching full TCG model.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Sufficiently recent PAPR specifications define properties "ibm,vmx"
and "ibm,dfp" on the CPU node which advertise whether the VMX vector
extensions (or the later VSX version) and/or the Decimal Floating
Point operations from IBM's recent POWER CPUs are available.
Currently we do not put these in the guest device tree and the guest
kernel will consequently assume they are not available. This is good,
because they are not supported under TCG. VMX is similar enough to
Altivec that it might be trivial to support, but VSX and DFP would
both require significant work to support in TCG.
However, when running under kvm on a host which supports these
instructions, there's no reason not to let the guest use them. This
patch, therefore, checks for the relevant support on the host CPU
and, if present, advertises them to the guest as well.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently the kvmppc_get_clockfreq() function reads the host's clock
frequency from /proc/device-tree, which is useful to past to the guest
in KVM setups. However, there are some other host properties
advertised in the device tree which can also be relevant to the
guests.
This patch, therefore, replaces kvmppc_get_clockfreq() which can
retrieve any named, single integer property from the host device
tree's CPU node.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The pseries machine of qemu implements the TCE mechanism used as a
virtual IOMMU for the PAPR defined virtual IO devices. Because the
PAPR spec only defines a small DMA address space, the guest VIO
drivers need to update TCE mappings very frequently - the virtual
network device is particularly bad. This means many slow exits to
qemu to emulate the H_PUT_TCE hypercall.
Sufficiently recent kernels allow this to be mitigated by implementing
H_PUT_TCE in the host kernel. To make use of this, however, qemu
needs to initialize the necessary TCE tables, and map them into itself
so that the VIO device implementations can retrieve the mappings when
they access guest memory (which is treated as a virtual DMA
operation).
This patch adds the necessary calls to use the KVM TCE acceleration.
If the kernel does not support acceleration, or there is some other
error creating the accelerated TCE table, then it will still fall back
to full userspace TCE implementation.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
At present, using the hypervisor aware Book3S-HV KVM will only work
with qemu on POWER7 CPUs. PPC970 CPUs also have hypervisor
capability, but they lack the VRMA feature which makes assigning guest
memory easier.
In order to allow KVM Book3S-HV on PPC970, we need to specially
allocate the first chunk of guest memory (the "Real Mode Area" or
RMA), so that it is physically contiguous.
Sufficiently recent host kernels allow such contiguous RMAs to be
allocated, with a kvm capability advertising whether the feature is
available and/or necessary on this hardware. This patch enables qemu
to use this support, thus allowing kvm acceleration of pseries qemu
machines on PPC970 hardware.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
---
agraf: fix to use memory api
Alex Graf has already made qemu support KVM for the pseries machine
when using the Book3S-PR KVM variant (which runs the guest in
usermode, emulating supervisor operations). This code allows gets us
very close to also working with KVM Book3S-HV (using the hypervisor
capabilities of recent POWER CPUs).
This patch moves us another step towards Book3S-HV support by
correctly handling SMT (multithreaded) POWER CPUs. There are two
parts to this:
* Querying KVM to check SMT capability, and if present, adjusting the
cpu numbers that qemu assigns to cause KVM to assign guest threads
to cores in the right way (this isn't automatic, because the POWER
HV support has a limitation that different threads on a single core
cannot be in different guests at the same time).
* Correctly informing the guest OS of the SMT thread to core mappings
via the device tree.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
When running with PR KVM, we need to set HIOR directly. Thankfully there
is now a new interface to set registers individually so we can just use that
and poke HIOR into the guest vcpu's HIOR register.
While at it, this also sets SDR1 because -M pseries requires it to run.
With this patch, -M pseries works properly with PR KVM.
Signed-off-by: Alexander Graf <agraf@suse.de>
Share the TLB array with KVM. This allows us to set the initial TLB
both on initial boot and reset, is useful for debugging, and could
eventually be used to support migration.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
When running PR style KVM, we need to tell the kernel that we want
to run in PAPR mode now. This means that we need to pass some more
register information down and enable papr mode. We also need to align
the HTAB to htab_size boundary.
Using this patch, -M pseries works with kvm even on non-hv kvm
implementations, as long as the preceding kernel patches are in.
Signed-off-by: Alexander Graf <agraf@suse.de>
---
v1 -> v2:
- match on CONFIG_PSERIES
v2 -> v3:
- remove HIOR pieces from PAPR patch (ABI breakage)
We need to find out the host's clock-frequency when running on KVM, so
let's export a respective function.
Signed-off-by: Alexander Graf <agraf@suse.de>
---
v1 -> v2:
- enable 64bit values
No longer needed with accompanied kernel headers.
CC: Alexander Graf <agraf@suse.de>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Required header support is now unconditionally available.
CC: Alexander Graf <agraf@suse.de>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When compiling qemu with kvm support on BookE PPC machines, I get
the following error:
cc1: warnings being treated as errors
/tmp/qemu/target-ppc/kvm.c: In function 'kvm_arch_get_registers':
/tmp/qemu/target-ppc/kvm.c:188: error: unused variable 'sregs'
This is due to overly ambitious #ifdef'ery introduced in 90dc88.
Fix it by keeping code that doesn't depend on new headers alive
for the compiler, but never executed due to failing capability
checks.
CC: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
When compiling Qemu with older kernel headers, the PVR setting
mechanism isn't available yet. Unfortunately, back then I didn't add
a capability we could check against, so all we can do is add a configure
test to see if we support PVR setting. For BookE, we don't care yet.
This fixes compilation errors with KVM enabled on older kernel headers
(like 2.6.32).
Signed-off-by: Alexander Graf <agraf@suse.de>
Read them via KVM_GET_SREGS in kvm_arch_get_registers(),
and display them in "info registers".
Also get CR and PID from the existing KVM_GET_REGS.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Classic/server ppc has had SREGS for a while now (though I think not
always?), but it's still missing for booke. Check the capability before
calling KVM_SET_SREGS.
Without this, booke kvm fails to boot as of commit
84b4915dd2 (kvm: Handle kvm_init_vcpu
errors).
Also, don't write random stack state into the non-PVR sregs fields --
have kvm fill it in first.
Eventually booke will have sregs and it will have its own capability to
be tested here. However, we will want a way for platform code to request
to look like the actual CPU we're running on, especially if SoC devices
are being directly assigned.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
On ppc machines with hash table MMUs, the special purpose register SDR1
contains both the base address of the encoded size (hashed) page tables.
At present, we interpret the SDR1 value within the address translation
path. But because the encodings of the size for 32-bit and 64-bit are
different this makes for a confusing branch on the MMU type with a bunch
of curly shifts and masks in the middle of the translate path.
This patch cleans things up by moving the interpretation on SDR1 into the
helper function handling the write to the register. This leaves a simple
pre-sanitized base address and mask for the hash table in the CPUState
structure which is easier to work with in the translation path.
This makes the translation path more readable. It addresses the FIXME
comment currently in the mtsdr1 helper, by validating the SDR1 value during
interpretation. Finally it opens the way for emulating a pSeries-style
partition where the hash table used for translation is not mapped into
the guests's RAM.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This was done with:
sed -i 's/qemu_get_clock\>/qemu_get_clock_ns/' \
$(git grep -l 'qemu_get_clock\>' )
sed -i 's/qemu_new_timer\>/qemu_new_timer_ns/' \
$(git grep -l 'qemu_new_timer\>' )
after checking that get_clock and new_timer never occur twice
on the same line. There were no missed occurrences; however, even
if there had been, they would have been caught by the compiler.
There was exactly one false positive in qemu_run_timers:
- current_time = qemu_get_clock (clock);
+ current_time = qemu_get_clock_ns (clock);
which is of course not in this patch.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make the return code of kvm_arch_handle_exit directly usable for
kvm_cpu_exec. This is straightforward for x86 and ppc, just s390
would require more work. Avoid this for now by pushing the return code
translation logic into s390's kvm_arch_handle_exit.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We will broaden the scope of this function on x86 beyond irqchip events.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Commit 7a39fe5882 failed to convert the right arch function.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We do not check them, and the only arch with non-empty implementations
always returns 0 (this is also true for qemu-kvm).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Provide arch-independent kvm_on_sigbus* stubs to remove the #ifdef'ery
from cpus.c. This patch also fixes --disable-kvm build by providing the
missing kvm_on_sigbus_vcpu kvm-stub.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Instead of splattering the code with #ifdefs and runtime checks for
capabilities we cannot work without anyway, provide central test
infrastructure for verifying their availability both at build and
runtime.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Ensure that we stop the guest whenever we face a fatal or unknown exit
reason. If we stop, we also have to enforce a cpu loop exit.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
I get a warning on a signed comparison with an unsigned variable, so
let's make the variable signed and be happy.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Edgar E. Iglesias <edgar@axis.com>
KVM on PowerPC used to have completely broken interrupt logic. Usually,
interrupts work by having a PIC that pulls a line up/down, so the CPU knows
that an interrupt is active. This line stays active until some action is
done to the PIC to release the line.
On KVM for PPC, we just checked if there was an interrupt pending and pulled
a line in the kernel module. We never released it though, hoping that kernel
space would just declare an interrupt as released when injected - which is
wrong.
To fix this, we need to completely redesign the interrupt injection logic.
Whenever an interrupt line gets triggered, we need to notify kernel space
that the line is up. Whenever it gets released, we do the same. This way
we can assure that the interrupt state is always known to kernel space.
This fixes random stalls in KVM guests on PowerPC that were waiting for
an interrupt while everyone else thought they received it already.
Signed-off-by: Alexander Graf <agraf@suse.de>
On KVM for PPC we need to tell the guest which instructions to use when
doing a hypercall. The clean way to do this is to go through an ioctl
from userspace and passing it on to the guest using the device tree.
So let's do the qemu part here: read out the hypercall and pass it on
to the guest's fw_cfg so openBIOS can read it out and expose it again.
Signed-off-by: Alexander Graf <agraf@suse.de>
When running with --enable-io-thread the timer we have doesn't help,
because it doesn't wake up the CPU thread. So instead we need to
actually kick it.
While at it I refined the logic a bit to not dumbly trigger a timer
every 500ms, but rather do it more often after an interrupt got injected.
If there's no level based interrupt to be expected, we don't need the
timer anyways.
This makes qemu-system-ppc with --enable-io-thread work when using KVM.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Continue vcpu execution in case emulation failure happened while vcpu
was in userspace. In this case #UD will be injected into the guest
allowing guest OS to kill offending process and continue.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This grand cleanup drops all reset and vmsave/load related
synchronization points in favor of four(!) generic hooks:
- cpu_synchronize_all_states in qemu_savevm_state_complete
(initial sync from kernel before vmsave)
- cpu_synchronize_all_post_init in qemu_loadvm_state
(writeback after vmload)
- cpu_synchronize_all_post_init in main after machine init
- cpu_synchronize_all_post_reset in qemu_system_reset
(writeback after system reset)
These writeback points + the existing one of VCPU exec after
cpu_synchronize_state map on three levels of writeback:
- KVM_PUT_RUNTIME_STATE (during runtime, other VCPUs continue to run)
- KVM_PUT_RESET_STATE (on synchronous system reset, all VCPUs stopped)
- KVM_PUT_FULL_STATE (on init or vmload, all VCPUs stopped as well)
This level is passed to the arch-specific VCPU state writing function
that will decide which concrete substates need to be written. That way,
no writer of load, save or reset functions that interact with in-kernel
KVM states will ever have to worry about synchronization again. That
also means that a lot of reasons for races, segfaults and deadlocks are
eliminated.
cpu_synchronize_state remains untouched, just as Anthony suggested. We
continue to need it before reading or writing of VCPU states that are
also tracked by in-kernel KVM subsystems.
Consequently, this patch removes many cpu_synchronize_state calls that
are now redundant, just like remaining explicit register syncs.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
For some odd reason we sometimes hang inside KVM forever. I'd guess it's
a race condition where we actually have a level triggered interrupt, but
the infrastructure can't expose that yet, so the guest ACKs it, goes to
sleep and never gets notified that there's still an interrupt pending.
As a quick workaround, let's just wake up every 500 ms. That way we can
assure that we're always reinjecting interrupts in time.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Our guest systems need to know by how much the timebase increases every second,
so there usually is a "timebase-frequency" property in the cpu leaf of the
device tree.
This property is missing in OpenBIOS.
With qemu, Linux's fallback timebase speed and qemu's internal timebase speed
match up. With KVM, that is no longer true. The guest is running at the same
timebase speed as the host.
This leads to massive timing problems. On my test machine, a "sleep 2" takes
about 14 seconds with KVM enabled.
This patch exports the timebase frequency to OpenBIOS, so it can then put them
into the device tree.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>