2016-01-27 02:16:58 +08:00
|
|
|
#include "qemu/osdep.h"
|
include/qemu/osdep.h: Don't include qapi/error.h
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef. Since then, we've moved to include qemu/osdep.h
everywhere. Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h. That's in excess of
100KiB of crap most .c files don't actually need.
Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h. Include qapi/error.h in .c files that need it and don't
get it now. Include qapi-types.h in qom/object.h for uint16List.
Update scripts/clean-includes accordingly. Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h
comment quoted above similarly.
This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third. Unfortunately, the number depending on
qapi-types.h shrinks only a little. More work is needed for that one.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-14 16:01:28 +08:00
|
|
|
#include "qapi/error.h"
|
2017-01-10 18:59:55 +08:00
|
|
|
#include "sysemu/hw_accel.h"
|
2012-12-18 01:20:04 +08:00
|
|
|
#include "sysemu/sysemu.h"
|
2015-12-15 20:16:16 +08:00
|
|
|
#include "qemu/log.h"
|
pseries: Implement HPT resizing
This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table. This will eventually allow for more flexible memory
hotplug.
The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function. The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.
The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion. If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.
The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only
be called successfully once H_RESIZE_HPT_PREPARE has successfully
completed initialization of a new HPT. The guest must ensure that there
are no concurrent accesses to the existing HPT while this is called (this
effectively means stop_machine() for Linux guests).
For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT. This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).
In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs. That's a project for another day, but should be possible
without any changes to the guest interface.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-12 13:46:49 +08:00
|
|
|
#include "qemu/error-report.h"
|
2011-04-01 12:15:20 +08:00
|
|
|
#include "cpu.h"
|
2016-03-15 20:18:37 +08:00
|
|
|
#include "exec/exec-all.h"
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
#include "helper_regs.h"
|
2013-02-06 00:06:20 +08:00
|
|
|
#include "hw/ppc/spapr.h"
|
2013-03-12 08:31:18 +08:00
|
|
|
#include "mmu-hash64.h"
|
spapr: Implement processor compatibility in ibm, client-architecture-support
Modern Linux kernels support last POWERPC CPUs so when a kernel boots,
in most cases it can find a matching cpu_spec in the kernel's cpu_specs
list. However if the kernel is quite old, it may be missing a definition
of the actual CPU. To provide an ability for old kernels to work on modern
hardware, a Processor Compatibility Mode has been introduced
by the PowerISA specification.
>From the hardware prospective, it is supported by the Processor
Compatibility Register (PCR) which is defined in PowerISA. The register
enables one of the compatibility modes (2.05/2.06/2.07).
Since PCR is a hypervisor privileged register and cannot be
directly accessed from the guest, the mode selection is done via
ibm,client-architecture-support (CAS) RTAS call using which the guest
specifies what "raw" and "architected" CPU versions it supports.
QEMU works out the best match, changes a "cpu-version" property of
every CPU and notifies the guest about the change by setting these
properties in the buffer passed as a response on a custom H_CAS hypercall.
This implements ibm,client-architecture-support parameters parsing
(now only for PVRs) and cooks the device tree diff with new values for
"cpu-version", "ibm,ppc-interrupt-server#s" and
"ibm,ppc-interrupt-server#s" properties.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-23 10:26:57 +08:00
|
|
|
#include "cpu-models.h"
|
|
|
|
#include "trace.h"
|
|
|
|
#include "kvm_ppc.h"
|
2016-10-25 12:47:28 +08:00
|
|
|
#include "hw/ppc/spapr_ovec.h"
|
2017-03-20 07:46:46 +08:00
|
|
|
#include "qemu/error-report.h"
|
|
|
|
#include "mmu-book3s-v3.h"
|
2011-04-01 12:15:22 +08:00
|
|
|
|
2014-03-07 12:37:40 +08:00
|
|
|
struct SPRSyncState {
|
|
|
|
int spr;
|
|
|
|
target_ulong value;
|
|
|
|
target_ulong mask;
|
|
|
|
};
|
|
|
|
|
2016-10-31 17:36:08 +08:00
|
|
|
static void do_spr_sync(CPUState *cs, run_on_cpu_data arg)
|
2014-03-07 12:37:40 +08:00
|
|
|
{
|
2016-10-31 17:36:08 +08:00
|
|
|
struct SPRSyncState *s = arg.host_ptr;
|
2016-08-03 01:27:33 +08:00
|
|
|
PowerPCCPU *cpu = POWERPC_CPU(cs);
|
2014-03-07 12:37:40 +08:00
|
|
|
CPUPPCState *env = &cpu->env;
|
|
|
|
|
2016-08-03 01:27:33 +08:00
|
|
|
cpu_synchronize_state(cs);
|
2014-03-07 12:37:40 +08:00
|
|
|
env->spr[s->spr] &= ~s->mask;
|
|
|
|
env->spr[s->spr] |= s->value;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void set_spr(CPUState *cs, int spr, target_ulong value,
|
|
|
|
target_ulong mask)
|
|
|
|
{
|
|
|
|
struct SPRSyncState s = {
|
|
|
|
.spr = spr,
|
|
|
|
.value = value,
|
|
|
|
.mask = mask
|
|
|
|
};
|
2016-10-31 17:36:08 +08:00
|
|
|
run_on_cpu(cs, do_spr_sync, RUN_ON_CPU_HOST_PTR(&s));
|
2014-03-07 12:37:40 +08:00
|
|
|
}
|
|
|
|
|
2016-02-11 20:47:19 +08:00
|
|
|
static bool has_spr(PowerPCCPU *cpu, int spr)
|
|
|
|
{
|
|
|
|
/* We can test whether the SPR is defined by checking for a valid name */
|
|
|
|
return cpu->env.spr_cb[spr].name != NULL;
|
|
|
|
}
|
|
|
|
|
2017-02-21 11:00:16 +08:00
|
|
|
static inline bool valid_ptex(PowerPCCPU *cpu, target_ulong ptex)
|
2014-02-21 01:52:17 +08:00
|
|
|
{
|
|
|
|
/*
|
target/ppc: Eliminate htab_base and htab_mask variables
CPUPPCState includes fields htab_base and htab_mask which store the base
address (GPA) and size (as a mask) of the guest's hashed page table (HPT).
These are set when the SDR1 register is updated.
Keeping these in sync with the SDR1 is actually a little bit fiddly, and
probably not useful for performance, since keeping them expands the size of
CPUPPCState. It also makes some upcoming changes harder to implement.
This patch removes these fields, in favour of calculating them directly
from the SDR1 contents when necessary.
This does make a change to the behaviour of attempting to write a bad value
(invalid HPT size) to the SDR1 with an mtspr instruction. Previously, the
bad value would be stored in SDR1 and could be retrieved with a later
mfspr, but the HPT size as used by the softmmu would be, clamped to the
allowed values. Now, writing a bad value is treated as a no-op. An error
message is printed in both new and old versions.
I'm not sure which behaviour, if either, matches real hardware. I don't
think it matters that much, since it's pretty clear that if an OS writes
a bad value to SDR1, it's not going to boot.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-02-24 13:36:44 +08:00
|
|
|
* hash value/pteg group index is normalized by HPT mask
|
2014-02-21 01:52:17 +08:00
|
|
|
*/
|
target/ppc: Eliminate htab_base and htab_mask variables
CPUPPCState includes fields htab_base and htab_mask which store the base
address (GPA) and size (as a mask) of the guest's hashed page table (HPT).
These are set when the SDR1 register is updated.
Keeping these in sync with the SDR1 is actually a little bit fiddly, and
probably not useful for performance, since keeping them expands the size of
CPUPPCState. It also makes some upcoming changes harder to implement.
This patch removes these fields, in favour of calculating them directly
from the SDR1 contents when necessary.
This does make a change to the behaviour of attempting to write a bad value
(invalid HPT size) to the SDR1 with an mtspr instruction. Previously, the
bad value would be stored in SDR1 and could be retrieved with a later
mfspr, but the HPT size as used by the softmmu would be, clamped to the
allowed values. Now, writing a bad value is treated as a no-op. An error
message is printed in both new and old versions.
I'm not sure which behaviour, if either, matches real hardware. I don't
think it matters that much, since it's pretty clear that if an OS writes
a bad value to SDR1, it's not going to boot.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-02-24 13:36:44 +08:00
|
|
|
if (((ptex & ~7ULL) / HPTES_PER_GROUP) & ~ppc_hash64_hpt_mask(cpu)) {
|
2014-02-21 01:52:17 +08:00
|
|
|
return false;
|
|
|
|
}
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2016-01-21 11:48:43 +08:00
|
|
|
static bool is_ram_address(sPAPRMachineState *spapr, hwaddr addr)
|
|
|
|
{
|
|
|
|
MachineState *machine = MACHINE(spapr);
|
|
|
|
MemoryHotplugState *hpms = &spapr->hotplug_memory;
|
|
|
|
|
|
|
|
if (addr < machine->ram_size) {
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
if ((addr >= hpms->base)
|
|
|
|
&& ((addr - hpms->base) < memory_region_size(&hpms->mr))) {
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2015-07-02 14:23:04 +08:00
|
|
|
static target_ulong h_enter(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
2011-04-01 12:15:22 +08:00
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
|
|
|
target_ulong flags = args[0];
|
2017-02-21 11:00:16 +08:00
|
|
|
target_ulong ptex = args[1];
|
2011-04-01 12:15:22 +08:00
|
|
|
target_ulong pteh = args[2];
|
|
|
|
target_ulong ptel = args[3];
|
2016-07-01 15:10:10 +08:00
|
|
|
unsigned apshift;
|
2011-08-04 05:02:19 +08:00
|
|
|
target_ulong raddr;
|
2017-02-21 11:00:16 +08:00
|
|
|
target_ulong slot;
|
target/ppc: Cleanup HPTE accessors for 64-bit hash MMU
Accesses to the hashed page table (HPT) are complicated by the fact that
the HPT could be in one of three places:
1) Within guest memory - when we're emulating a full guest CPU at the
hardware level (e.g. powernv, mac99, g3beige)
2) Within qemu, but outside guest memory - when we're emulating user and
supervisor instructions within TCG, but instead of emulating
the CPU's hypervisor mode, we just emulate a hypervisor's behaviour
(pseries in TCG or KVM-PR)
3) Within the host kernel - a pseries machine using KVM-HV
acceleration. Mostly accesses to the HPT are handled by KVM,
but there are a few cases where qemu needs to access it via a
special fd for the purpose.
In order to batch accesses to the fd in case (3), we use a somewhat awkward
ppc_hash64_start_access() / ppc_hash64_stop_access() pair, which for case
(3) reads / releases several HPTEs from the kernel as a batch (usually a
whole PTEG). For cases (1) & (2) it just returns an address value. The
actual HPTE load helpers then need to interpret the returned token
differently in the 3 cases.
This patch keeps the same basic structure, but simplfiies the details.
First start_access() / stop_access() are renamed to map_hptes() and
unmap_hptes() to make their operation more obvious. Second, map_hptes()
now always returns a qemu pointer, which can always be used in the same way
by the load_hpte() helpers. In case (1) it comes from address_space_map()
in case (2) directly from qemu's HPT buffer and in case (3) from a
temporary buffer read from the KVM fd.
While we're at it, make things a bit more consistent in terms of types and
variable names: avoid variables named 'index' (it shadows index(3) which
can lead to confusing results), use 'hwaddr ptex' for HPTE indices and
uint64_t for each of the HPTE words, use ptex throughout the call stack
instead of pte_offset in some places (we still need that at the bottom
layer, but nowhere else).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-27 13:03:41 +08:00
|
|
|
const ppc_hash_pte64_t *hptes;
|
2011-04-01 12:15:22 +08:00
|
|
|
|
2016-07-01 15:10:10 +08:00
|
|
|
apshift = ppc_hash64_hpte_page_shift_noslb(cpu, pteh, ptel);
|
2016-01-27 09:01:20 +08:00
|
|
|
if (!apshift) {
|
|
|
|
/* Bad page size encoding */
|
|
|
|
return H_PARAMETER;
|
2011-04-01 12:15:22 +08:00
|
|
|
}
|
|
|
|
|
2016-01-27 09:01:20 +08:00
|
|
|
raddr = (ptel & HPTE64_R_RPN) & ~((1ULL << apshift) - 1);
|
2011-04-01 12:15:22 +08:00
|
|
|
|
2016-01-21 11:48:43 +08:00
|
|
|
if (is_ram_address(spapr, raddr)) {
|
2011-08-04 05:02:19 +08:00
|
|
|
/* Regular RAM - should have WIMG=0010 */
|
2013-03-12 08:31:18 +08:00
|
|
|
if ((ptel & HPTE64_R_WIMG) != HPTE64_R_M) {
|
2011-08-04 05:02:19 +08:00
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
} else {
|
2016-06-17 18:37:20 +08:00
|
|
|
target_ulong wimg_flags;
|
2011-08-04 05:02:19 +08:00
|
|
|
/* Looks like an IO address */
|
|
|
|
/* FIXME: What WIMG combinations could be sensible for IO?
|
|
|
|
* For now we allow WIMG=010x, but are there others? */
|
|
|
|
/* FIXME: Should we check against registered IO addresses? */
|
2016-06-17 18:37:20 +08:00
|
|
|
wimg_flags = (ptel & (HPTE64_R_W | HPTE64_R_I | HPTE64_R_M));
|
|
|
|
|
|
|
|
if (wimg_flags != HPTE64_R_I &&
|
|
|
|
wimg_flags != (HPTE64_R_I | HPTE64_R_M)) {
|
2011-08-04 05:02:19 +08:00
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
2011-04-01 12:15:22 +08:00
|
|
|
}
|
2011-08-04 05:02:19 +08:00
|
|
|
|
2011-04-01 12:15:22 +08:00
|
|
|
pteh &= ~0x60ULL;
|
|
|
|
|
2017-02-21 11:00:16 +08:00
|
|
|
if (!valid_ptex(cpu, ptex)) {
|
2011-04-01 12:15:22 +08:00
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
2014-02-21 01:52:24 +08:00
|
|
|
|
2017-02-21 11:00:16 +08:00
|
|
|
slot = ptex & 7ULL;
|
|
|
|
ptex = ptex & ~7ULL;
|
|
|
|
|
2011-04-01 12:15:22 +08:00
|
|
|
if (likely((flags & H_EXACT) == 0)) {
|
target/ppc: Cleanup HPTE accessors for 64-bit hash MMU
Accesses to the hashed page table (HPT) are complicated by the fact that
the HPT could be in one of three places:
1) Within guest memory - when we're emulating a full guest CPU at the
hardware level (e.g. powernv, mac99, g3beige)
2) Within qemu, but outside guest memory - when we're emulating user and
supervisor instructions within TCG, but instead of emulating
the CPU's hypervisor mode, we just emulate a hypervisor's behaviour
(pseries in TCG or KVM-PR)
3) Within the host kernel - a pseries machine using KVM-HV
acceleration. Mostly accesses to the HPT are handled by KVM,
but there are a few cases where qemu needs to access it via a
special fd for the purpose.
In order to batch accesses to the fd in case (3), we use a somewhat awkward
ppc_hash64_start_access() / ppc_hash64_stop_access() pair, which for case
(3) reads / releases several HPTEs from the kernel as a batch (usually a
whole PTEG). For cases (1) & (2) it just returns an address value. The
actual HPTE load helpers then need to interpret the returned token
differently in the 3 cases.
This patch keeps the same basic structure, but simplfiies the details.
First start_access() / stop_access() are renamed to map_hptes() and
unmap_hptes() to make their operation more obvious. Second, map_hptes()
now always returns a qemu pointer, which can always be used in the same way
by the load_hpte() helpers. In case (1) it comes from address_space_map()
in case (2) directly from qemu's HPT buffer and in case (3) from a
temporary buffer read from the KVM fd.
While we're at it, make things a bit more consistent in terms of types and
variable names: avoid variables named 'index' (it shadows index(3) which
can lead to confusing results), use 'hwaddr ptex' for HPTE indices and
uint64_t for each of the HPTE words, use ptex throughout the call stack
instead of pte_offset in some places (we still need that at the bottom
layer, but nowhere else).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-27 13:03:41 +08:00
|
|
|
hptes = ppc_hash64_map_hptes(cpu, ptex, HPTES_PER_GROUP);
|
2017-02-21 11:00:16 +08:00
|
|
|
for (slot = 0; slot < 8; slot++) {
|
target/ppc: Cleanup HPTE accessors for 64-bit hash MMU
Accesses to the hashed page table (HPT) are complicated by the fact that
the HPT could be in one of three places:
1) Within guest memory - when we're emulating a full guest CPU at the
hardware level (e.g. powernv, mac99, g3beige)
2) Within qemu, but outside guest memory - when we're emulating user and
supervisor instructions within TCG, but instead of emulating
the CPU's hypervisor mode, we just emulate a hypervisor's behaviour
(pseries in TCG or KVM-PR)
3) Within the host kernel - a pseries machine using KVM-HV
acceleration. Mostly accesses to the HPT are handled by KVM,
but there are a few cases where qemu needs to access it via a
special fd for the purpose.
In order to batch accesses to the fd in case (3), we use a somewhat awkward
ppc_hash64_start_access() / ppc_hash64_stop_access() pair, which for case
(3) reads / releases several HPTEs from the kernel as a batch (usually a
whole PTEG). For cases (1) & (2) it just returns an address value. The
actual HPTE load helpers then need to interpret the returned token
differently in the 3 cases.
This patch keeps the same basic structure, but simplfiies the details.
First start_access() / stop_access() are renamed to map_hptes() and
unmap_hptes() to make their operation more obvious. Second, map_hptes()
now always returns a qemu pointer, which can always be used in the same way
by the load_hpte() helpers. In case (1) it comes from address_space_map()
in case (2) directly from qemu's HPT buffer and in case (3) from a
temporary buffer read from the KVM fd.
While we're at it, make things a bit more consistent in terms of types and
variable names: avoid variables named 'index' (it shadows index(3) which
can lead to confusing results), use 'hwaddr ptex' for HPTE indices and
uint64_t for each of the HPTE words, use ptex throughout the call stack
instead of pte_offset in some places (we still need that at the bottom
layer, but nowhere else).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-27 13:03:41 +08:00
|
|
|
if (!(ppc_hash64_hpte0(cpu, hptes, slot) & HPTE64_V_VALID)) {
|
2011-04-01 12:15:22 +08:00
|
|
|
break;
|
|
|
|
}
|
2014-03-14 21:51:49 +08:00
|
|
|
}
|
target/ppc: Cleanup HPTE accessors for 64-bit hash MMU
Accesses to the hashed page table (HPT) are complicated by the fact that
the HPT could be in one of three places:
1) Within guest memory - when we're emulating a full guest CPU at the
hardware level (e.g. powernv, mac99, g3beige)
2) Within qemu, but outside guest memory - when we're emulating user and
supervisor instructions within TCG, but instead of emulating
the CPU's hypervisor mode, we just emulate a hypervisor's behaviour
(pseries in TCG or KVM-PR)
3) Within the host kernel - a pseries machine using KVM-HV
acceleration. Mostly accesses to the HPT are handled by KVM,
but there are a few cases where qemu needs to access it via a
special fd for the purpose.
In order to batch accesses to the fd in case (3), we use a somewhat awkward
ppc_hash64_start_access() / ppc_hash64_stop_access() pair, which for case
(3) reads / releases several HPTEs from the kernel as a batch (usually a
whole PTEG). For cases (1) & (2) it just returns an address value. The
actual HPTE load helpers then need to interpret the returned token
differently in the 3 cases.
This patch keeps the same basic structure, but simplfiies the details.
First start_access() / stop_access() are renamed to map_hptes() and
unmap_hptes() to make their operation more obvious. Second, map_hptes()
now always returns a qemu pointer, which can always be used in the same way
by the load_hpte() helpers. In case (1) it comes from address_space_map()
in case (2) directly from qemu's HPT buffer and in case (3) from a
temporary buffer read from the KVM fd.
While we're at it, make things a bit more consistent in terms of types and
variable names: avoid variables named 'index' (it shadows index(3) which
can lead to confusing results), use 'hwaddr ptex' for HPTE indices and
uint64_t for each of the HPTE words, use ptex throughout the call stack
instead of pte_offset in some places (we still need that at the bottom
layer, but nowhere else).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-27 13:03:41 +08:00
|
|
|
ppc_hash64_unmap_hptes(cpu, hptes, ptex, HPTES_PER_GROUP);
|
2017-02-21 11:00:16 +08:00
|
|
|
if (slot == 8) {
|
2014-03-14 21:51:49 +08:00
|
|
|
return H_PTEG_FULL;
|
|
|
|
}
|
2011-04-01 12:15:22 +08:00
|
|
|
} else {
|
target/ppc: Cleanup HPTE accessors for 64-bit hash MMU
Accesses to the hashed page table (HPT) are complicated by the fact that
the HPT could be in one of three places:
1) Within guest memory - when we're emulating a full guest CPU at the
hardware level (e.g. powernv, mac99, g3beige)
2) Within qemu, but outside guest memory - when we're emulating user and
supervisor instructions within TCG, but instead of emulating
the CPU's hypervisor mode, we just emulate a hypervisor's behaviour
(pseries in TCG or KVM-PR)
3) Within the host kernel - a pseries machine using KVM-HV
acceleration. Mostly accesses to the HPT are handled by KVM,
but there are a few cases where qemu needs to access it via a
special fd for the purpose.
In order to batch accesses to the fd in case (3), we use a somewhat awkward
ppc_hash64_start_access() / ppc_hash64_stop_access() pair, which for case
(3) reads / releases several HPTEs from the kernel as a batch (usually a
whole PTEG). For cases (1) & (2) it just returns an address value. The
actual HPTE load helpers then need to interpret the returned token
differently in the 3 cases.
This patch keeps the same basic structure, but simplfiies the details.
First start_access() / stop_access() are renamed to map_hptes() and
unmap_hptes() to make their operation more obvious. Second, map_hptes()
now always returns a qemu pointer, which can always be used in the same way
by the load_hpte() helpers. In case (1) it comes from address_space_map()
in case (2) directly from qemu's HPT buffer and in case (3) from a
temporary buffer read from the KVM fd.
While we're at it, make things a bit more consistent in terms of types and
variable names: avoid variables named 'index' (it shadows index(3) which
can lead to confusing results), use 'hwaddr ptex' for HPTE indices and
uint64_t for each of the HPTE words, use ptex throughout the call stack
instead of pte_offset in some places (we still need that at the bottom
layer, but nowhere else).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-27 13:03:41 +08:00
|
|
|
hptes = ppc_hash64_map_hptes(cpu, ptex + slot, 1);
|
|
|
|
if (ppc_hash64_hpte0(cpu, hptes, 0) & HPTE64_V_VALID) {
|
|
|
|
ppc_hash64_unmap_hptes(cpu, hptes, ptex + slot, 1);
|
2011-04-01 12:15:22 +08:00
|
|
|
return H_PTEG_FULL;
|
|
|
|
}
|
target/ppc: Cleanup HPTE accessors for 64-bit hash MMU
Accesses to the hashed page table (HPT) are complicated by the fact that
the HPT could be in one of three places:
1) Within guest memory - when we're emulating a full guest CPU at the
hardware level (e.g. powernv, mac99, g3beige)
2) Within qemu, but outside guest memory - when we're emulating user and
supervisor instructions within TCG, but instead of emulating
the CPU's hypervisor mode, we just emulate a hypervisor's behaviour
(pseries in TCG or KVM-PR)
3) Within the host kernel - a pseries machine using KVM-HV
acceleration. Mostly accesses to the HPT are handled by KVM,
but there are a few cases where qemu needs to access it via a
special fd for the purpose.
In order to batch accesses to the fd in case (3), we use a somewhat awkward
ppc_hash64_start_access() / ppc_hash64_stop_access() pair, which for case
(3) reads / releases several HPTEs from the kernel as a batch (usually a
whole PTEG). For cases (1) & (2) it just returns an address value. The
actual HPTE load helpers then need to interpret the returned token
differently in the 3 cases.
This patch keeps the same basic structure, but simplfiies the details.
First start_access() / stop_access() are renamed to map_hptes() and
unmap_hptes() to make their operation more obvious. Second, map_hptes()
now always returns a qemu pointer, which can always be used in the same way
by the load_hpte() helpers. In case (1) it comes from address_space_map()
in case (2) directly from qemu's HPT buffer and in case (3) from a
temporary buffer read from the KVM fd.
While we're at it, make things a bit more consistent in terms of types and
variable names: avoid variables named 'index' (it shadows index(3) which
can lead to confusing results), use 'hwaddr ptex' for HPTE indices and
uint64_t for each of the HPTE words, use ptex throughout the call stack
instead of pte_offset in some places (we still need that at the bottom
layer, but nowhere else).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-27 13:03:41 +08:00
|
|
|
ppc_hash64_unmap_hptes(cpu, hptes, ptex, 1);
|
2011-04-01 12:15:22 +08:00
|
|
|
}
|
2014-02-21 01:52:24 +08:00
|
|
|
|
2017-02-21 11:00:16 +08:00
|
|
|
ppc_hash64_store_hpte(cpu, ptex + slot, pteh | HPTE64_V_HPTE_DIRTY, ptel);
|
2011-04-01 12:15:22 +08:00
|
|
|
|
2017-02-21 11:00:16 +08:00
|
|
|
args[0] = ptex + slot;
|
2011-04-01 12:15:22 +08:00
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2013-06-25 01:48:47 +08:00
|
|
|
typedef enum {
|
2011-08-31 23:50:50 +08:00
|
|
|
REMOVE_SUCCESS = 0,
|
|
|
|
REMOVE_NOT_FOUND = 1,
|
|
|
|
REMOVE_PARM = 2,
|
|
|
|
REMOVE_HW = 3,
|
2013-06-25 01:48:47 +08:00
|
|
|
} RemoveResult;
|
2011-08-31 23:50:50 +08:00
|
|
|
|
2016-01-14 12:33:27 +08:00
|
|
|
static RemoveResult remove_hpte(PowerPCCPU *cpu, target_ulong ptex,
|
2011-08-31 23:50:50 +08:00
|
|
|
target_ulong avpn,
|
|
|
|
target_ulong flags,
|
|
|
|
target_ulong *vp, target_ulong *rp)
|
2011-04-01 12:15:22 +08:00
|
|
|
{
|
target/ppc: Cleanup HPTE accessors for 64-bit hash MMU
Accesses to the hashed page table (HPT) are complicated by the fact that
the HPT could be in one of three places:
1) Within guest memory - when we're emulating a full guest CPU at the
hardware level (e.g. powernv, mac99, g3beige)
2) Within qemu, but outside guest memory - when we're emulating user and
supervisor instructions within TCG, but instead of emulating
the CPU's hypervisor mode, we just emulate a hypervisor's behaviour
(pseries in TCG or KVM-PR)
3) Within the host kernel - a pseries machine using KVM-HV
acceleration. Mostly accesses to the HPT are handled by KVM,
but there are a few cases where qemu needs to access it via a
special fd for the purpose.
In order to batch accesses to the fd in case (3), we use a somewhat awkward
ppc_hash64_start_access() / ppc_hash64_stop_access() pair, which for case
(3) reads / releases several HPTEs from the kernel as a batch (usually a
whole PTEG). For cases (1) & (2) it just returns an address value. The
actual HPTE load helpers then need to interpret the returned token
differently in the 3 cases.
This patch keeps the same basic structure, but simplfiies the details.
First start_access() / stop_access() are renamed to map_hptes() and
unmap_hptes() to make their operation more obvious. Second, map_hptes()
now always returns a qemu pointer, which can always be used in the same way
by the load_hpte() helpers. In case (1) it comes from address_space_map()
in case (2) directly from qemu's HPT buffer and in case (3) from a
temporary buffer read from the KVM fd.
While we're at it, make things a bit more consistent in terms of types and
variable names: avoid variables named 'index' (it shadows index(3) which
can lead to confusing results), use 'hwaddr ptex' for HPTE indices and
uint64_t for each of the HPTE words, use ptex throughout the call stack
instead of pte_offset in some places (we still need that at the bottom
layer, but nowhere else).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-27 13:03:41 +08:00
|
|
|
const ppc_hash_pte64_t *hptes;
|
2016-01-15 13:12:09 +08:00
|
|
|
target_ulong v, r;
|
2011-04-01 12:15:22 +08:00
|
|
|
|
2017-02-21 11:00:16 +08:00
|
|
|
if (!valid_ptex(cpu, ptex)) {
|
2011-08-31 23:50:50 +08:00
|
|
|
return REMOVE_PARM;
|
2011-04-01 12:15:22 +08:00
|
|
|
}
|
|
|
|
|
target/ppc: Cleanup HPTE accessors for 64-bit hash MMU
Accesses to the hashed page table (HPT) are complicated by the fact that
the HPT could be in one of three places:
1) Within guest memory - when we're emulating a full guest CPU at the
hardware level (e.g. powernv, mac99, g3beige)
2) Within qemu, but outside guest memory - when we're emulating user and
supervisor instructions within TCG, but instead of emulating
the CPU's hypervisor mode, we just emulate a hypervisor's behaviour
(pseries in TCG or KVM-PR)
3) Within the host kernel - a pseries machine using KVM-HV
acceleration. Mostly accesses to the HPT are handled by KVM,
but there are a few cases where qemu needs to access it via a
special fd for the purpose.
In order to batch accesses to the fd in case (3), we use a somewhat awkward
ppc_hash64_start_access() / ppc_hash64_stop_access() pair, which for case
(3) reads / releases several HPTEs from the kernel as a batch (usually a
whole PTEG). For cases (1) & (2) it just returns an address value. The
actual HPTE load helpers then need to interpret the returned token
differently in the 3 cases.
This patch keeps the same basic structure, but simplfiies the details.
First start_access() / stop_access() are renamed to map_hptes() and
unmap_hptes() to make their operation more obvious. Second, map_hptes()
now always returns a qemu pointer, which can always be used in the same way
by the load_hpte() helpers. In case (1) it comes from address_space_map()
in case (2) directly from qemu's HPT buffer and in case (3) from a
temporary buffer read from the KVM fd.
While we're at it, make things a bit more consistent in terms of types and
variable names: avoid variables named 'index' (it shadows index(3) which
can lead to confusing results), use 'hwaddr ptex' for HPTE indices and
uint64_t for each of the HPTE words, use ptex throughout the call stack
instead of pte_offset in some places (we still need that at the bottom
layer, but nowhere else).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-27 13:03:41 +08:00
|
|
|
hptes = ppc_hash64_map_hptes(cpu, ptex, 1);
|
|
|
|
v = ppc_hash64_hpte0(cpu, hptes, 0);
|
|
|
|
r = ppc_hash64_hpte1(cpu, hptes, 0);
|
|
|
|
ppc_hash64_unmap_hptes(cpu, hptes, ptex, 1);
|
2011-04-01 12:15:22 +08:00
|
|
|
|
2013-03-12 08:31:18 +08:00
|
|
|
if ((v & HPTE64_V_VALID) == 0 ||
|
2011-04-01 12:15:22 +08:00
|
|
|
((flags & H_AVPN) && (v & ~0x7fULL) != avpn) ||
|
|
|
|
((flags & H_ANDCOND) && (v & avpn) != 0)) {
|
2011-08-31 23:50:50 +08:00
|
|
|
return REMOVE_NOT_FOUND;
|
2011-04-01 12:15:22 +08:00
|
|
|
}
|
2012-09-21 01:42:30 +08:00
|
|
|
*vp = v;
|
2011-08-31 23:50:50 +08:00
|
|
|
*rp = r;
|
2016-01-14 12:33:27 +08:00
|
|
|
ppc_hash64_store_hpte(cpu, ptex, HPTE64_V_HPTE_DIRTY, 0);
|
2016-01-15 13:12:09 +08:00
|
|
|
ppc_hash64_tlb_flush_hpte(cpu, ptex, v, r);
|
2011-08-31 23:50:50 +08:00
|
|
|
return REMOVE_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2015-07-02 14:23:04 +08:00
|
|
|
static target_ulong h_remove(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
2011-08-31 23:50:50 +08:00
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
ppc: Do some batching of TCG tlb flushes
On ppc64 especially, we flush the tlb on any slbie or tlbie instruction.
However, those instructions often come in bursts of 3 or more (context
switch will favor a series of slbie's for example to an slbia if the
SLB has less than a certain number of entries in it, and tlbie's can
happen in a series, with PAPR, H_BULK_REMOVE can remove up to 4 entries
at a time.
Doing a tlb_flush() each time is a waste of time. We end up doing a memset
of the whole TLB, reloading it for the next instruction, memset'ing again,
etc...
Those instructions don't have to take effect immediately. For slbie, they
can wait for the next context synchronizing event. For tlbie, the next
tlbsync.
This implements batching by keeping a flag that indicates that we have a
TLB in need of flushing. We check it on interrupts, rfi's, isync's and
tlbsync and flush the TLB if needed.
This reduces the number of tlb_flush() on a boot to a ubuntu installer
first dialog screen from roughly 360K down to 36K.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: added a 'CPUPPCState *' variable in h_remove() and
h_bulk_remove() ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: removed spurious whitespace change, use 0/1 not true/false
consistently, since tlb_need_flush has int type]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-04 00:03:25 +08:00
|
|
|
CPUPPCState *env = &cpu->env;
|
2011-08-31 23:50:50 +08:00
|
|
|
target_ulong flags = args[0];
|
2017-02-21 11:00:16 +08:00
|
|
|
target_ulong ptex = args[1];
|
2011-08-31 23:50:50 +08:00
|
|
|
target_ulong avpn = args[2];
|
2013-06-25 01:48:47 +08:00
|
|
|
RemoveResult ret;
|
2011-08-31 23:50:50 +08:00
|
|
|
|
2017-02-21 11:00:16 +08:00
|
|
|
ret = remove_hpte(cpu, ptex, avpn, flags,
|
2011-08-31 23:50:50 +08:00
|
|
|
&args[0], &args[1]);
|
|
|
|
|
|
|
|
switch (ret) {
|
|
|
|
case REMOVE_SUCCESS:
|
2016-09-21 00:35:00 +08:00
|
|
|
check_tlb_flush(env, true);
|
2011-08-31 23:50:50 +08:00
|
|
|
return H_SUCCESS;
|
|
|
|
|
|
|
|
case REMOVE_NOT_FOUND:
|
|
|
|
return H_NOT_FOUND;
|
|
|
|
|
|
|
|
case REMOVE_PARM:
|
|
|
|
return H_PARAMETER;
|
|
|
|
|
|
|
|
case REMOVE_HW:
|
|
|
|
return H_HARDWARE;
|
|
|
|
}
|
|
|
|
|
2013-06-29 21:47:26 +08:00
|
|
|
g_assert_not_reached();
|
2011-08-31 23:50:50 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
#define H_BULK_REMOVE_TYPE 0xc000000000000000ULL
|
|
|
|
#define H_BULK_REMOVE_REQUEST 0x4000000000000000ULL
|
|
|
|
#define H_BULK_REMOVE_RESPONSE 0x8000000000000000ULL
|
|
|
|
#define H_BULK_REMOVE_END 0xc000000000000000ULL
|
|
|
|
#define H_BULK_REMOVE_CODE 0x3000000000000000ULL
|
|
|
|
#define H_BULK_REMOVE_SUCCESS 0x0000000000000000ULL
|
|
|
|
#define H_BULK_REMOVE_NOT_FOUND 0x1000000000000000ULL
|
|
|
|
#define H_BULK_REMOVE_PARM 0x2000000000000000ULL
|
|
|
|
#define H_BULK_REMOVE_HW 0x3000000000000000ULL
|
|
|
|
#define H_BULK_REMOVE_RC 0x0c00000000000000ULL
|
|
|
|
#define H_BULK_REMOVE_FLAGS 0x0300000000000000ULL
|
|
|
|
#define H_BULK_REMOVE_ABSOLUTE 0x0000000000000000ULL
|
|
|
|
#define H_BULK_REMOVE_ANDCOND 0x0100000000000000ULL
|
|
|
|
#define H_BULK_REMOVE_AVPN 0x0200000000000000ULL
|
|
|
|
#define H_BULK_REMOVE_PTEX 0x00ffffffffffffffULL
|
|
|
|
|
|
|
|
#define H_BULK_REMOVE_MAX_BATCH 4
|
|
|
|
|
2015-07-02 14:23:04 +08:00
|
|
|
static target_ulong h_bulk_remove(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
2011-08-31 23:50:50 +08:00
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
ppc: Do some batching of TCG tlb flushes
On ppc64 especially, we flush the tlb on any slbie or tlbie instruction.
However, those instructions often come in bursts of 3 or more (context
switch will favor a series of slbie's for example to an slbia if the
SLB has less than a certain number of entries in it, and tlbie's can
happen in a series, with PAPR, H_BULK_REMOVE can remove up to 4 entries
at a time.
Doing a tlb_flush() each time is a waste of time. We end up doing a memset
of the whole TLB, reloading it for the next instruction, memset'ing again,
etc...
Those instructions don't have to take effect immediately. For slbie, they
can wait for the next context synchronizing event. For tlbie, the next
tlbsync.
This implements batching by keeping a flag that indicates that we have a
TLB in need of flushing. We check it on interrupts, rfi's, isync's and
tlbsync and flush the TLB if needed.
This reduces the number of tlb_flush() on a boot to a ubuntu installer
first dialog screen from roughly 360K down to 36K.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: added a 'CPUPPCState *' variable in h_remove() and
h_bulk_remove() ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: removed spurious whitespace change, use 0/1 not true/false
consistently, since tlb_need_flush has int type]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-04 00:03:25 +08:00
|
|
|
CPUPPCState *env = &cpu->env;
|
2011-08-31 23:50:50 +08:00
|
|
|
int i;
|
ppc: Do some batching of TCG tlb flushes
On ppc64 especially, we flush the tlb on any slbie or tlbie instruction.
However, those instructions often come in bursts of 3 or more (context
switch will favor a series of slbie's for example to an slbia if the
SLB has less than a certain number of entries in it, and tlbie's can
happen in a series, with PAPR, H_BULK_REMOVE can remove up to 4 entries
at a time.
Doing a tlb_flush() each time is a waste of time. We end up doing a memset
of the whole TLB, reloading it for the next instruction, memset'ing again,
etc...
Those instructions don't have to take effect immediately. For slbie, they
can wait for the next context synchronizing event. For tlbie, the next
tlbsync.
This implements batching by keeping a flag that indicates that we have a
TLB in need of flushing. We check it on interrupts, rfi's, isync's and
tlbsync and flush the TLB if needed.
This reduces the number of tlb_flush() on a boot to a ubuntu installer
first dialog screen from roughly 360K down to 36K.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: added a 'CPUPPCState *' variable in h_remove() and
h_bulk_remove() ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: removed spurious whitespace change, use 0/1 not true/false
consistently, since tlb_need_flush has int type]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-04 00:03:25 +08:00
|
|
|
target_ulong rc = H_SUCCESS;
|
2011-08-31 23:50:50 +08:00
|
|
|
|
|
|
|
for (i = 0; i < H_BULK_REMOVE_MAX_BATCH; i++) {
|
|
|
|
target_ulong *tsh = &args[i*2];
|
|
|
|
target_ulong tsl = args[i*2 + 1];
|
|
|
|
target_ulong v, r, ret;
|
|
|
|
|
|
|
|
if ((*tsh & H_BULK_REMOVE_TYPE) == H_BULK_REMOVE_END) {
|
|
|
|
break;
|
|
|
|
} else if ((*tsh & H_BULK_REMOVE_TYPE) != H_BULK_REMOVE_REQUEST) {
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
|
|
|
*tsh &= H_BULK_REMOVE_PTEX | H_BULK_REMOVE_FLAGS;
|
|
|
|
*tsh |= H_BULK_REMOVE_RESPONSE;
|
|
|
|
|
|
|
|
if ((*tsh & H_BULK_REMOVE_ANDCOND) && (*tsh & H_BULK_REMOVE_AVPN)) {
|
|
|
|
*tsh |= H_BULK_REMOVE_PARM;
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
2016-01-14 12:33:27 +08:00
|
|
|
ret = remove_hpte(cpu, *tsh & H_BULK_REMOVE_PTEX, tsl,
|
2011-08-31 23:50:50 +08:00
|
|
|
(*tsh & H_BULK_REMOVE_FLAGS) >> 26,
|
|
|
|
&v, &r);
|
|
|
|
|
|
|
|
*tsh |= ret << 60;
|
|
|
|
|
|
|
|
switch (ret) {
|
|
|
|
case REMOVE_SUCCESS:
|
2013-03-12 08:31:18 +08:00
|
|
|
*tsh |= (r & (HPTE64_R_C | HPTE64_R_R)) << 43;
|
2011-08-31 23:50:50 +08:00
|
|
|
break;
|
|
|
|
|
|
|
|
case REMOVE_PARM:
|
ppc: Do some batching of TCG tlb flushes
On ppc64 especially, we flush the tlb on any slbie or tlbie instruction.
However, those instructions often come in bursts of 3 or more (context
switch will favor a series of slbie's for example to an slbia if the
SLB has less than a certain number of entries in it, and tlbie's can
happen in a series, with PAPR, H_BULK_REMOVE can remove up to 4 entries
at a time.
Doing a tlb_flush() each time is a waste of time. We end up doing a memset
of the whole TLB, reloading it for the next instruction, memset'ing again,
etc...
Those instructions don't have to take effect immediately. For slbie, they
can wait for the next context synchronizing event. For tlbie, the next
tlbsync.
This implements batching by keeping a flag that indicates that we have a
TLB in need of flushing. We check it on interrupts, rfi's, isync's and
tlbsync and flush the TLB if needed.
This reduces the number of tlb_flush() on a boot to a ubuntu installer
first dialog screen from roughly 360K down to 36K.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: added a 'CPUPPCState *' variable in h_remove() and
h_bulk_remove() ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: removed spurious whitespace change, use 0/1 not true/false
consistently, since tlb_need_flush has int type]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-04 00:03:25 +08:00
|
|
|
rc = H_PARAMETER;
|
|
|
|
goto exit;
|
2011-08-31 23:50:50 +08:00
|
|
|
|
|
|
|
case REMOVE_HW:
|
ppc: Do some batching of TCG tlb flushes
On ppc64 especially, we flush the tlb on any slbie or tlbie instruction.
However, those instructions often come in bursts of 3 or more (context
switch will favor a series of slbie's for example to an slbia if the
SLB has less than a certain number of entries in it, and tlbie's can
happen in a series, with PAPR, H_BULK_REMOVE can remove up to 4 entries
at a time.
Doing a tlb_flush() each time is a waste of time. We end up doing a memset
of the whole TLB, reloading it for the next instruction, memset'ing again,
etc...
Those instructions don't have to take effect immediately. For slbie, they
can wait for the next context synchronizing event. For tlbie, the next
tlbsync.
This implements batching by keeping a flag that indicates that we have a
TLB in need of flushing. We check it on interrupts, rfi's, isync's and
tlbsync and flush the TLB if needed.
This reduces the number of tlb_flush() on a boot to a ubuntu installer
first dialog screen from roughly 360K down to 36K.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: added a 'CPUPPCState *' variable in h_remove() and
h_bulk_remove() ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: removed spurious whitespace change, use 0/1 not true/false
consistently, since tlb_need_flush has int type]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-04 00:03:25 +08:00
|
|
|
rc = H_HARDWARE;
|
|
|
|
goto exit;
|
2011-08-31 23:50:50 +08:00
|
|
|
}
|
|
|
|
}
|
ppc: Do some batching of TCG tlb flushes
On ppc64 especially, we flush the tlb on any slbie or tlbie instruction.
However, those instructions often come in bursts of 3 or more (context
switch will favor a series of slbie's for example to an slbia if the
SLB has less than a certain number of entries in it, and tlbie's can
happen in a series, with PAPR, H_BULK_REMOVE can remove up to 4 entries
at a time.
Doing a tlb_flush() each time is a waste of time. We end up doing a memset
of the whole TLB, reloading it for the next instruction, memset'ing again,
etc...
Those instructions don't have to take effect immediately. For slbie, they
can wait for the next context synchronizing event. For tlbie, the next
tlbsync.
This implements batching by keeping a flag that indicates that we have a
TLB in need of flushing. We check it on interrupts, rfi's, isync's and
tlbsync and flush the TLB if needed.
This reduces the number of tlb_flush() on a boot to a ubuntu installer
first dialog screen from roughly 360K down to 36K.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: added a 'CPUPPCState *' variable in h_remove() and
h_bulk_remove() ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: removed spurious whitespace change, use 0/1 not true/false
consistently, since tlb_need_flush has int type]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-04 00:03:25 +08:00
|
|
|
exit:
|
2016-09-21 00:35:00 +08:00
|
|
|
check_tlb_flush(env, true);
|
2011-08-31 23:50:50 +08:00
|
|
|
|
ppc: Do some batching of TCG tlb flushes
On ppc64 especially, we flush the tlb on any slbie or tlbie instruction.
However, those instructions often come in bursts of 3 or more (context
switch will favor a series of slbie's for example to an slbia if the
SLB has less than a certain number of entries in it, and tlbie's can
happen in a series, with PAPR, H_BULK_REMOVE can remove up to 4 entries
at a time.
Doing a tlb_flush() each time is a waste of time. We end up doing a memset
of the whole TLB, reloading it for the next instruction, memset'ing again,
etc...
Those instructions don't have to take effect immediately. For slbie, they
can wait for the next context synchronizing event. For tlbie, the next
tlbsync.
This implements batching by keeping a flag that indicates that we have a
TLB in need of flushing. We check it on interrupts, rfi's, isync's and
tlbsync and flush the TLB if needed.
This reduces the number of tlb_flush() on a boot to a ubuntu installer
first dialog screen from roughly 360K down to 36K.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: added a 'CPUPPCState *' variable in h_remove() and
h_bulk_remove() ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: removed spurious whitespace change, use 0/1 not true/false
consistently, since tlb_need_flush has int type]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-04 00:03:25 +08:00
|
|
|
return rc;
|
2011-04-01 12:15:22 +08:00
|
|
|
}
|
|
|
|
|
2015-07-02 14:23:04 +08:00
|
|
|
static target_ulong h_protect(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
2011-04-01 12:15:22 +08:00
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
2012-05-03 12:23:01 +08:00
|
|
|
CPUPPCState *env = &cpu->env;
|
2011-04-01 12:15:22 +08:00
|
|
|
target_ulong flags = args[0];
|
2017-02-21 11:00:16 +08:00
|
|
|
target_ulong ptex = args[1];
|
2011-04-01 12:15:22 +08:00
|
|
|
target_ulong avpn = args[2];
|
target/ppc: Cleanup HPTE accessors for 64-bit hash MMU
Accesses to the hashed page table (HPT) are complicated by the fact that
the HPT could be in one of three places:
1) Within guest memory - when we're emulating a full guest CPU at the
hardware level (e.g. powernv, mac99, g3beige)
2) Within qemu, but outside guest memory - when we're emulating user and
supervisor instructions within TCG, but instead of emulating
the CPU's hypervisor mode, we just emulate a hypervisor's behaviour
(pseries in TCG or KVM-PR)
3) Within the host kernel - a pseries machine using KVM-HV
acceleration. Mostly accesses to the HPT are handled by KVM,
but there are a few cases where qemu needs to access it via a
special fd for the purpose.
In order to batch accesses to the fd in case (3), we use a somewhat awkward
ppc_hash64_start_access() / ppc_hash64_stop_access() pair, which for case
(3) reads / releases several HPTEs from the kernel as a batch (usually a
whole PTEG). For cases (1) & (2) it just returns an address value. The
actual HPTE load helpers then need to interpret the returned token
differently in the 3 cases.
This patch keeps the same basic structure, but simplfiies the details.
First start_access() / stop_access() are renamed to map_hptes() and
unmap_hptes() to make their operation more obvious. Second, map_hptes()
now always returns a qemu pointer, which can always be used in the same way
by the load_hpte() helpers. In case (1) it comes from address_space_map()
in case (2) directly from qemu's HPT buffer and in case (3) from a
temporary buffer read from the KVM fd.
While we're at it, make things a bit more consistent in terms of types and
variable names: avoid variables named 'index' (it shadows index(3) which
can lead to confusing results), use 'hwaddr ptex' for HPTE indices and
uint64_t for each of the HPTE words, use ptex throughout the call stack
instead of pte_offset in some places (we still need that at the bottom
layer, but nowhere else).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-27 13:03:41 +08:00
|
|
|
const ppc_hash_pte64_t *hptes;
|
2016-01-15 13:12:09 +08:00
|
|
|
target_ulong v, r;
|
2011-04-01 12:15:22 +08:00
|
|
|
|
2017-02-21 11:00:16 +08:00
|
|
|
if (!valid_ptex(cpu, ptex)) {
|
2011-04-01 12:15:22 +08:00
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
target/ppc: Cleanup HPTE accessors for 64-bit hash MMU
Accesses to the hashed page table (HPT) are complicated by the fact that
the HPT could be in one of three places:
1) Within guest memory - when we're emulating a full guest CPU at the
hardware level (e.g. powernv, mac99, g3beige)
2) Within qemu, but outside guest memory - when we're emulating user and
supervisor instructions within TCG, but instead of emulating
the CPU's hypervisor mode, we just emulate a hypervisor's behaviour
(pseries in TCG or KVM-PR)
3) Within the host kernel - a pseries machine using KVM-HV
acceleration. Mostly accesses to the HPT are handled by KVM,
but there are a few cases where qemu needs to access it via a
special fd for the purpose.
In order to batch accesses to the fd in case (3), we use a somewhat awkward
ppc_hash64_start_access() / ppc_hash64_stop_access() pair, which for case
(3) reads / releases several HPTEs from the kernel as a batch (usually a
whole PTEG). For cases (1) & (2) it just returns an address value. The
actual HPTE load helpers then need to interpret the returned token
differently in the 3 cases.
This patch keeps the same basic structure, but simplfiies the details.
First start_access() / stop_access() are renamed to map_hptes() and
unmap_hptes() to make their operation more obvious. Second, map_hptes()
now always returns a qemu pointer, which can always be used in the same way
by the load_hpte() helpers. In case (1) it comes from address_space_map()
in case (2) directly from qemu's HPT buffer and in case (3) from a
temporary buffer read from the KVM fd.
While we're at it, make things a bit more consistent in terms of types and
variable names: avoid variables named 'index' (it shadows index(3) which
can lead to confusing results), use 'hwaddr ptex' for HPTE indices and
uint64_t for each of the HPTE words, use ptex throughout the call stack
instead of pte_offset in some places (we still need that at the bottom
layer, but nowhere else).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-27 13:03:41 +08:00
|
|
|
hptes = ppc_hash64_map_hptes(cpu, ptex, 1);
|
|
|
|
v = ppc_hash64_hpte0(cpu, hptes, 0);
|
|
|
|
r = ppc_hash64_hpte1(cpu, hptes, 0);
|
|
|
|
ppc_hash64_unmap_hptes(cpu, hptes, ptex, 1);
|
2011-04-01 12:15:22 +08:00
|
|
|
|
2013-03-12 08:31:18 +08:00
|
|
|
if ((v & HPTE64_V_VALID) == 0 ||
|
2011-04-01 12:15:22 +08:00
|
|
|
((flags & H_AVPN) && (v & ~0x7fULL) != avpn)) {
|
|
|
|
return H_NOT_FOUND;
|
|
|
|
}
|
|
|
|
|
2013-03-12 08:31:18 +08:00
|
|
|
r &= ~(HPTE64_R_PP0 | HPTE64_R_PP | HPTE64_R_N |
|
|
|
|
HPTE64_R_KEY_HI | HPTE64_R_KEY_LO);
|
|
|
|
r |= (flags << 55) & HPTE64_R_PP0;
|
|
|
|
r |= (flags << 48) & HPTE64_R_KEY_HI;
|
|
|
|
r |= flags & (HPTE64_R_PP | HPTE64_R_N | HPTE64_R_KEY_LO);
|
2017-02-21 11:00:16 +08:00
|
|
|
ppc_hash64_store_hpte(cpu, ptex,
|
2014-02-21 01:52:31 +08:00
|
|
|
(v & ~HPTE64_V_VALID) | HPTE64_V_HPTE_DIRTY, 0);
|
2017-02-21 11:00:16 +08:00
|
|
|
ppc_hash64_tlb_flush_hpte(cpu, ptex, v, r);
|
2016-09-21 00:35:01 +08:00
|
|
|
/* Flush the tlb */
|
|
|
|
check_tlb_flush(env, true);
|
2011-04-01 12:15:22 +08:00
|
|
|
/* Don't need a memory barrier, due to qemu's global lock */
|
2017-02-21 11:00:16 +08:00
|
|
|
ppc_hash64_store_hpte(cpu, ptex, v | HPTE64_V_HPTE_DIRTY, r);
|
2011-04-01 12:15:22 +08:00
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2015-07-02 14:23:04 +08:00
|
|
|
static target_ulong h_read(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
2013-02-18 13:00:32 +08:00
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
|
|
|
target_ulong flags = args[0];
|
2017-02-21 11:00:16 +08:00
|
|
|
target_ulong ptex = args[1];
|
2013-02-18 13:00:32 +08:00
|
|
|
uint8_t *hpte;
|
|
|
|
int i, ridx, n_entries = 1;
|
|
|
|
|
2017-02-21 11:00:16 +08:00
|
|
|
if (!valid_ptex(cpu, ptex)) {
|
2013-02-18 13:00:32 +08:00
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (flags & H_READ_4) {
|
|
|
|
/* Clear the two low order bits */
|
2017-02-21 11:00:16 +08:00
|
|
|
ptex &= ~(3ULL);
|
2013-02-18 13:00:32 +08:00
|
|
|
n_entries = 4;
|
|
|
|
}
|
|
|
|
|
2017-02-23 08:39:18 +08:00
|
|
|
hpte = spapr->htab + (ptex * HASH_PTE_SIZE_64);
|
2013-02-18 13:00:32 +08:00
|
|
|
|
|
|
|
for (i = 0, ridx = 0; i < n_entries; i++) {
|
|
|
|
args[ridx++] = ldq_p(hpte);
|
|
|
|
args[ridx++] = ldq_p(hpte + (HASH_PTE_SIZE_64/2));
|
|
|
|
hpte += HASH_PTE_SIZE_64;
|
|
|
|
}
|
|
|
|
|
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
pseries: Implement HPT resizing
This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table. This will eventually allow for more flexible memory
hotplug.
The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function. The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.
The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion. If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.
The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only
be called successfully once H_RESIZE_HPT_PREPARE has successfully
completed initialization of a new HPT. The guest must ensure that there
are no concurrent accesses to the existing HPT while this is called (this
effectively means stop_machine() for Linux guests).
For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT. This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).
In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs. That's a project for another day, but should be possible
without any changes to the guest interface.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-12 13:46:49 +08:00
|
|
|
struct sPAPRPendingHPT {
|
|
|
|
/* These fields are read-only after initialization */
|
|
|
|
int shift;
|
|
|
|
QemuThread thread;
|
|
|
|
|
|
|
|
/* These fields are protected by the BQL */
|
|
|
|
bool complete;
|
|
|
|
|
|
|
|
/* These fields are private to the preparation thread if
|
|
|
|
* !complete, otherwise protected by the BQL */
|
|
|
|
int ret;
|
|
|
|
void *hpt;
|
|
|
|
};
|
|
|
|
|
|
|
|
static void free_pending_hpt(sPAPRPendingHPT *pending)
|
|
|
|
{
|
|
|
|
if (pending->hpt) {
|
|
|
|
qemu_vfree(pending->hpt);
|
|
|
|
}
|
|
|
|
|
|
|
|
g_free(pending);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void *hpt_prepare_thread(void *opaque)
|
|
|
|
{
|
|
|
|
sPAPRPendingHPT *pending = opaque;
|
|
|
|
size_t size = 1ULL << pending->shift;
|
|
|
|
|
|
|
|
pending->hpt = qemu_memalign(size, size);
|
|
|
|
if (pending->hpt) {
|
|
|
|
memset(pending->hpt, 0, size);
|
|
|
|
pending->ret = H_SUCCESS;
|
|
|
|
} else {
|
|
|
|
pending->ret = H_NO_MEM;
|
|
|
|
}
|
|
|
|
|
|
|
|
qemu_mutex_lock_iothread();
|
|
|
|
|
|
|
|
if (SPAPR_MACHINE(qdev_get_machine())->pending_hpt == pending) {
|
|
|
|
/* Ready to go */
|
|
|
|
pending->complete = true;
|
|
|
|
} else {
|
|
|
|
/* We've been cancelled, clean ourselves up */
|
|
|
|
free_pending_hpt(pending);
|
|
|
|
}
|
|
|
|
|
|
|
|
qemu_mutex_unlock_iothread();
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Must be called with BQL held */
|
|
|
|
static void cancel_hpt_prepare(sPAPRMachineState *spapr)
|
|
|
|
{
|
|
|
|
sPAPRPendingHPT *pending = spapr->pending_hpt;
|
|
|
|
|
|
|
|
/* Let the thread know it's cancelled */
|
|
|
|
spapr->pending_hpt = NULL;
|
|
|
|
|
|
|
|
if (!pending) {
|
|
|
|
/* Nothing to do */
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!pending->complete) {
|
|
|
|
/* thread will clean itself up */
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
free_pending_hpt(pending);
|
|
|
|
}
|
|
|
|
|
2017-07-12 15:56:55 +08:00
|
|
|
/* Convert a return code from the KVM ioctl()s implementing resize HPT
|
|
|
|
* into a PAPR hypercall return code */
|
|
|
|
static target_ulong resize_hpt_convert_rc(int ret)
|
|
|
|
{
|
|
|
|
if (ret >= 100000) {
|
|
|
|
return H_LONG_BUSY_ORDER_100_SEC;
|
|
|
|
} else if (ret >= 10000) {
|
|
|
|
return H_LONG_BUSY_ORDER_10_SEC;
|
|
|
|
} else if (ret >= 1000) {
|
|
|
|
return H_LONG_BUSY_ORDER_1_SEC;
|
|
|
|
} else if (ret >= 100) {
|
|
|
|
return H_LONG_BUSY_ORDER_100_MSEC;
|
|
|
|
} else if (ret >= 10) {
|
|
|
|
return H_LONG_BUSY_ORDER_10_MSEC;
|
|
|
|
} else if (ret > 0) {
|
|
|
|
return H_LONG_BUSY_ORDER_1_MSEC;
|
|
|
|
}
|
|
|
|
|
|
|
|
switch (ret) {
|
|
|
|
case 0:
|
|
|
|
return H_SUCCESS;
|
|
|
|
case -EPERM:
|
|
|
|
return H_AUTHORITY;
|
|
|
|
case -EINVAL:
|
|
|
|
return H_PARAMETER;
|
|
|
|
case -ENXIO:
|
|
|
|
return H_CLOSED;
|
|
|
|
case -ENOSPC:
|
|
|
|
return H_PTEG_FULL;
|
|
|
|
case -EBUSY:
|
|
|
|
return H_BUSY;
|
|
|
|
case -ENOMEM:
|
|
|
|
return H_NO_MEM;
|
|
|
|
default:
|
|
|
|
return H_HARDWARE;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-05-12 13:46:11 +08:00
|
|
|
static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
|
|
|
|
sPAPRMachineState *spapr,
|
|
|
|
target_ulong opcode,
|
|
|
|
target_ulong *args)
|
|
|
|
{
|
|
|
|
target_ulong flags = args[0];
|
pseries: Implement HPT resizing
This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table. This will eventually allow for more flexible memory
hotplug.
The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function. The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.
The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion. If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.
The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only
be called successfully once H_RESIZE_HPT_PREPARE has successfully
completed initialization of a new HPT. The guest must ensure that there
are no concurrent accesses to the existing HPT while this is called (this
effectively means stop_machine() for Linux guests).
For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT. This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).
In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs. That's a project for another day, but should be possible
without any changes to the guest interface.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-12 13:46:49 +08:00
|
|
|
int shift = args[1];
|
|
|
|
sPAPRPendingHPT *pending = spapr->pending_hpt;
|
|
|
|
uint64_t current_ram_size = MACHINE(spapr)->ram_size;
|
2017-07-12 15:56:55 +08:00
|
|
|
int rc;
|
2017-05-12 13:46:11 +08:00
|
|
|
|
|
|
|
if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED) {
|
|
|
|
return H_AUTHORITY;
|
|
|
|
}
|
|
|
|
|
pseries: Implement HPT resizing
This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table. This will eventually allow for more flexible memory
hotplug.
The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function. The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.
The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion. If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.
The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only
be called successfully once H_RESIZE_HPT_PREPARE has successfully
completed initialization of a new HPT. The guest must ensure that there
are no concurrent accesses to the existing HPT while this is called (this
effectively means stop_machine() for Linux guests).
For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT. This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).
In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs. That's a project for another day, but should be possible
without any changes to the guest interface.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-12 13:46:49 +08:00
|
|
|
if (!spapr->htab_shift) {
|
|
|
|
/* Radix guest, no HPT */
|
|
|
|
return H_NOT_AVAILABLE;
|
|
|
|
}
|
|
|
|
|
2017-05-12 13:46:11 +08:00
|
|
|
trace_spapr_h_resize_hpt_prepare(flags, shift);
|
pseries: Implement HPT resizing
This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table. This will eventually allow for more flexible memory
hotplug.
The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function. The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.
The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion. If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.
The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only
be called successfully once H_RESIZE_HPT_PREPARE has successfully
completed initialization of a new HPT. The guest must ensure that there
are no concurrent accesses to the existing HPT while this is called (this
effectively means stop_machine() for Linux guests).
For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT. This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).
In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs. That's a project for another day, but should be possible
without any changes to the guest interface.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-12 13:46:49 +08:00
|
|
|
|
|
|
|
if (flags != 0) {
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (shift && ((shift < 18) || (shift > 46))) {
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
|
|
|
current_ram_size = pc_existing_dimms_capacity(&error_fatal);
|
|
|
|
|
|
|
|
/* We only allow the guest to allocate an HPT one order above what
|
|
|
|
* we'd normally give them (to stop a small guest claiming a huge
|
|
|
|
* chunk of resources in the HPT */
|
|
|
|
if (shift > (spapr_hpt_shift_for_ramsize(current_ram_size) + 1)) {
|
|
|
|
return H_RESOURCE;
|
|
|
|
}
|
|
|
|
|
2017-07-12 15:56:55 +08:00
|
|
|
rc = kvmppc_resize_hpt_prepare(cpu, flags, shift);
|
|
|
|
if (rc != -ENOSYS) {
|
|
|
|
return resize_hpt_convert_rc(rc);
|
|
|
|
}
|
|
|
|
|
pseries: Implement HPT resizing
This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table. This will eventually allow for more flexible memory
hotplug.
The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function. The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.
The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion. If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.
The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only
be called successfully once H_RESIZE_HPT_PREPARE has successfully
completed initialization of a new HPT. The guest must ensure that there
are no concurrent accesses to the existing HPT while this is called (this
effectively means stop_machine() for Linux guests).
For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT. This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).
In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs. That's a project for another day, but should be possible
without any changes to the guest interface.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-12 13:46:49 +08:00
|
|
|
if (pending) {
|
|
|
|
/* something already in progress */
|
|
|
|
if (pending->shift == shift) {
|
|
|
|
/* and it's suitable */
|
|
|
|
if (pending->complete) {
|
|
|
|
return pending->ret;
|
|
|
|
} else {
|
|
|
|
return H_LONG_BUSY_ORDER_100_MSEC;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* not suitable, cancel and replace */
|
|
|
|
cancel_hpt_prepare(spapr);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!shift) {
|
|
|
|
/* nothing to do */
|
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* start new prepare */
|
|
|
|
|
|
|
|
pending = g_new0(sPAPRPendingHPT, 1);
|
|
|
|
pending->shift = shift;
|
|
|
|
pending->ret = H_HARDWARE;
|
|
|
|
|
|
|
|
qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
|
|
|
|
hpt_prepare_thread, pending, QEMU_THREAD_DETACHED);
|
|
|
|
|
|
|
|
spapr->pending_hpt = pending;
|
|
|
|
|
|
|
|
/* In theory we could estimate the time more accurately based on
|
|
|
|
* the new size, but there's not much point */
|
|
|
|
return H_LONG_BUSY_ORDER_100_MSEC;
|
|
|
|
}
|
|
|
|
|
|
|
|
static uint64_t new_hpte_load0(void *htab, uint64_t pteg, int slot)
|
|
|
|
{
|
|
|
|
uint8_t *addr = htab;
|
|
|
|
|
|
|
|
addr += pteg * HASH_PTEG_SIZE_64;
|
|
|
|
addr += slot * HASH_PTE_SIZE_64;
|
|
|
|
return ldq_p(addr);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void new_hpte_store(void *htab, uint64_t pteg, int slot,
|
|
|
|
uint64_t pte0, uint64_t pte1)
|
|
|
|
{
|
|
|
|
uint8_t *addr = htab;
|
|
|
|
|
|
|
|
addr += pteg * HASH_PTEG_SIZE_64;
|
|
|
|
addr += slot * HASH_PTE_SIZE_64;
|
|
|
|
|
|
|
|
stq_p(addr, pte0);
|
|
|
|
stq_p(addr + HASH_PTE_SIZE_64 / 2, pte1);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int rehash_hpte(PowerPCCPU *cpu,
|
|
|
|
const ppc_hash_pte64_t *hptes,
|
|
|
|
void *old_hpt, uint64_t oldsize,
|
|
|
|
void *new_hpt, uint64_t newsize,
|
|
|
|
uint64_t pteg, int slot)
|
|
|
|
{
|
|
|
|
uint64_t old_hash_mask = (oldsize >> 7) - 1;
|
|
|
|
uint64_t new_hash_mask = (newsize >> 7) - 1;
|
|
|
|
target_ulong pte0 = ppc_hash64_hpte0(cpu, hptes, slot);
|
|
|
|
target_ulong pte1;
|
|
|
|
uint64_t avpn;
|
|
|
|
unsigned base_pg_shift;
|
|
|
|
uint64_t hash, new_pteg, replace_pte0;
|
|
|
|
|
|
|
|
if (!(pte0 & HPTE64_V_VALID) || !(pte0 & HPTE64_V_BOLTED)) {
|
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
|
|
|
pte1 = ppc_hash64_hpte1(cpu, hptes, slot);
|
|
|
|
|
|
|
|
base_pg_shift = ppc_hash64_hpte_page_shift_noslb(cpu, pte0, pte1);
|
|
|
|
assert(base_pg_shift); /* H_ENTER shouldn't allow a bad encoding */
|
|
|
|
avpn = HPTE64_V_AVPN_VAL(pte0) & ~(((1ULL << base_pg_shift) - 1) >> 23);
|
|
|
|
|
|
|
|
if (pte0 & HPTE64_V_SECONDARY) {
|
|
|
|
pteg = ~pteg;
|
|
|
|
}
|
|
|
|
|
|
|
|
if ((pte0 & HPTE64_V_SSIZE) == HPTE64_V_SSIZE_256M) {
|
|
|
|
uint64_t offset, vsid;
|
|
|
|
|
|
|
|
/* We only have 28 - 23 bits of offset in avpn */
|
|
|
|
offset = (avpn & 0x1f) << 23;
|
|
|
|
vsid = avpn >> 5;
|
|
|
|
/* We can find more bits from the pteg value */
|
|
|
|
if (base_pg_shift < 23) {
|
|
|
|
offset |= ((vsid ^ pteg) & old_hash_mask) << base_pg_shift;
|
|
|
|
}
|
|
|
|
|
|
|
|
hash = vsid ^ (offset >> base_pg_shift);
|
|
|
|
} else if ((pte0 & HPTE64_V_SSIZE) == HPTE64_V_SSIZE_1T) {
|
|
|
|
uint64_t offset, vsid;
|
|
|
|
|
|
|
|
/* We only have 40 - 23 bits of seg_off in avpn */
|
|
|
|
offset = (avpn & 0x1ffff) << 23;
|
|
|
|
vsid = avpn >> 17;
|
|
|
|
if (base_pg_shift < 23) {
|
|
|
|
offset |= ((vsid ^ (vsid << 25) ^ pteg) & old_hash_mask)
|
|
|
|
<< base_pg_shift;
|
|
|
|
}
|
|
|
|
|
|
|
|
hash = vsid ^ (vsid << 25) ^ (offset >> base_pg_shift);
|
|
|
|
} else {
|
|
|
|
error_report("rehash_pte: Bad segment size in HPTE");
|
|
|
|
return H_HARDWARE;
|
|
|
|
}
|
|
|
|
|
|
|
|
new_pteg = hash & new_hash_mask;
|
|
|
|
if (pte0 & HPTE64_V_SECONDARY) {
|
|
|
|
assert(~pteg == (hash & old_hash_mask));
|
|
|
|
new_pteg = ~new_pteg;
|
|
|
|
} else {
|
|
|
|
assert(pteg == (hash & old_hash_mask));
|
|
|
|
}
|
|
|
|
assert((oldsize != newsize) || (pteg == new_pteg));
|
|
|
|
replace_pte0 = new_hpte_load0(new_hpt, new_pteg, slot);
|
|
|
|
/*
|
|
|
|
* Strictly speaking, we don't need all these tests, since we only
|
|
|
|
* ever rehash bolted HPTEs. We might in future handle non-bolted
|
|
|
|
* HPTEs, though so make the logic correct for those cases as
|
|
|
|
* well.
|
|
|
|
*/
|
|
|
|
if (replace_pte0 & HPTE64_V_VALID) {
|
|
|
|
assert(newsize < oldsize);
|
|
|
|
if (replace_pte0 & HPTE64_V_BOLTED) {
|
|
|
|
if (pte0 & HPTE64_V_BOLTED) {
|
|
|
|
/* Bolted collision, nothing we can do */
|
|
|
|
return H_PTEG_FULL;
|
|
|
|
} else {
|
|
|
|
/* Discard this hpte */
|
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
new_hpte_store(new_hpt, new_pteg, slot, pte0, pte1);
|
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int rehash_hpt(PowerPCCPU *cpu,
|
|
|
|
void *old_hpt, uint64_t oldsize,
|
|
|
|
void *new_hpt, uint64_t newsize)
|
|
|
|
{
|
|
|
|
uint64_t n_ptegs = oldsize >> 7;
|
|
|
|
uint64_t pteg;
|
|
|
|
int slot;
|
|
|
|
int rc;
|
|
|
|
|
|
|
|
for (pteg = 0; pteg < n_ptegs; pteg++) {
|
|
|
|
hwaddr ptex = pteg * HPTES_PER_GROUP;
|
|
|
|
const ppc_hash_pte64_t *hptes
|
|
|
|
= ppc_hash64_map_hptes(cpu, ptex, HPTES_PER_GROUP);
|
|
|
|
|
|
|
|
if (!hptes) {
|
|
|
|
return H_HARDWARE;
|
|
|
|
}
|
|
|
|
|
|
|
|
for (slot = 0; slot < HPTES_PER_GROUP; slot++) {
|
|
|
|
rc = rehash_hpte(cpu, hptes, old_hpt, oldsize, new_hpt, newsize,
|
|
|
|
pteg, slot);
|
|
|
|
if (rc != H_SUCCESS) {
|
|
|
|
ppc_hash64_unmap_hptes(cpu, hptes, ptex, HPTES_PER_GROUP);
|
|
|
|
return rc;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
ppc_hash64_unmap_hptes(cpu, hptes, ptex, HPTES_PER_GROUP);
|
|
|
|
}
|
|
|
|
|
|
|
|
return H_SUCCESS;
|
2017-05-12 13:46:11 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static target_ulong h_resize_hpt_commit(PowerPCCPU *cpu,
|
|
|
|
sPAPRMachineState *spapr,
|
|
|
|
target_ulong opcode,
|
|
|
|
target_ulong *args)
|
|
|
|
{
|
|
|
|
target_ulong flags = args[0];
|
|
|
|
target_ulong shift = args[1];
|
pseries: Implement HPT resizing
This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table. This will eventually allow for more flexible memory
hotplug.
The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function. The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.
The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion. If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.
The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only
be called successfully once H_RESIZE_HPT_PREPARE has successfully
completed initialization of a new HPT. The guest must ensure that there
are no concurrent accesses to the existing HPT while this is called (this
effectively means stop_machine() for Linux guests).
For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT. This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).
In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs. That's a project for another day, but should be possible
without any changes to the guest interface.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-12 13:46:49 +08:00
|
|
|
sPAPRPendingHPT *pending = spapr->pending_hpt;
|
|
|
|
int rc;
|
|
|
|
size_t newsize;
|
2017-05-12 13:46:11 +08:00
|
|
|
|
|
|
|
if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED) {
|
|
|
|
return H_AUTHORITY;
|
|
|
|
}
|
|
|
|
|
|
|
|
trace_spapr_h_resize_hpt_commit(flags, shift);
|
pseries: Implement HPT resizing
This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table. This will eventually allow for more flexible memory
hotplug.
The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function. The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.
The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion. If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.
The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only
be called successfully once H_RESIZE_HPT_PREPARE has successfully
completed initialization of a new HPT. The guest must ensure that there
are no concurrent accesses to the existing HPT while this is called (this
effectively means stop_machine() for Linux guests).
For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT. This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).
In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs. That's a project for another day, but should be possible
without any changes to the guest interface.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-12 13:46:49 +08:00
|
|
|
|
2017-07-12 15:56:55 +08:00
|
|
|
rc = kvmppc_resize_hpt_commit(cpu, flags, shift);
|
|
|
|
if (rc != -ENOSYS) {
|
|
|
|
return resize_hpt_convert_rc(rc);
|
|
|
|
}
|
|
|
|
|
pseries: Implement HPT resizing
This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table. This will eventually allow for more flexible memory
hotplug.
The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function. The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.
The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion. If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.
The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only
be called successfully once H_RESIZE_HPT_PREPARE has successfully
completed initialization of a new HPT. The guest must ensure that there
are no concurrent accesses to the existing HPT while this is called (this
effectively means stop_machine() for Linux guests).
For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT. This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).
In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs. That's a project for another day, but should be possible
without any changes to the guest interface.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-12 13:46:49 +08:00
|
|
|
if (flags != 0) {
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!pending || (pending->shift != shift)) {
|
|
|
|
/* no matching prepare */
|
|
|
|
return H_CLOSED;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!pending->complete) {
|
|
|
|
/* prepare has not completed */
|
|
|
|
return H_BUSY;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Shouldn't have got past PREPARE without an HPT */
|
|
|
|
g_assert(spapr->htab_shift);
|
|
|
|
|
|
|
|
newsize = 1ULL << pending->shift;
|
|
|
|
rc = rehash_hpt(cpu, spapr->htab, HTAB_SIZE(spapr),
|
|
|
|
pending->hpt, newsize);
|
|
|
|
if (rc == H_SUCCESS) {
|
|
|
|
qemu_vfree(spapr->htab);
|
|
|
|
spapr->htab = pending->hpt;
|
|
|
|
spapr->htab_shift = pending->shift;
|
|
|
|
|
2017-07-12 15:56:55 +08:00
|
|
|
if (kvm_enabled()) {
|
|
|
|
/* For KVM PR, update the HPT pointer */
|
|
|
|
target_ulong sdr1 = (target_ulong)(uintptr_t)spapr->htab
|
|
|
|
| (spapr->htab_shift - 18);
|
|
|
|
kvmppc_update_sdr1(sdr1);
|
|
|
|
}
|
|
|
|
|
pseries: Implement HPT resizing
This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table. This will eventually allow for more flexible memory
hotplug.
The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function. The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.
The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion. If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.
The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only
be called successfully once H_RESIZE_HPT_PREPARE has successfully
completed initialization of a new HPT. The guest must ensure that there
are no concurrent accesses to the existing HPT while this is called (this
effectively means stop_machine() for Linux guests).
For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT. This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).
In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs. That's a project for another day, but should be possible
without any changes to the guest interface.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-12 13:46:49 +08:00
|
|
|
pending->hpt = NULL; /* so it's not free()d */
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Clean up */
|
|
|
|
spapr->pending_hpt = NULL;
|
|
|
|
free_pending_hpt(pending);
|
|
|
|
|
|
|
|
return rc;
|
2017-05-12 13:46:11 +08:00
|
|
|
}
|
|
|
|
|
2016-02-11 20:47:18 +08:00
|
|
|
static target_ulong h_set_sprg0(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
|
|
|
cpu_synchronize_state(CPU(cpu));
|
|
|
|
cpu->env.spr[SPR_SPRG0] = args[0];
|
|
|
|
|
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2015-07-02 14:23:04 +08:00
|
|
|
static target_ulong h_set_dabr(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
2011-04-01 12:15:24 +08:00
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
2016-02-11 20:47:19 +08:00
|
|
|
if (!has_spr(cpu, SPR_DABR)) {
|
|
|
|
return H_HARDWARE; /* DABR register not available */
|
|
|
|
}
|
|
|
|
cpu_synchronize_state(CPU(cpu));
|
|
|
|
|
|
|
|
if (has_spr(cpu, SPR_DABRX)) {
|
|
|
|
cpu->env.spr[SPR_DABRX] = 0x3; /* Use Problem and Privileged state */
|
|
|
|
} else if (!(args[0] & 0x4)) { /* Breakpoint Translation set? */
|
|
|
|
return H_RESERVED_DABR;
|
|
|
|
}
|
|
|
|
|
|
|
|
cpu->env.spr[SPR_DABR] = args[0];
|
|
|
|
return H_SUCCESS;
|
2011-04-01 12:15:24 +08:00
|
|
|
}
|
|
|
|
|
2016-02-11 20:47:20 +08:00
|
|
|
static target_ulong h_set_xdabr(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
|
|
|
target_ulong dabrx = args[1];
|
|
|
|
|
|
|
|
if (!has_spr(cpu, SPR_DABR) || !has_spr(cpu, SPR_DABRX)) {
|
|
|
|
return H_HARDWARE;
|
|
|
|
}
|
|
|
|
|
|
|
|
if ((dabrx & ~0xfULL) != 0 || (dabrx & H_DABRX_HYPERVISOR) != 0
|
|
|
|
|| (dabrx & (H_DABRX_KERNEL | H_DABRX_USER)) == 0) {
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
|
|
|
cpu_synchronize_state(CPU(cpu));
|
|
|
|
cpu->env.spr[SPR_DABRX] = dabrx;
|
|
|
|
cpu->env.spr[SPR_DABR] = args[0];
|
|
|
|
|
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2016-02-18 17:15:54 +08:00
|
|
|
static target_ulong h_page_init(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
|
|
|
target_ulong flags = args[0];
|
|
|
|
hwaddr dst = args[1];
|
|
|
|
hwaddr src = args[2];
|
|
|
|
hwaddr len = TARGET_PAGE_SIZE;
|
|
|
|
uint8_t *pdst, *psrc;
|
|
|
|
target_long ret = H_SUCCESS;
|
|
|
|
|
|
|
|
if (flags & ~(H_ICACHE_SYNCHRONIZE | H_ICACHE_INVALIDATE
|
|
|
|
| H_COPY_PAGE | H_ZERO_PAGE)) {
|
|
|
|
qemu_log_mask(LOG_UNIMP, "h_page_init: Bad flags (" TARGET_FMT_lx "\n",
|
|
|
|
flags);
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Map-in destination */
|
|
|
|
if (!is_ram_address(spapr, dst) || (dst & ~TARGET_PAGE_MASK) != 0) {
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
pdst = cpu_physical_memory_map(dst, &len, 1);
|
|
|
|
if (!pdst || len != TARGET_PAGE_SIZE) {
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (flags & H_COPY_PAGE) {
|
|
|
|
/* Map-in source, copy to destination, and unmap source again */
|
|
|
|
if (!is_ram_address(spapr, src) || (src & ~TARGET_PAGE_MASK) != 0) {
|
|
|
|
ret = H_PARAMETER;
|
|
|
|
goto unmap_out;
|
|
|
|
}
|
|
|
|
psrc = cpu_physical_memory_map(src, &len, 0);
|
|
|
|
if (!psrc || len != TARGET_PAGE_SIZE) {
|
|
|
|
ret = H_PARAMETER;
|
|
|
|
goto unmap_out;
|
|
|
|
}
|
|
|
|
memcpy(pdst, psrc, len);
|
|
|
|
cpu_physical_memory_unmap(psrc, len, 0, len);
|
|
|
|
} else if (flags & H_ZERO_PAGE) {
|
|
|
|
memset(pdst, 0, len); /* Just clear the destination page */
|
|
|
|
}
|
|
|
|
|
|
|
|
if (kvm_enabled() && (flags & H_ICACHE_SYNCHRONIZE) != 0) {
|
|
|
|
kvmppc_dcbst_range(cpu, pdst, len);
|
|
|
|
}
|
|
|
|
if (flags & (H_ICACHE_SYNCHRONIZE | H_ICACHE_INVALIDATE)) {
|
|
|
|
if (kvm_enabled()) {
|
|
|
|
kvmppc_icbi_range(cpu, pdst, len);
|
|
|
|
} else {
|
|
|
|
tb_flush(CPU(cpu));
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
unmap_out:
|
|
|
|
cpu_physical_memory_unmap(pdst, TARGET_PAGE_SIZE, 1, len);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
#define FLAGS_REGISTER_VPA 0x0000200000000000ULL
|
|
|
|
#define FLAGS_REGISTER_DTL 0x0000400000000000ULL
|
|
|
|
#define FLAGS_REGISTER_SLBSHADOW 0x0000600000000000ULL
|
|
|
|
#define FLAGS_DEREGISTER_VPA 0x0000a00000000000ULL
|
|
|
|
#define FLAGS_DEREGISTER_DTL 0x0000c00000000000ULL
|
|
|
|
#define FLAGS_DEREGISTER_SLBSHADOW 0x0000e00000000000ULL
|
|
|
|
|
|
|
|
#define VPA_MIN_SIZE 640
|
|
|
|
#define VPA_SIZE_OFFSET 0x4
|
|
|
|
#define VPA_SHARED_PROC_OFFSET 0x9
|
|
|
|
#define VPA_SHARED_PROC_VAL 0x2
|
|
|
|
|
2012-03-14 08:38:23 +08:00
|
|
|
static target_ulong register_vpa(CPUPPCState *env, target_ulong vpa)
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
{
|
2014-03-10 02:29:41 +08:00
|
|
|
CPUState *cs = CPU(ppc_env_get_cpu(env));
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
uint16_t size;
|
|
|
|
uint8_t tmp;
|
|
|
|
|
|
|
|
if (vpa == 0) {
|
|
|
|
hcall_dprintf("Can't cope with registering a VPA at logical 0\n");
|
|
|
|
return H_HARDWARE;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (vpa % env->dcache_line_size) {
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
/* FIXME: bounds check the address */
|
|
|
|
|
2013-12-17 12:33:56 +08:00
|
|
|
size = lduw_be_phys(cs->as, vpa + 0x4);
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
|
|
|
|
if (size < VPA_MIN_SIZE) {
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* VPA is not allowed to cross a page boundary */
|
|
|
|
if ((vpa / 4096) != ((vpa + size - 1) / 4096)) {
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
target-ppc: Rework storage of VPA registration state
With PAPR guests, hypercalls allow registration of the Virtual Processor
Area (VPA), SLB shadow and dispatch trace log (DTL), each of which allow
for certain communication between the guest and hypervisor. Currently, we
store the addresses of the three areas and the size of the dtl in
CPUPPCState.
The SLB shadow and DTL are variable sized, with the size being retrieved
from within the registered memory area at the hypercall time. This size
can later be overwritten with other information, however, so we need to
save the size as of registration time. We already do this for the DTL,
but not for the SLB shadow, so this patch fixes that.
In addition, we change the storage of the VPA information to use fixed
size integer types which will make life easier for syncing this data with
KVM, which we will need in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-09 02:17:38 +08:00
|
|
|
env->vpa_addr = vpa;
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
|
2013-12-17 12:05:40 +08:00
|
|
|
tmp = ldub_phys(cs->as, env->vpa_addr + VPA_SHARED_PROC_OFFSET);
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
tmp |= VPA_SHARED_PROC_VAL;
|
2013-12-17 13:29:06 +08:00
|
|
|
stb_phys(cs->as, env->vpa_addr + VPA_SHARED_PROC_OFFSET, tmp);
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
|
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2012-03-14 08:38:23 +08:00
|
|
|
static target_ulong deregister_vpa(CPUPPCState *env, target_ulong vpa)
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
{
|
target-ppc: Rework storage of VPA registration state
With PAPR guests, hypercalls allow registration of the Virtual Processor
Area (VPA), SLB shadow and dispatch trace log (DTL), each of which allow
for certain communication between the guest and hypervisor. Currently, we
store the addresses of the three areas and the size of the dtl in
CPUPPCState.
The SLB shadow and DTL are variable sized, with the size being retrieved
from within the registered memory area at the hypercall time. This size
can later be overwritten with other information, however, so we need to
save the size as of registration time. We already do this for the DTL,
but not for the SLB shadow, so this patch fixes that.
In addition, we change the storage of the VPA information to use fixed
size integer types which will make life easier for syncing this data with
KVM, which we will need in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-09 02:17:38 +08:00
|
|
|
if (env->slb_shadow_addr) {
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
return H_RESOURCE;
|
|
|
|
}
|
|
|
|
|
target-ppc: Rework storage of VPA registration state
With PAPR guests, hypercalls allow registration of the Virtual Processor
Area (VPA), SLB shadow and dispatch trace log (DTL), each of which allow
for certain communication between the guest and hypervisor. Currently, we
store the addresses of the three areas and the size of the dtl in
CPUPPCState.
The SLB shadow and DTL are variable sized, with the size being retrieved
from within the registered memory area at the hypercall time. This size
can later be overwritten with other information, however, so we need to
save the size as of registration time. We already do this for the DTL,
but not for the SLB shadow, so this patch fixes that.
In addition, we change the storage of the VPA information to use fixed
size integer types which will make life easier for syncing this data with
KVM, which we will need in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-09 02:17:38 +08:00
|
|
|
if (env->dtl_addr) {
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
return H_RESOURCE;
|
|
|
|
}
|
|
|
|
|
target-ppc: Rework storage of VPA registration state
With PAPR guests, hypercalls allow registration of the Virtual Processor
Area (VPA), SLB shadow and dispatch trace log (DTL), each of which allow
for certain communication between the guest and hypervisor. Currently, we
store the addresses of the three areas and the size of the dtl in
CPUPPCState.
The SLB shadow and DTL are variable sized, with the size being retrieved
from within the registered memory area at the hypercall time. This size
can later be overwritten with other information, however, so we need to
save the size as of registration time. We already do this for the DTL,
but not for the SLB shadow, so this patch fixes that.
In addition, we change the storage of the VPA information to use fixed
size integer types which will make life easier for syncing this data with
KVM, which we will need in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-09 02:17:38 +08:00
|
|
|
env->vpa_addr = 0;
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2012-03-14 08:38:23 +08:00
|
|
|
static target_ulong register_slb_shadow(CPUPPCState *env, target_ulong addr)
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
{
|
2014-03-10 02:29:41 +08:00
|
|
|
CPUState *cs = CPU(ppc_env_get_cpu(env));
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
uint32_t size;
|
|
|
|
|
|
|
|
if (addr == 0) {
|
|
|
|
hcall_dprintf("Can't cope with SLB shadow at logical 0\n");
|
|
|
|
return H_HARDWARE;
|
|
|
|
}
|
|
|
|
|
2013-11-15 21:46:38 +08:00
|
|
|
size = ldl_be_phys(cs->as, addr + 0x4);
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
if (size < 0x8) {
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
|
|
|
if ((addr / 4096) != ((addr + size - 1) / 4096)) {
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
target-ppc: Rework storage of VPA registration state
With PAPR guests, hypercalls allow registration of the Virtual Processor
Area (VPA), SLB shadow and dispatch trace log (DTL), each of which allow
for certain communication between the guest and hypervisor. Currently, we
store the addresses of the three areas and the size of the dtl in
CPUPPCState.
The SLB shadow and DTL are variable sized, with the size being retrieved
from within the registered memory area at the hypercall time. This size
can later be overwritten with other information, however, so we need to
save the size as of registration time. We already do this for the DTL,
but not for the SLB shadow, so this patch fixes that.
In addition, we change the storage of the VPA information to use fixed
size integer types which will make life easier for syncing this data with
KVM, which we will need in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-09 02:17:38 +08:00
|
|
|
if (!env->vpa_addr) {
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
return H_RESOURCE;
|
|
|
|
}
|
|
|
|
|
target-ppc: Rework storage of VPA registration state
With PAPR guests, hypercalls allow registration of the Virtual Processor
Area (VPA), SLB shadow and dispatch trace log (DTL), each of which allow
for certain communication between the guest and hypervisor. Currently, we
store the addresses of the three areas and the size of the dtl in
CPUPPCState.
The SLB shadow and DTL are variable sized, with the size being retrieved
from within the registered memory area at the hypercall time. This size
can later be overwritten with other information, however, so we need to
save the size as of registration time. We already do this for the DTL,
but not for the SLB shadow, so this patch fixes that.
In addition, we change the storage of the VPA information to use fixed
size integer types which will make life easier for syncing this data with
KVM, which we will need in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-09 02:17:38 +08:00
|
|
|
env->slb_shadow_addr = addr;
|
|
|
|
env->slb_shadow_size = size;
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
|
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2012-03-14 08:38:23 +08:00
|
|
|
static target_ulong deregister_slb_shadow(CPUPPCState *env, target_ulong addr)
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
{
|
target-ppc: Rework storage of VPA registration state
With PAPR guests, hypercalls allow registration of the Virtual Processor
Area (VPA), SLB shadow and dispatch trace log (DTL), each of which allow
for certain communication between the guest and hypervisor. Currently, we
store the addresses of the three areas and the size of the dtl in
CPUPPCState.
The SLB shadow and DTL are variable sized, with the size being retrieved
from within the registered memory area at the hypercall time. This size
can later be overwritten with other information, however, so we need to
save the size as of registration time. We already do this for the DTL,
but not for the SLB shadow, so this patch fixes that.
In addition, we change the storage of the VPA information to use fixed
size integer types which will make life easier for syncing this data with
KVM, which we will need in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-09 02:17:38 +08:00
|
|
|
env->slb_shadow_addr = 0;
|
|
|
|
env->slb_shadow_size = 0;
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2012-03-14 08:38:23 +08:00
|
|
|
static target_ulong register_dtl(CPUPPCState *env, target_ulong addr)
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
{
|
2014-03-10 02:29:41 +08:00
|
|
|
CPUState *cs = CPU(ppc_env_get_cpu(env));
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
uint32_t size;
|
|
|
|
|
|
|
|
if (addr == 0) {
|
|
|
|
hcall_dprintf("Can't cope with DTL at logical 0\n");
|
|
|
|
return H_HARDWARE;
|
|
|
|
}
|
|
|
|
|
2013-11-15 21:46:38 +08:00
|
|
|
size = ldl_be_phys(cs->as, addr + 0x4);
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
|
|
|
|
if (size < 48) {
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
target-ppc: Rework storage of VPA registration state
With PAPR guests, hypercalls allow registration of the Virtual Processor
Area (VPA), SLB shadow and dispatch trace log (DTL), each of which allow
for certain communication between the guest and hypervisor. Currently, we
store the addresses of the three areas and the size of the dtl in
CPUPPCState.
The SLB shadow and DTL are variable sized, with the size being retrieved
from within the registered memory area at the hypercall time. This size
can later be overwritten with other information, however, so we need to
save the size as of registration time. We already do this for the DTL,
but not for the SLB shadow, so this patch fixes that.
In addition, we change the storage of the VPA information to use fixed
size integer types which will make life easier for syncing this data with
KVM, which we will need in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-09 02:17:38 +08:00
|
|
|
if (!env->vpa_addr) {
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
return H_RESOURCE;
|
|
|
|
}
|
|
|
|
|
target-ppc: Rework storage of VPA registration state
With PAPR guests, hypercalls allow registration of the Virtual Processor
Area (VPA), SLB shadow and dispatch trace log (DTL), each of which allow
for certain communication between the guest and hypervisor. Currently, we
store the addresses of the three areas and the size of the dtl in
CPUPPCState.
The SLB shadow and DTL are variable sized, with the size being retrieved
from within the registered memory area at the hypercall time. This size
can later be overwritten with other information, however, so we need to
save the size as of registration time. We already do this for the DTL,
but not for the SLB shadow, so this patch fixes that.
In addition, we change the storage of the VPA information to use fixed
size integer types which will make life easier for syncing this data with
KVM, which we will need in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-09 02:17:38 +08:00
|
|
|
env->dtl_addr = addr;
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
env->dtl_size = size;
|
|
|
|
|
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2012-04-23 15:27:56 +08:00
|
|
|
static target_ulong deregister_dtl(CPUPPCState *env, target_ulong addr)
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
{
|
target-ppc: Rework storage of VPA registration state
With PAPR guests, hypercalls allow registration of the Virtual Processor
Area (VPA), SLB shadow and dispatch trace log (DTL), each of which allow
for certain communication between the guest and hypervisor. Currently, we
store the addresses of the three areas and the size of the dtl in
CPUPPCState.
The SLB shadow and DTL are variable sized, with the size being retrieved
from within the registered memory area at the hypercall time. This size
can later be overwritten with other information, however, so we need to
save the size as of registration time. We already do this for the DTL,
but not for the SLB shadow, so this patch fixes that.
In addition, we change the storage of the VPA information to use fixed
size integer types which will make life easier for syncing this data with
KVM, which we will need in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2012-10-09 02:17:38 +08:00
|
|
|
env->dtl_addr = 0;
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
env->dtl_size = 0;
|
|
|
|
|
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2015-07-02 14:23:04 +08:00
|
|
|
static target_ulong h_register_vpa(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
|
|
|
target_ulong flags = args[0];
|
|
|
|
target_ulong procno = args[1];
|
|
|
|
target_ulong vpa = args[2];
|
|
|
|
target_ulong ret = H_PARAMETER;
|
2012-03-14 08:38:23 +08:00
|
|
|
CPUPPCState *tenv;
|
2014-02-01 22:45:52 +08:00
|
|
|
PowerPCCPU *tcpu;
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
|
2017-08-09 13:38:56 +08:00
|
|
|
tcpu = spapr_find_cpu(procno);
|
2013-02-15 23:43:08 +08:00
|
|
|
if (!tcpu) {
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
2014-02-01 22:45:52 +08:00
|
|
|
tenv = &tcpu->env;
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
|
|
|
|
switch (flags) {
|
|
|
|
case FLAGS_REGISTER_VPA:
|
|
|
|
ret = register_vpa(tenv, vpa);
|
|
|
|
break;
|
|
|
|
|
|
|
|
case FLAGS_DEREGISTER_VPA:
|
|
|
|
ret = deregister_vpa(tenv, vpa);
|
|
|
|
break;
|
|
|
|
|
|
|
|
case FLAGS_REGISTER_SLBSHADOW:
|
|
|
|
ret = register_slb_shadow(tenv, vpa);
|
|
|
|
break;
|
|
|
|
|
|
|
|
case FLAGS_DEREGISTER_SLBSHADOW:
|
|
|
|
ret = deregister_slb_shadow(tenv, vpa);
|
|
|
|
break;
|
|
|
|
|
|
|
|
case FLAGS_REGISTER_DTL:
|
|
|
|
ret = register_dtl(tenv, vpa);
|
|
|
|
break;
|
|
|
|
|
|
|
|
case FLAGS_DEREGISTER_DTL:
|
|
|
|
ret = deregister_dtl(tenv, vpa);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2015-07-02 14:23:04 +08:00
|
|
|
static target_ulong h_cede(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
2012-05-03 12:23:01 +08:00
|
|
|
CPUPPCState *env = &cpu->env;
|
2012-12-17 15:02:44 +08:00
|
|
|
CPUState *cs = CPU(cpu);
|
2012-05-03 12:23:01 +08:00
|
|
|
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
env->msr |= (1ULL << MSR_EE);
|
|
|
|
hreg_compute_hflags(env);
|
2012-12-17 15:02:44 +08:00
|
|
|
if (!cpu_has_work(cs)) {
|
2013-01-18 01:51:17 +08:00
|
|
|
cs->halted = 1;
|
2013-08-26 14:31:06 +08:00
|
|
|
cs->exception_index = EXCP_HLT;
|
2012-12-17 15:02:44 +08:00
|
|
|
cs->exit_request = 1;
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
}
|
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2015-07-02 14:23:04 +08:00
|
|
|
static target_ulong h_rtas(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
2011-04-01 12:15:23 +08:00
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
|
|
|
target_ulong rtas_r3 = args[0];
|
2013-09-27 16:10:18 +08:00
|
|
|
uint32_t token = rtas_ld(rtas_r3, 0);
|
|
|
|
uint32_t nargs = rtas_ld(rtas_r3, 1);
|
|
|
|
uint32_t nret = rtas_ld(rtas_r3, 2);
|
2011-04-01 12:15:23 +08:00
|
|
|
|
2013-06-20 04:40:30 +08:00
|
|
|
return spapr_rtas_call(cpu, spapr, token, nargs, rtas_r3 + 12,
|
2011-04-01 12:15:23 +08:00
|
|
|
nret, rtas_r3 + 12 + 4*nargs);
|
|
|
|
}
|
|
|
|
|
2015-07-02 14:23:04 +08:00
|
|
|
static target_ulong h_logical_load(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
2011-08-10 22:44:20 +08:00
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
2013-11-15 21:46:38 +08:00
|
|
|
CPUState *cs = CPU(cpu);
|
2011-08-10 22:44:20 +08:00
|
|
|
target_ulong size = args[0];
|
|
|
|
target_ulong addr = args[1];
|
|
|
|
|
|
|
|
switch (size) {
|
|
|
|
case 1:
|
2013-12-17 12:05:40 +08:00
|
|
|
args[0] = ldub_phys(cs->as, addr);
|
2011-08-10 22:44:20 +08:00
|
|
|
return H_SUCCESS;
|
|
|
|
case 2:
|
2013-12-17 12:33:56 +08:00
|
|
|
args[0] = lduw_phys(cs->as, addr);
|
2011-08-10 22:44:20 +08:00
|
|
|
return H_SUCCESS;
|
|
|
|
case 4:
|
2013-11-15 21:46:38 +08:00
|
|
|
args[0] = ldl_phys(cs->as, addr);
|
2011-08-10 22:44:20 +08:00
|
|
|
return H_SUCCESS;
|
|
|
|
case 8:
|
2013-12-17 12:05:40 +08:00
|
|
|
args[0] = ldq_phys(cs->as, addr);
|
2011-08-10 22:44:20 +08:00
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
2015-07-02 14:23:04 +08:00
|
|
|
static target_ulong h_logical_store(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
2011-08-10 22:44:20 +08:00
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
2013-11-28 07:11:44 +08:00
|
|
|
CPUState *cs = CPU(cpu);
|
|
|
|
|
2011-08-10 22:44:20 +08:00
|
|
|
target_ulong size = args[0];
|
|
|
|
target_ulong addr = args[1];
|
|
|
|
target_ulong val = args[2];
|
|
|
|
|
|
|
|
switch (size) {
|
|
|
|
case 1:
|
2013-12-17 13:29:06 +08:00
|
|
|
stb_phys(cs->as, addr, val);
|
2011-08-10 22:44:20 +08:00
|
|
|
return H_SUCCESS;
|
|
|
|
case 2:
|
2013-12-17 13:22:06 +08:00
|
|
|
stw_phys(cs->as, addr, val);
|
2011-08-10 22:44:20 +08:00
|
|
|
return H_SUCCESS;
|
|
|
|
case 4:
|
2013-12-17 13:07:29 +08:00
|
|
|
stl_phys(cs->as, addr, val);
|
2011-08-10 22:44:20 +08:00
|
|
|
return H_SUCCESS;
|
|
|
|
case 8:
|
2013-11-28 07:11:44 +08:00
|
|
|
stq_phys(cs->as, addr, val);
|
2011-08-10 22:44:20 +08:00
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
2015-07-02 14:23:04 +08:00
|
|
|
static target_ulong h_logical_memop(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
2012-06-19 04:21:37 +08:00
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
2013-11-15 21:46:38 +08:00
|
|
|
CPUState *cs = CPU(cpu);
|
|
|
|
|
2012-06-19 04:21:37 +08:00
|
|
|
target_ulong dst = args[0]; /* Destination address */
|
|
|
|
target_ulong src = args[1]; /* Source address */
|
|
|
|
target_ulong esize = args[2]; /* Element size (0=1,1=2,2=4,3=8) */
|
|
|
|
target_ulong count = args[3]; /* Element count */
|
|
|
|
target_ulong op = args[4]; /* 0 = copy, 1 = invert */
|
|
|
|
uint64_t tmp;
|
|
|
|
unsigned int mask = (1 << esize) - 1;
|
|
|
|
int step = 1 << esize;
|
|
|
|
|
|
|
|
if (count > 0x80000000) {
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
|
|
|
if ((dst & mask) || (src & mask) || (op > 1)) {
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (dst >= src && dst < (src + (count << esize))) {
|
|
|
|
dst = dst + ((count - 1) << esize);
|
|
|
|
src = src + ((count - 1) << esize);
|
|
|
|
step = -step;
|
|
|
|
}
|
|
|
|
|
|
|
|
while (count--) {
|
|
|
|
switch (esize) {
|
|
|
|
case 0:
|
2013-12-17 12:05:40 +08:00
|
|
|
tmp = ldub_phys(cs->as, src);
|
2012-06-19 04:21:37 +08:00
|
|
|
break;
|
|
|
|
case 1:
|
2013-12-17 12:33:56 +08:00
|
|
|
tmp = lduw_phys(cs->as, src);
|
2012-06-19 04:21:37 +08:00
|
|
|
break;
|
|
|
|
case 2:
|
2013-11-15 21:46:38 +08:00
|
|
|
tmp = ldl_phys(cs->as, src);
|
2012-06-19 04:21:37 +08:00
|
|
|
break;
|
|
|
|
case 3:
|
2013-12-17 12:05:40 +08:00
|
|
|
tmp = ldq_phys(cs->as, src);
|
2012-06-19 04:21:37 +08:00
|
|
|
break;
|
|
|
|
default:
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
if (op == 1) {
|
|
|
|
tmp = ~tmp;
|
|
|
|
}
|
|
|
|
switch (esize) {
|
|
|
|
case 0:
|
2013-12-17 13:29:06 +08:00
|
|
|
stb_phys(cs->as, dst, tmp);
|
2012-06-19 04:21:37 +08:00
|
|
|
break;
|
|
|
|
case 1:
|
2013-12-17 13:22:06 +08:00
|
|
|
stw_phys(cs->as, dst, tmp);
|
2012-06-19 04:21:37 +08:00
|
|
|
break;
|
|
|
|
case 2:
|
2013-12-17 13:07:29 +08:00
|
|
|
stl_phys(cs->as, dst, tmp);
|
2012-06-19 04:21:37 +08:00
|
|
|
break;
|
|
|
|
case 3:
|
2013-11-28 07:11:44 +08:00
|
|
|
stq_phys(cs->as, dst, tmp);
|
2012-06-19 04:21:37 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
dst = dst + step;
|
|
|
|
src = src + step;
|
|
|
|
}
|
|
|
|
|
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2015-07-02 14:23:04 +08:00
|
|
|
static target_ulong h_logical_icbi(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
2011-08-10 22:44:20 +08:00
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
|
|
|
/* Nothing to do on emulation, KVM will trap this in the kernel */
|
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2015-07-02 14:23:04 +08:00
|
|
|
static target_ulong h_logical_dcbf(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
2011-08-10 22:44:20 +08:00
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
|
|
|
/* Nothing to do on emulation, KVM will trap this in the kernel */
|
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2014-07-08 23:02:26 +08:00
|
|
|
static target_ulong h_set_mode_resource_le(PowerPCCPU *cpu,
|
|
|
|
target_ulong mflags,
|
|
|
|
target_ulong value1,
|
|
|
|
target_ulong value2)
|
2013-08-19 19:04:20 +08:00
|
|
|
{
|
|
|
|
CPUState *cs;
|
|
|
|
|
2014-06-04 20:51:04 +08:00
|
|
|
if (value1) {
|
|
|
|
return H_P3;
|
|
|
|
}
|
|
|
|
if (value2) {
|
|
|
|
return H_P4;
|
|
|
|
}
|
|
|
|
|
|
|
|
switch (mflags) {
|
|
|
|
case H_SET_MODE_ENDIAN_BIG:
|
|
|
|
CPU_FOREACH(cs) {
|
|
|
|
set_spr(cs, SPR_LPCR, 0, LPCR_ILE);
|
2013-08-19 19:04:20 +08:00
|
|
|
}
|
2015-02-10 12:36:16 +08:00
|
|
|
spapr_pci_switch_vga(true);
|
2014-06-04 20:51:04 +08:00
|
|
|
return H_SUCCESS;
|
|
|
|
|
|
|
|
case H_SET_MODE_ENDIAN_LITTLE:
|
|
|
|
CPU_FOREACH(cs) {
|
|
|
|
set_spr(cs, SPR_LPCR, LPCR_ILE, LPCR_ILE);
|
2013-08-19 19:04:20 +08:00
|
|
|
}
|
2015-02-10 12:36:16 +08:00
|
|
|
spapr_pci_switch_vga(false);
|
2014-06-04 20:51:04 +08:00
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
2013-08-19 19:04:20 +08:00
|
|
|
|
2014-06-04 20:51:04 +08:00
|
|
|
return H_UNSUPPORTED_FLAG;
|
|
|
|
}
|
2013-08-19 19:04:20 +08:00
|
|
|
|
2014-07-08 23:02:26 +08:00
|
|
|
static target_ulong h_set_mode_resource_addr_trans_mode(PowerPCCPU *cpu,
|
|
|
|
target_ulong mflags,
|
|
|
|
target_ulong value1,
|
|
|
|
target_ulong value2)
|
2014-06-04 20:51:05 +08:00
|
|
|
{
|
|
|
|
CPUState *cs;
|
|
|
|
PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
|
|
|
|
|
|
|
|
if (!(pcc->insns_flags2 & PPC2_ISA207S)) {
|
|
|
|
return H_P2;
|
|
|
|
}
|
|
|
|
if (value1) {
|
|
|
|
return H_P3;
|
|
|
|
}
|
|
|
|
if (value2) {
|
|
|
|
return H_P4;
|
|
|
|
}
|
|
|
|
|
2016-04-04 01:57:50 +08:00
|
|
|
if (mflags == AIL_RESERVED) {
|
2014-06-04 20:51:05 +08:00
|
|
|
return H_UNSUPPORTED_FLAG;
|
|
|
|
}
|
|
|
|
|
|
|
|
CPU_FOREACH(cs) {
|
|
|
|
set_spr(cs, SPR_LPCR, mflags << LPCR_AIL_SHIFT, LPCR_AIL);
|
|
|
|
}
|
|
|
|
|
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2015-07-02 14:23:04 +08:00
|
|
|
static target_ulong h_set_mode(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
2014-06-04 20:51:04 +08:00
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
|
|
|
target_ulong resource = args[1];
|
|
|
|
target_ulong ret = H_P2;
|
|
|
|
|
|
|
|
switch (resource) {
|
|
|
|
case H_SET_MODE_RESOURCE_LE:
|
2014-07-08 23:02:26 +08:00
|
|
|
ret = h_set_mode_resource_le(cpu, args[0], args[2], args[3]);
|
2014-06-04 20:51:04 +08:00
|
|
|
break;
|
2014-06-04 20:51:05 +08:00
|
|
|
case H_SET_MODE_RESOURCE_ADDR_TRANS_MODE:
|
2014-07-08 23:02:26 +08:00
|
|
|
ret = h_set_mode_resource_addr_trans_mode(cpu, args[0],
|
|
|
|
args[2], args[3]);
|
2014-06-04 20:51:05 +08:00
|
|
|
break;
|
2013-08-19 19:04:20 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2017-03-20 07:46:45 +08:00
|
|
|
static target_ulong h_clean_slb(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
|
|
|
qemu_log_mask(LOG_UNIMP, "Unimplemented SPAPR hcall 0x"TARGET_FMT_lx"%s\n",
|
|
|
|
opcode, " (H_CLEAN_SLB)");
|
|
|
|
return H_FUNCTION;
|
|
|
|
}
|
|
|
|
|
|
|
|
static target_ulong h_invalidate_pid(PowerPCCPU *cpu, sPAPRMachineState *spapr,
|
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
|
|
|
qemu_log_mask(LOG_UNIMP, "Unimplemented SPAPR hcall 0x"TARGET_FMT_lx"%s\n",
|
|
|
|
opcode, " (H_INVALIDATE_PID)");
|
|
|
|
return H_FUNCTION;
|
|
|
|
}
|
|
|
|
|
2017-03-20 07:46:46 +08:00
|
|
|
static void spapr_check_setup_free_hpt(sPAPRMachineState *spapr,
|
|
|
|
uint64_t patbe_old, uint64_t patbe_new)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* We have 4 Options:
|
|
|
|
* HASH->HASH || RADIX->RADIX || NOTHING->RADIX : Do Nothing
|
|
|
|
* HASH->RADIX : Free HPT
|
|
|
|
* RADIX->HASH : Allocate HPT
|
|
|
|
* NOTHING->HASH : Allocate HPT
|
|
|
|
* Note: NOTHING implies the case where we said the guest could choose
|
|
|
|
* later and so assumed radix and now it's called H_REG_PROC_TBL
|
|
|
|
*/
|
|
|
|
|
|
|
|
if ((patbe_old & PATBE1_GR) == (patbe_new & PATBE1_GR)) {
|
|
|
|
/* We assume RADIX, so this catches all the "Do Nothing" cases */
|
|
|
|
} else if (!(patbe_old & PATBE1_GR)) {
|
|
|
|
/* HASH->RADIX : Free HPT */
|
2017-05-17 11:49:20 +08:00
|
|
|
spapr_free_hpt(spapr);
|
2017-03-20 07:46:46 +08:00
|
|
|
} else if (!(patbe_new & PATBE1_GR)) {
|
|
|
|
/* RADIX->HASH || NOTHING->HASH : Allocate HPT */
|
|
|
|
spapr_setup_hpt_and_vrma(spapr);
|
|
|
|
}
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
#define FLAGS_MASK 0x01FULL
|
|
|
|
#define FLAG_MODIFY 0x10
|
|
|
|
#define FLAG_REGISTER 0x08
|
|
|
|
#define FLAG_RADIX 0x04
|
|
|
|
#define FLAG_HASH_PROC_TBL 0x02
|
|
|
|
#define FLAG_GTSE 0x01
|
|
|
|
|
2017-03-20 07:46:45 +08:00
|
|
|
static target_ulong h_register_process_table(PowerPCCPU *cpu,
|
|
|
|
sPAPRMachineState *spapr,
|
|
|
|
target_ulong opcode,
|
|
|
|
target_ulong *args)
|
|
|
|
{
|
2017-05-02 14:37:14 +08:00
|
|
|
CPUState *cs;
|
2017-03-20 07:46:46 +08:00
|
|
|
target_ulong flags = args[0];
|
|
|
|
target_ulong proc_tbl = args[1];
|
|
|
|
target_ulong page_size = args[2];
|
|
|
|
target_ulong table_size = args[3];
|
|
|
|
uint64_t cproc;
|
|
|
|
|
|
|
|
if (flags & ~FLAGS_MASK) { /* Check no reserved bits are set */
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
if (flags & FLAG_MODIFY) {
|
|
|
|
if (flags & FLAG_REGISTER) {
|
|
|
|
if (flags & FLAG_RADIX) { /* Register new RADIX process table */
|
|
|
|
if (proc_tbl & 0xfff || proc_tbl >> 60) {
|
|
|
|
return H_P2;
|
|
|
|
} else if (page_size) {
|
|
|
|
return H_P3;
|
|
|
|
} else if (table_size > 24) {
|
|
|
|
return H_P4;
|
|
|
|
}
|
|
|
|
cproc = PATBE1_GR | proc_tbl | table_size;
|
|
|
|
} else { /* Register new HPT process table */
|
|
|
|
if (flags & FLAG_HASH_PROC_TBL) { /* Hash with Segment Tables */
|
|
|
|
/* TODO - Not Supported */
|
|
|
|
/* Technically caused by flag bits => H_PARAMETER */
|
|
|
|
return H_PARAMETER;
|
|
|
|
} else { /* Hash with SLB */
|
|
|
|
if (proc_tbl >> 38) {
|
|
|
|
return H_P2;
|
|
|
|
} else if (page_size & ~0x7) {
|
|
|
|
return H_P3;
|
|
|
|
} else if (table_size > 24) {
|
|
|
|
return H_P4;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
cproc = (proc_tbl << 25) | page_size << 5 | table_size;
|
|
|
|
}
|
|
|
|
|
|
|
|
} else { /* Deregister current process table */
|
|
|
|
/* Set to benign value: (current GR) | 0. This allows
|
|
|
|
* deregistration in KVM to succeed even if the radix bit in flags
|
|
|
|
* doesn't match the radix bit in the old PATB. */
|
|
|
|
cproc = spapr->patb_entry & PATBE1_GR;
|
|
|
|
}
|
|
|
|
} else { /* Maintain current registration */
|
|
|
|
if (!(flags & FLAG_RADIX) != !(spapr->patb_entry & PATBE1_GR)) {
|
|
|
|
/* Technically caused by flag bits => H_PARAMETER */
|
|
|
|
return H_PARAMETER; /* Existing Process Table Mismatch */
|
|
|
|
}
|
|
|
|
cproc = spapr->patb_entry;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Check if we need to setup OR free the hpt */
|
|
|
|
spapr_check_setup_free_hpt(spapr, spapr->patb_entry, cproc);
|
|
|
|
|
|
|
|
spapr->patb_entry = cproc; /* Save new process table */
|
2017-05-02 14:37:14 +08:00
|
|
|
|
|
|
|
/* Update the UPRT and GTSE bits in the LPCR for all cpus */
|
|
|
|
CPU_FOREACH(cs) {
|
2017-06-05 08:49:51 +08:00
|
|
|
set_spr(cs, SPR_LPCR,
|
2017-05-02 14:37:14 +08:00
|
|
|
((flags & (FLAG_RADIX | FLAG_HASH_PROC_TBL)) ? LPCR_UPRT : 0) |
|
2017-06-05 08:49:51 +08:00
|
|
|
((flags & FLAG_GTSE) ? LPCR_GTSE : 0),
|
|
|
|
LPCR_UPRT | LPCR_GTSE);
|
2017-03-20 07:46:46 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
if (kvm_enabled()) {
|
|
|
|
return kvmppc_configure_v3_mmu(cpu, flags & FLAG_RADIX,
|
|
|
|
flags & FLAG_GTSE, cproc);
|
|
|
|
}
|
|
|
|
return H_SUCCESS;
|
2017-03-20 07:46:45 +08:00
|
|
|
}
|
|
|
|
|
2016-12-05 13:50:21 +08:00
|
|
|
#define H_SIGNAL_SYS_RESET_ALL -1
|
|
|
|
#define H_SIGNAL_SYS_RESET_ALLBUTSELF -2
|
|
|
|
|
|
|
|
static target_ulong h_signal_sys_reset(PowerPCCPU *cpu,
|
|
|
|
sPAPRMachineState *spapr,
|
|
|
|
target_ulong opcode, target_ulong *args)
|
|
|
|
{
|
|
|
|
target_long target = args[0];
|
|
|
|
CPUState *cs;
|
|
|
|
|
|
|
|
if (target < 0) {
|
|
|
|
/* Broadcast */
|
|
|
|
if (target < H_SIGNAL_SYS_RESET_ALLBUTSELF) {
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
|
|
|
|
CPU_FOREACH(cs) {
|
|
|
|
PowerPCCPU *c = POWERPC_CPU(cs);
|
|
|
|
|
|
|
|
if (target == H_SIGNAL_SYS_RESET_ALLBUTSELF) {
|
|
|
|
if (c == cpu) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
run_on_cpu(cs, spapr_do_system_reset_on_cpu, RUN_ON_CPU_NULL);
|
|
|
|
}
|
|
|
|
return H_SUCCESS;
|
|
|
|
|
|
|
|
} else {
|
|
|
|
/* Unicast */
|
2017-08-09 13:38:56 +08:00
|
|
|
cs = CPU(spapr_find_cpu(target));
|
2017-08-03 14:28:27 +08:00
|
|
|
if (cs) {
|
|
|
|
run_on_cpu(cs, spapr_do_system_reset_on_cpu, RUN_ON_CPU_NULL);
|
|
|
|
return H_SUCCESS;
|
2016-12-05 13:50:21 +08:00
|
|
|
}
|
|
|
|
return H_PARAMETER;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-06-11 20:33:59 +08:00
|
|
|
static uint32_t cas_check_pvr(sPAPRMachineState *spapr, PowerPCCPU *cpu,
|
2017-08-17 19:23:50 +08:00
|
|
|
target_ulong *addr, bool *raw_mode_supported,
|
|
|
|
Error **errp)
|
2014-05-23 10:26:54 +08:00
|
|
|
{
|
2016-11-16 10:54:48 +08:00
|
|
|
bool explicit_match = false; /* Matched the CPU's real PVR */
|
2017-06-11 20:33:59 +08:00
|
|
|
uint32_t max_compat = spapr->max_compat_pvr;
|
2016-11-16 10:54:48 +08:00
|
|
|
uint32_t best_compat = 0;
|
|
|
|
int i;
|
spapr: Implement processor compatibility in ibm, client-architecture-support
Modern Linux kernels support last POWERPC CPUs so when a kernel boots,
in most cases it can find a matching cpu_spec in the kernel's cpu_specs
list. However if the kernel is quite old, it may be missing a definition
of the actual CPU. To provide an ability for old kernels to work on modern
hardware, a Processor Compatibility Mode has been introduced
by the PowerISA specification.
>From the hardware prospective, it is supported by the Processor
Compatibility Register (PCR) which is defined in PowerISA. The register
enables one of the compatibility modes (2.05/2.06/2.07).
Since PCR is a hypervisor privileged register and cannot be
directly accessed from the guest, the mode selection is done via
ibm,client-architecture-support (CAS) RTAS call using which the guest
specifies what "raw" and "architected" CPU versions it supports.
QEMU works out the best match, changes a "cpu-version" property of
every CPU and notifies the guest about the change by setting these
properties in the buffer passed as a response on a custom H_CAS hypercall.
This implements ibm,client-architecture-support parameters parsing
(now only for PVRs) and cooks the device tree diff with new values for
"cpu-version", "ibm,ppc-interrupt-server#s" and
"ibm,ppc-interrupt-server#s" properties.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-23 10:26:57 +08:00
|
|
|
|
2016-11-16 10:54:48 +08:00
|
|
|
/*
|
|
|
|
* We scan the supplied table of PVRs looking for two things
|
|
|
|
* 1. Is our real CPU PVR in the list?
|
|
|
|
* 2. What's the "best" listed logical PVR
|
|
|
|
*/
|
|
|
|
for (i = 0; i < 512; ++i) {
|
spapr: Implement processor compatibility in ibm, client-architecture-support
Modern Linux kernels support last POWERPC CPUs so when a kernel boots,
in most cases it can find a matching cpu_spec in the kernel's cpu_specs
list. However if the kernel is quite old, it may be missing a definition
of the actual CPU. To provide an ability for old kernels to work on modern
hardware, a Processor Compatibility Mode has been introduced
by the PowerISA specification.
>From the hardware prospective, it is supported by the Processor
Compatibility Register (PCR) which is defined in PowerISA. The register
enables one of the compatibility modes (2.05/2.06/2.07).
Since PCR is a hypervisor privileged register and cannot be
directly accessed from the guest, the mode selection is done via
ibm,client-architecture-support (CAS) RTAS call using which the guest
specifies what "raw" and "architected" CPU versions it supports.
QEMU works out the best match, changes a "cpu-version" property of
every CPU and notifies the guest about the change by setting these
properties in the buffer passed as a response on a custom H_CAS hypercall.
This implements ibm,client-architecture-support parameters parsing
(now only for PVRs) and cooks the device tree diff with new values for
"cpu-version", "ibm,ppc-interrupt-server#s" and
"ibm,ppc-interrupt-server#s" properties.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-23 10:26:57 +08:00
|
|
|
uint32_t pvr, pvr_mask;
|
|
|
|
|
2017-05-18 12:47:44 +08:00
|
|
|
pvr_mask = ldl_be_phys(&address_space_memory, *addr);
|
|
|
|
pvr = ldl_be_phys(&address_space_memory, *addr + 4);
|
|
|
|
*addr += 8;
|
2016-11-16 10:54:48 +08:00
|
|
|
|
spapr: Implement processor compatibility in ibm, client-architecture-support
Modern Linux kernels support last POWERPC CPUs so when a kernel boots,
in most cases it can find a matching cpu_spec in the kernel's cpu_specs
list. However if the kernel is quite old, it may be missing a definition
of the actual CPU. To provide an ability for old kernels to work on modern
hardware, a Processor Compatibility Mode has been introduced
by the PowerISA specification.
>From the hardware prospective, it is supported by the Processor
Compatibility Register (PCR) which is defined in PowerISA. The register
enables one of the compatibility modes (2.05/2.06/2.07).
Since PCR is a hypervisor privileged register and cannot be
directly accessed from the guest, the mode selection is done via
ibm,client-architecture-support (CAS) RTAS call using which the guest
specifies what "raw" and "architected" CPU versions it supports.
QEMU works out the best match, changes a "cpu-version" property of
every CPU and notifies the guest about the change by setting these
properties in the buffer passed as a response on a custom H_CAS hypercall.
This implements ibm,client-architecture-support parameters parsing
(now only for PVRs) and cooks the device tree diff with new values for
"cpu-version", "ibm,ppc-interrupt-server#s" and
"ibm,ppc-interrupt-server#s" properties.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-23 10:26:57 +08:00
|
|
|
if (~pvr_mask & pvr) {
|
2016-11-16 10:54:48 +08:00
|
|
|
break; /* Terminator record */
|
spapr: Implement processor compatibility in ibm, client-architecture-support
Modern Linux kernels support last POWERPC CPUs so when a kernel boots,
in most cases it can find a matching cpu_spec in the kernel's cpu_specs
list. However if the kernel is quite old, it may be missing a definition
of the actual CPU. To provide an ability for old kernels to work on modern
hardware, a Processor Compatibility Mode has been introduced
by the PowerISA specification.
>From the hardware prospective, it is supported by the Processor
Compatibility Register (PCR) which is defined in PowerISA. The register
enables one of the compatibility modes (2.05/2.06/2.07).
Since PCR is a hypervisor privileged register and cannot be
directly accessed from the guest, the mode selection is done via
ibm,client-architecture-support (CAS) RTAS call using which the guest
specifies what "raw" and "architected" CPU versions it supports.
QEMU works out the best match, changes a "cpu-version" property of
every CPU and notifies the guest about the change by setting these
properties in the buffer passed as a response on a custom H_CAS hypercall.
This implements ibm,client-architecture-support parameters parsing
(now only for PVRs) and cooks the device tree diff with new values for
"cpu-version", "ibm,ppc-interrupt-server#s" and
"ibm,ppc-interrupt-server#s" properties.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-23 10:26:57 +08:00
|
|
|
}
|
2016-11-16 10:54:48 +08:00
|
|
|
|
|
|
|
if ((cpu->env.spr[SPR_PVR] & pvr_mask) == (pvr & pvr_mask)) {
|
|
|
|
explicit_match = true;
|
|
|
|
} else {
|
|
|
|
if (ppc_check_compat(cpu, pvr, best_compat, max_compat)) {
|
|
|
|
best_compat = pvr;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if ((best_compat == 0) && (!explicit_match || max_compat)) {
|
|
|
|
/* We couldn't find a suitable compatibility mode, and either
|
|
|
|
* the guest doesn't support "raw" mode for this CPU, or raw
|
|
|
|
* mode is disabled because a maximum compat mode is set */
|
2017-05-18 12:47:44 +08:00
|
|
|
error_setg(errp, "Couldn't negotiate a suitable PVR during CAS");
|
|
|
|
return 0;
|
spapr: Implement processor compatibility in ibm, client-architecture-support
Modern Linux kernels support last POWERPC CPUs so when a kernel boots,
in most cases it can find a matching cpu_spec in the kernel's cpu_specs
list. However if the kernel is quite old, it may be missing a definition
of the actual CPU. To provide an ability for old kernels to work on modern
hardware, a Processor Compatibility Mode has been introduced
by the PowerISA specification.
>From the hardware prospective, it is supported by the Processor
Compatibility Register (PCR) which is defined in PowerISA. The register
enables one of the compatibility modes (2.05/2.06/2.07).
Since PCR is a hypervisor privileged register and cannot be
directly accessed from the guest, the mode selection is done via
ibm,client-architecture-support (CAS) RTAS call using which the guest
specifies what "raw" and "architected" CPU versions it supports.
QEMU works out the best match, changes a "cpu-version" property of
every CPU and notifies the guest about the change by setting these
properties in the buffer passed as a response on a custom H_CAS hypercall.
This implements ibm,client-architecture-support parameters parsing
(now only for PVRs) and cooks the device tree diff with new values for
"cpu-version", "ibm,ppc-interrupt-server#s" and
"ibm,ppc-interrupt-server#s" properties.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-23 10:26:57 +08:00
|
|
|
}
|
|
|
|
|
2017-08-17 19:23:50 +08:00
|
|
|
*raw_mode_supported = explicit_match;
|
|
|
|
|
spapr: Implement processor compatibility in ibm, client-architecture-support
Modern Linux kernels support last POWERPC CPUs so when a kernel boots,
in most cases it can find a matching cpu_spec in the kernel's cpu_specs
list. However if the kernel is quite old, it may be missing a definition
of the actual CPU. To provide an ability for old kernels to work on modern
hardware, a Processor Compatibility Mode has been introduced
by the PowerISA specification.
>From the hardware prospective, it is supported by the Processor
Compatibility Register (PCR) which is defined in PowerISA. The register
enables one of the compatibility modes (2.05/2.06/2.07).
Since PCR is a hypervisor privileged register and cannot be
directly accessed from the guest, the mode selection is done via
ibm,client-architecture-support (CAS) RTAS call using which the guest
specifies what "raw" and "architected" CPU versions it supports.
QEMU works out the best match, changes a "cpu-version" property of
every CPU and notifies the guest about the change by setting these
properties in the buffer passed as a response on a custom H_CAS hypercall.
This implements ibm,client-architecture-support parameters parsing
(now only for PVRs) and cooks the device tree diff with new values for
"cpu-version", "ibm,ppc-interrupt-server#s" and
"ibm,ppc-interrupt-server#s" properties.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-23 10:26:57 +08:00
|
|
|
/* Parsing finished */
|
2016-11-16 10:54:48 +08:00
|
|
|
trace_spapr_cas_pvr(cpu->compat_pvr, explicit_match, best_compat);
|
spapr: Implement processor compatibility in ibm, client-architecture-support
Modern Linux kernels support last POWERPC CPUs so when a kernel boots,
in most cases it can find a matching cpu_spec in the kernel's cpu_specs
list. However if the kernel is quite old, it may be missing a definition
of the actual CPU. To provide an ability for old kernels to work on modern
hardware, a Processor Compatibility Mode has been introduced
by the PowerISA specification.
>From the hardware prospective, it is supported by the Processor
Compatibility Register (PCR) which is defined in PowerISA. The register
enables one of the compatibility modes (2.05/2.06/2.07).
Since PCR is a hypervisor privileged register and cannot be
directly accessed from the guest, the mode selection is done via
ibm,client-architecture-support (CAS) RTAS call using which the guest
specifies what "raw" and "architected" CPU versions it supports.
QEMU works out the best match, changes a "cpu-version" property of
every CPU and notifies the guest about the change by setting these
properties in the buffer passed as a response on a custom H_CAS hypercall.
This implements ibm,client-architecture-support parameters parsing
(now only for PVRs) and cooks the device tree diff with new values for
"cpu-version", "ibm,ppc-interrupt-server#s" and
"ibm,ppc-interrupt-server#s" properties.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-23 10:26:57 +08:00
|
|
|
|
2017-05-18 12:47:44 +08:00
|
|
|
return best_compat;
|
|
|
|
}
|
spapr: Implement processor compatibility in ibm, client-architecture-support
Modern Linux kernels support last POWERPC CPUs so when a kernel boots,
in most cases it can find a matching cpu_spec in the kernel's cpu_specs
list. However if the kernel is quite old, it may be missing a definition
of the actual CPU. To provide an ability for old kernels to work on modern
hardware, a Processor Compatibility Mode has been introduced
by the PowerISA specification.
>From the hardware prospective, it is supported by the Processor
Compatibility Register (PCR) which is defined in PowerISA. The register
enables one of the compatibility modes (2.05/2.06/2.07).
Since PCR is a hypervisor privileged register and cannot be
directly accessed from the guest, the mode selection is done via
ibm,client-architecture-support (CAS) RTAS call using which the guest
specifies what "raw" and "architected" CPU versions it supports.
QEMU works out the best match, changes a "cpu-version" property of
every CPU and notifies the guest about the change by setting these
properties in the buffer passed as a response on a custom H_CAS hypercall.
This implements ibm,client-architecture-support parameters parsing
(now only for PVRs) and cooks the device tree diff with new values for
"cpu-version", "ibm,ppc-interrupt-server#s" and
"ibm,ppc-interrupt-server#s" properties.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-23 10:26:57 +08:00
|
|
|
|
2017-05-18 12:47:44 +08:00
|
|
|
static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
|
|
|
|
sPAPRMachineState *spapr,
|
|
|
|
target_ulong opcode,
|
|
|
|
target_ulong *args)
|
|
|
|
{
|
|
|
|
/* Working address in data buffer */
|
|
|
|
target_ulong addr = ppc64_phys_to_real(args[0]);
|
|
|
|
target_ulong ov_table;
|
|
|
|
uint32_t cas_pvr;
|
|
|
|
sPAPROptionVector *ov1_guest, *ov5_guest, *ov5_cas_old, *ov5_updates;
|
|
|
|
bool guest_radix;
|
|
|
|
Error *local_err = NULL;
|
2017-08-17 19:23:50 +08:00
|
|
|
bool raw_mode_supported = false;
|
2017-05-18 12:47:44 +08:00
|
|
|
|
2017-08-17 19:23:50 +08:00
|
|
|
cas_pvr = cas_check_pvr(spapr, cpu, &addr, &raw_mode_supported, &local_err);
|
2017-05-18 12:47:44 +08:00
|
|
|
if (local_err) {
|
|
|
|
error_report_err(local_err);
|
|
|
|
return H_HARDWARE;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Update CPUs */
|
|
|
|
if (cpu->compat_pvr != cas_pvr) {
|
|
|
|
ppc_set_compat_all(cas_pvr, &local_err);
|
2016-11-10 11:37:38 +08:00
|
|
|
if (local_err) {
|
2017-08-17 19:23:50 +08:00
|
|
|
/* We fail to set compat mode (likely because running with KVM PR),
|
|
|
|
* but maybe we can fallback to raw mode if the guest supports it.
|
|
|
|
*/
|
|
|
|
if (!raw_mode_supported) {
|
|
|
|
error_report_err(local_err);
|
|
|
|
return H_HARDWARE;
|
|
|
|
}
|
|
|
|
local_err = NULL;
|
spapr: Implement processor compatibility in ibm, client-architecture-support
Modern Linux kernels support last POWERPC CPUs so when a kernel boots,
in most cases it can find a matching cpu_spec in the kernel's cpu_specs
list. However if the kernel is quite old, it may be missing a definition
of the actual CPU. To provide an ability for old kernels to work on modern
hardware, a Processor Compatibility Mode has been introduced
by the PowerISA specification.
>From the hardware prospective, it is supported by the Processor
Compatibility Register (PCR) which is defined in PowerISA. The register
enables one of the compatibility modes (2.05/2.06/2.07).
Since PCR is a hypervisor privileged register and cannot be
directly accessed from the guest, the mode selection is done via
ibm,client-architecture-support (CAS) RTAS call using which the guest
specifies what "raw" and "architected" CPU versions it supports.
QEMU works out the best match, changes a "cpu-version" property of
every CPU and notifies the guest about the change by setting these
properties in the buffer passed as a response on a custom H_CAS hypercall.
This implements ibm,client-architecture-support parameters parsing
(now only for PVRs) and cooks the device tree diff with new values for
"cpu-version", "ibm,ppc-interrupt-server#s" and
"ibm,ppc-interrupt-server#s" properties.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-23 10:26:57 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-07-13 08:34:00 +08:00
|
|
|
/* For the future use: here @ov_table points to the first option vector */
|
2017-05-18 12:47:44 +08:00
|
|
|
ov_table = addr;
|
2015-07-13 08:34:00 +08:00
|
|
|
|
2017-03-20 07:46:49 +08:00
|
|
|
ov1_guest = spapr_ovec_parse_vector(ov_table, 1);
|
2016-10-25 12:47:28 +08:00
|
|
|
ov5_guest = spapr_ovec_parse_vector(ov_table, 5);
|
2017-03-23 11:46:00 +08:00
|
|
|
if (spapr_ovec_test(ov5_guest, OV5_MMU_BOTH)) {
|
|
|
|
error_report("guest requested hash and radix MMU, which is invalid.");
|
|
|
|
exit(EXIT_FAILURE);
|
|
|
|
}
|
|
|
|
/* The radix/hash bit in byte 24 requires special handling: */
|
|
|
|
guest_radix = spapr_ovec_test(ov5_guest, OV5_MMU_RADIX_300);
|
|
|
|
spapr_ovec_clear(ov5_guest, OV5_MMU_RADIX_300);
|
2014-05-23 10:26:54 +08:00
|
|
|
|
2017-07-12 15:56:06 +08:00
|
|
|
/*
|
|
|
|
* HPT resizing is a bit of a special case, because when enabled
|
|
|
|
* we assume an HPT guest will support it until it says it
|
|
|
|
* doesn't, instead of assuming it won't support it until it says
|
|
|
|
* it does. Strictly speaking that approach could break for
|
|
|
|
* guests which don't make a CAS call, but those are so old we
|
|
|
|
* don't care about them. Without that assumption we'd have to
|
|
|
|
* make at least a temporary allocation of an HPT sized for max
|
|
|
|
* memory, which could be impossibly difficult under KVM HV if
|
|
|
|
* maxram is large.
|
|
|
|
*/
|
|
|
|
if (!guest_radix && !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
|
|
|
|
int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
|
|
|
|
|
|
|
|
if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED) {
|
|
|
|
error_report(
|
|
|
|
"h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
|
|
|
|
exit(1);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (spapr->htab_shift < maxshift) {
|
|
|
|
/* Guest doesn't know about HPT resizing, so we
|
|
|
|
* pre-emptively resize for the maximum permitted RAM. At
|
|
|
|
* the point this is called, nothing should have been
|
|
|
|
* entered into the existing HPT */
|
|
|
|
spapr_reallocate_hpt(spapr, maxshift, &error_fatal);
|
2017-09-05 05:46:55 +08:00
|
|
|
if (kvm_enabled()) {
|
|
|
|
/* For KVM PR, update the HPT pointer */
|
|
|
|
target_ulong sdr1 = (target_ulong)(uintptr_t)spapr->htab
|
|
|
|
| (spapr->htab_shift - 18);
|
|
|
|
kvmppc_update_sdr1(sdr1);
|
2017-07-12 15:56:55 +08:00
|
|
|
}
|
2017-07-12 15:56:06 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-10-25 12:47:28 +08:00
|
|
|
/* NOTE: there are actually a number of ov5 bits where input from the
|
|
|
|
* guest is always zero, and the platform/QEMU enables them independently
|
|
|
|
* of guest input. To model these properly we'd want some sort of mask,
|
|
|
|
* but since they only currently apply to memory migration as defined
|
|
|
|
* by LoPAPR 1.1, 14.5.4.8, which QEMU doesn't implement, we don't need
|
spapr: add option vector handling in CAS-generated resets
In some cases, ibm,client-architecture-support calls can fail. This
could happen in the current code for situations where the modified
device tree segment exceeds the buffer size provided by the guest
via the call parameters. In these cases, QEMU will reset, allowing
an opportunity to regenerate the device tree from scratch via
boot-time handling. There are potentially other scenarios as well,
not currently reachable in the current code, but possible in theory,
such as cases where device-tree properties or nodes need to be removed.
We currently don't handle either of these properly for option vector
capabilities however. Instead of carrying the negotiated capability
beyond the reset and creating the boot-time device tree accordingly,
we start from scratch, generating the same boot-time device tree as we
did prior to the CAS-generated and the same device tree updates as we
did before. This could (in theory) cause us to get stuck in a reset
loop. This hasn't been observed, but depending on the extensiveness
of CAS-induced device tree updates in the future, could eventually
become an issue.
Address this by pulling capability-related device tree
updates resulting from CAS calls into a common routine,
spapr_dt_cas_updates(), and adding an sPAPROptionVector*
parameter that allows us to test for newly-negotiated capabilities.
We invoke it as follows:
1) When ibm,client-architecture-support gets called, we
call spapr_dt_cas_updates() with the set of capabilities
added since the previous call to ibm,client-architecture-support.
For the initial boot, or a system reset generated by something
other than the CAS call itself, this set will consist of *all*
options supported both the platform and the guest. For calls
to ibm,client-architecture-support immediately after a CAS-induced
reset, we call spapr_dt_cas_updates() with only the set
of capabilities added since the previous call, since the other
capabilities will have already been addressed by the boot-time
device-tree this time around. In the unlikely event that
capabilities are *removed* since the previous CAS, we will
generate a CAS-induced reset. In the unlikely event that we
cannot fit the device-tree updates into the buffer provided
by the guest, well generate a CAS-induced reset.
2) When a CAS update results in the need to reset the machine and
include the updates in the boot-time device tree, we call the
spapr_dt_cas_updates() using the full set of negotiated
capabilities as part of the reset path. At initial boot, or after
a reset generated by something other than the CAS call itself,
this set will be empty, resulting in what should be the same
boot-time device-tree as we generated prior to this patch. For
CAS-induced reset, this routine will be called with the full set of
capabilities negotiated by the platform/guest in the previous
CAS call, which should result in CAS updates from previous call
being accounted for in the initial boot-time device tree.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Changed an int -> bool conversion to be more explicit]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-25 12:47:29 +08:00
|
|
|
* to worry about this for now.
|
2016-10-25 12:47:28 +08:00
|
|
|
*/
|
spapr: add option vector handling in CAS-generated resets
In some cases, ibm,client-architecture-support calls can fail. This
could happen in the current code for situations where the modified
device tree segment exceeds the buffer size provided by the guest
via the call parameters. In these cases, QEMU will reset, allowing
an opportunity to regenerate the device tree from scratch via
boot-time handling. There are potentially other scenarios as well,
not currently reachable in the current code, but possible in theory,
such as cases where device-tree properties or nodes need to be removed.
We currently don't handle either of these properly for option vector
capabilities however. Instead of carrying the negotiated capability
beyond the reset and creating the boot-time device tree accordingly,
we start from scratch, generating the same boot-time device tree as we
did prior to the CAS-generated and the same device tree updates as we
did before. This could (in theory) cause us to get stuck in a reset
loop. This hasn't been observed, but depending on the extensiveness
of CAS-induced device tree updates in the future, could eventually
become an issue.
Address this by pulling capability-related device tree
updates resulting from CAS calls into a common routine,
spapr_dt_cas_updates(), and adding an sPAPROptionVector*
parameter that allows us to test for newly-negotiated capabilities.
We invoke it as follows:
1) When ibm,client-architecture-support gets called, we
call spapr_dt_cas_updates() with the set of capabilities
added since the previous call to ibm,client-architecture-support.
For the initial boot, or a system reset generated by something
other than the CAS call itself, this set will consist of *all*
options supported both the platform and the guest. For calls
to ibm,client-architecture-support immediately after a CAS-induced
reset, we call spapr_dt_cas_updates() with only the set
of capabilities added since the previous call, since the other
capabilities will have already been addressed by the boot-time
device-tree this time around. In the unlikely event that
capabilities are *removed* since the previous CAS, we will
generate a CAS-induced reset. In the unlikely event that we
cannot fit the device-tree updates into the buffer provided
by the guest, well generate a CAS-induced reset.
2) When a CAS update results in the need to reset the machine and
include the updates in the boot-time device tree, we call the
spapr_dt_cas_updates() using the full set of negotiated
capabilities as part of the reset path. At initial boot, or after
a reset generated by something other than the CAS call itself,
this set will be empty, resulting in what should be the same
boot-time device-tree as we generated prior to this patch. For
CAS-induced reset, this routine will be called with the full set of
capabilities negotiated by the platform/guest in the previous
CAS call, which should result in CAS updates from previous call
being accounted for in the initial boot-time device tree.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Changed an int -> bool conversion to be more explicit]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-25 12:47:29 +08:00
|
|
|
ov5_cas_old = spapr_ovec_clone(spapr->ov5_cas);
|
2017-09-08 22:33:43 +08:00
|
|
|
|
|
|
|
/* also clear the radix/hash bit from the current ov5_cas bits to
|
|
|
|
* be in sync with the newly ov5 bits. Else the radix bit will be
|
|
|
|
* seen as being removed and this will generate a reset loop
|
|
|
|
*/
|
|
|
|
spapr_ovec_clear(ov5_cas_old, OV5_MMU_RADIX_300);
|
|
|
|
|
spapr: add option vector handling in CAS-generated resets
In some cases, ibm,client-architecture-support calls can fail. This
could happen in the current code for situations where the modified
device tree segment exceeds the buffer size provided by the guest
via the call parameters. In these cases, QEMU will reset, allowing
an opportunity to regenerate the device tree from scratch via
boot-time handling. There are potentially other scenarios as well,
not currently reachable in the current code, but possible in theory,
such as cases where device-tree properties or nodes need to be removed.
We currently don't handle either of these properly for option vector
capabilities however. Instead of carrying the negotiated capability
beyond the reset and creating the boot-time device tree accordingly,
we start from scratch, generating the same boot-time device tree as we
did prior to the CAS-generated and the same device tree updates as we
did before. This could (in theory) cause us to get stuck in a reset
loop. This hasn't been observed, but depending on the extensiveness
of CAS-induced device tree updates in the future, could eventually
become an issue.
Address this by pulling capability-related device tree
updates resulting from CAS calls into a common routine,
spapr_dt_cas_updates(), and adding an sPAPROptionVector*
parameter that allows us to test for newly-negotiated capabilities.
We invoke it as follows:
1) When ibm,client-architecture-support gets called, we
call spapr_dt_cas_updates() with the set of capabilities
added since the previous call to ibm,client-architecture-support.
For the initial boot, or a system reset generated by something
other than the CAS call itself, this set will consist of *all*
options supported both the platform and the guest. For calls
to ibm,client-architecture-support immediately after a CAS-induced
reset, we call spapr_dt_cas_updates() with only the set
of capabilities added since the previous call, since the other
capabilities will have already been addressed by the boot-time
device-tree this time around. In the unlikely event that
capabilities are *removed* since the previous CAS, we will
generate a CAS-induced reset. In the unlikely event that we
cannot fit the device-tree updates into the buffer provided
by the guest, well generate a CAS-induced reset.
2) When a CAS update results in the need to reset the machine and
include the updates in the boot-time device tree, we call the
spapr_dt_cas_updates() using the full set of negotiated
capabilities as part of the reset path. At initial boot, or after
a reset generated by something other than the CAS call itself,
this set will be empty, resulting in what should be the same
boot-time device-tree as we generated prior to this patch. For
CAS-induced reset, this routine will be called with the full set of
capabilities negotiated by the platform/guest in the previous
CAS call, which should result in CAS updates from previous call
being accounted for in the initial boot-time device tree.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Changed an int -> bool conversion to be more explicit]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-25 12:47:29 +08:00
|
|
|
/* full range of negotiated ov5 capabilities */
|
2016-10-25 12:47:28 +08:00
|
|
|
spapr_ovec_intersect(spapr->ov5_cas, spapr->ov5, ov5_guest);
|
|
|
|
spapr_ovec_cleanup(ov5_guest);
|
spapr: add option vector handling in CAS-generated resets
In some cases, ibm,client-architecture-support calls can fail. This
could happen in the current code for situations where the modified
device tree segment exceeds the buffer size provided by the guest
via the call parameters. In these cases, QEMU will reset, allowing
an opportunity to regenerate the device tree from scratch via
boot-time handling. There are potentially other scenarios as well,
not currently reachable in the current code, but possible in theory,
such as cases where device-tree properties or nodes need to be removed.
We currently don't handle either of these properly for option vector
capabilities however. Instead of carrying the negotiated capability
beyond the reset and creating the boot-time device tree accordingly,
we start from scratch, generating the same boot-time device tree as we
did prior to the CAS-generated and the same device tree updates as we
did before. This could (in theory) cause us to get stuck in a reset
loop. This hasn't been observed, but depending on the extensiveness
of CAS-induced device tree updates in the future, could eventually
become an issue.
Address this by pulling capability-related device tree
updates resulting from CAS calls into a common routine,
spapr_dt_cas_updates(), and adding an sPAPROptionVector*
parameter that allows us to test for newly-negotiated capabilities.
We invoke it as follows:
1) When ibm,client-architecture-support gets called, we
call spapr_dt_cas_updates() with the set of capabilities
added since the previous call to ibm,client-architecture-support.
For the initial boot, or a system reset generated by something
other than the CAS call itself, this set will consist of *all*
options supported both the platform and the guest. For calls
to ibm,client-architecture-support immediately after a CAS-induced
reset, we call spapr_dt_cas_updates() with only the set
of capabilities added since the previous call, since the other
capabilities will have already been addressed by the boot-time
device-tree this time around. In the unlikely event that
capabilities are *removed* since the previous CAS, we will
generate a CAS-induced reset. In the unlikely event that we
cannot fit the device-tree updates into the buffer provided
by the guest, well generate a CAS-induced reset.
2) When a CAS update results in the need to reset the machine and
include the updates in the boot-time device tree, we call the
spapr_dt_cas_updates() using the full set of negotiated
capabilities as part of the reset path. At initial boot, or after
a reset generated by something other than the CAS call itself,
this set will be empty, resulting in what should be the same
boot-time device-tree as we generated prior to this patch. For
CAS-induced reset, this routine will be called with the full set of
capabilities negotiated by the platform/guest in the previous
CAS call, which should result in CAS updates from previous call
being accounted for in the initial boot-time device tree.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Changed an int -> bool conversion to be more explicit]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-25 12:47:29 +08:00
|
|
|
/* capabilities that have been added since CAS-generated guest reset.
|
|
|
|
* if capabilities have since been removed, generate another reset
|
|
|
|
*/
|
|
|
|
ov5_updates = spapr_ovec_new();
|
|
|
|
spapr->cas_reboot = spapr_ovec_diff(ov5_updates,
|
|
|
|
ov5_cas_old, spapr->ov5_cas);
|
2017-03-23 11:46:00 +08:00
|
|
|
/* Now that processing is finished, set the radix/hash bit for the
|
|
|
|
* guest if it requested a valid mode; otherwise terminate the boot. */
|
|
|
|
if (guest_radix) {
|
|
|
|
if (kvm_enabled() && !kvmppc_has_cap_mmu_radix()) {
|
|
|
|
error_report("Guest requested unavailable MMU mode (radix).");
|
|
|
|
exit(EXIT_FAILURE);
|
|
|
|
}
|
|
|
|
spapr_ovec_set(spapr->ov5_cas, OV5_MMU_RADIX_300);
|
|
|
|
} else {
|
|
|
|
if (kvm_enabled() && kvmppc_has_cap_mmu_radix()
|
|
|
|
&& !kvmppc_has_cap_mmu_hash_v3()) {
|
|
|
|
error_report("Guest requested unavailable MMU mode (hash).");
|
|
|
|
exit(EXIT_FAILURE);
|
|
|
|
}
|
|
|
|
}
|
2017-03-20 07:46:49 +08:00
|
|
|
spapr->cas_legacy_guest_workaround = !spapr_ovec_test(ov1_guest,
|
|
|
|
OV1_PPC_3_00);
|
spapr: add option vector handling in CAS-generated resets
In some cases, ibm,client-architecture-support calls can fail. This
could happen in the current code for situations where the modified
device tree segment exceeds the buffer size provided by the guest
via the call parameters. In these cases, QEMU will reset, allowing
an opportunity to regenerate the device tree from scratch via
boot-time handling. There are potentially other scenarios as well,
not currently reachable in the current code, but possible in theory,
such as cases where device-tree properties or nodes need to be removed.
We currently don't handle either of these properly for option vector
capabilities however. Instead of carrying the negotiated capability
beyond the reset and creating the boot-time device tree accordingly,
we start from scratch, generating the same boot-time device tree as we
did prior to the CAS-generated and the same device tree updates as we
did before. This could (in theory) cause us to get stuck in a reset
loop. This hasn't been observed, but depending on the extensiveness
of CAS-induced device tree updates in the future, could eventually
become an issue.
Address this by pulling capability-related device tree
updates resulting from CAS calls into a common routine,
spapr_dt_cas_updates(), and adding an sPAPROptionVector*
parameter that allows us to test for newly-negotiated capabilities.
We invoke it as follows:
1) When ibm,client-architecture-support gets called, we
call spapr_dt_cas_updates() with the set of capabilities
added since the previous call to ibm,client-architecture-support.
For the initial boot, or a system reset generated by something
other than the CAS call itself, this set will consist of *all*
options supported both the platform and the guest. For calls
to ibm,client-architecture-support immediately after a CAS-induced
reset, we call spapr_dt_cas_updates() with only the set
of capabilities added since the previous call, since the other
capabilities will have already been addressed by the boot-time
device-tree this time around. In the unlikely event that
capabilities are *removed* since the previous CAS, we will
generate a CAS-induced reset. In the unlikely event that we
cannot fit the device-tree updates into the buffer provided
by the guest, well generate a CAS-induced reset.
2) When a CAS update results in the need to reset the machine and
include the updates in the boot-time device tree, we call the
spapr_dt_cas_updates() using the full set of negotiated
capabilities as part of the reset path. At initial boot, or after
a reset generated by something other than the CAS call itself,
this set will be empty, resulting in what should be the same
boot-time device-tree as we generated prior to this patch. For
CAS-induced reset, this routine will be called with the full set of
capabilities negotiated by the platform/guest in the previous
CAS call, which should result in CAS updates from previous call
being accounted for in the initial boot-time device tree.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Changed an int -> bool conversion to be more explicit]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-25 12:47:29 +08:00
|
|
|
if (!spapr->cas_reboot) {
|
|
|
|
spapr->cas_reboot =
|
2016-10-28 21:01:05 +08:00
|
|
|
(spapr_h_cas_compose_response(spapr, args[1], args[2],
|
spapr: add option vector handling in CAS-generated resets
In some cases, ibm,client-architecture-support calls can fail. This
could happen in the current code for situations where the modified
device tree segment exceeds the buffer size provided by the guest
via the call parameters. In these cases, QEMU will reset, allowing
an opportunity to regenerate the device tree from scratch via
boot-time handling. There are potentially other scenarios as well,
not currently reachable in the current code, but possible in theory,
such as cases where device-tree properties or nodes need to be removed.
We currently don't handle either of these properly for option vector
capabilities however. Instead of carrying the negotiated capability
beyond the reset and creating the boot-time device tree accordingly,
we start from scratch, generating the same boot-time device tree as we
did prior to the CAS-generated and the same device tree updates as we
did before. This could (in theory) cause us to get stuck in a reset
loop. This hasn't been observed, but depending on the extensiveness
of CAS-induced device tree updates in the future, could eventually
become an issue.
Address this by pulling capability-related device tree
updates resulting from CAS calls into a common routine,
spapr_dt_cas_updates(), and adding an sPAPROptionVector*
parameter that allows us to test for newly-negotiated capabilities.
We invoke it as follows:
1) When ibm,client-architecture-support gets called, we
call spapr_dt_cas_updates() with the set of capabilities
added since the previous call to ibm,client-architecture-support.
For the initial boot, or a system reset generated by something
other than the CAS call itself, this set will consist of *all*
options supported both the platform and the guest. For calls
to ibm,client-architecture-support immediately after a CAS-induced
reset, we call spapr_dt_cas_updates() with only the set
of capabilities added since the previous call, since the other
capabilities will have already been addressed by the boot-time
device-tree this time around. In the unlikely event that
capabilities are *removed* since the previous CAS, we will
generate a CAS-induced reset. In the unlikely event that we
cannot fit the device-tree updates into the buffer provided
by the guest, well generate a CAS-induced reset.
2) When a CAS update results in the need to reset the machine and
include the updates in the boot-time device tree, we call the
spapr_dt_cas_updates() using the full set of negotiated
capabilities as part of the reset path. At initial boot, or after
a reset generated by something other than the CAS call itself,
this set will be empty, resulting in what should be the same
boot-time device-tree as we generated prior to this patch. For
CAS-induced reset, this routine will be called with the full set of
capabilities negotiated by the platform/guest in the previous
CAS call, which should result in CAS updates from previous call
being accounted for in the initial boot-time device tree.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Changed an int -> bool conversion to be more explicit]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-25 12:47:29 +08:00
|
|
|
ov5_updates) != 0);
|
|
|
|
}
|
|
|
|
spapr_ovec_cleanup(ov5_updates);
|
2015-07-13 08:34:00 +08:00
|
|
|
|
spapr: add option vector handling in CAS-generated resets
In some cases, ibm,client-architecture-support calls can fail. This
could happen in the current code for situations where the modified
device tree segment exceeds the buffer size provided by the guest
via the call parameters. In these cases, QEMU will reset, allowing
an opportunity to regenerate the device tree from scratch via
boot-time handling. There are potentially other scenarios as well,
not currently reachable in the current code, but possible in theory,
such as cases where device-tree properties or nodes need to be removed.
We currently don't handle either of these properly for option vector
capabilities however. Instead of carrying the negotiated capability
beyond the reset and creating the boot-time device tree accordingly,
we start from scratch, generating the same boot-time device tree as we
did prior to the CAS-generated and the same device tree updates as we
did before. This could (in theory) cause us to get stuck in a reset
loop. This hasn't been observed, but depending on the extensiveness
of CAS-induced device tree updates in the future, could eventually
become an issue.
Address this by pulling capability-related device tree
updates resulting from CAS calls into a common routine,
spapr_dt_cas_updates(), and adding an sPAPROptionVector*
parameter that allows us to test for newly-negotiated capabilities.
We invoke it as follows:
1) When ibm,client-architecture-support gets called, we
call spapr_dt_cas_updates() with the set of capabilities
added since the previous call to ibm,client-architecture-support.
For the initial boot, or a system reset generated by something
other than the CAS call itself, this set will consist of *all*
options supported both the platform and the guest. For calls
to ibm,client-architecture-support immediately after a CAS-induced
reset, we call spapr_dt_cas_updates() with only the set
of capabilities added since the previous call, since the other
capabilities will have already been addressed by the boot-time
device-tree this time around. In the unlikely event that
capabilities are *removed* since the previous CAS, we will
generate a CAS-induced reset. In the unlikely event that we
cannot fit the device-tree updates into the buffer provided
by the guest, well generate a CAS-induced reset.
2) When a CAS update results in the need to reset the machine and
include the updates in the boot-time device tree, we call the
spapr_dt_cas_updates() using the full set of negotiated
capabilities as part of the reset path. At initial boot, or after
a reset generated by something other than the CAS call itself,
this set will be empty, resulting in what should be the same
boot-time device-tree as we generated prior to this patch. For
CAS-induced reset, this routine will be called with the full set of
capabilities negotiated by the platform/guest in the previous
CAS call, which should result in CAS updates from previous call
being accounted for in the initial boot-time device tree.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Changed an int -> bool conversion to be more explicit]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-25 12:47:29 +08:00
|
|
|
if (spapr->cas_reboot) {
|
2017-05-16 05:41:13 +08:00
|
|
|
qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
|
2017-03-23 11:46:00 +08:00
|
|
|
} else {
|
|
|
|
/* If ppc_spapr_reset() did not set up a HPT but one is necessary
|
|
|
|
* (because the guest isn't going to use radix) then set it up here. */
|
|
|
|
if ((spapr->patb_entry & PATBE1_GR) && !guest_radix) {
|
|
|
|
/* legacy hash or new hash: */
|
|
|
|
spapr_setup_hpt_and_vrma(spapr);
|
|
|
|
}
|
2014-05-23 10:26:54 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
return H_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2011-05-10 14:06:21 +08:00
|
|
|
static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
|
|
|
|
static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - KVMPPC_HCALL_BASE + 1];
|
2011-04-01 12:15:20 +08:00
|
|
|
|
|
|
|
void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn)
|
|
|
|
{
|
2011-04-01 12:15:23 +08:00
|
|
|
spapr_hcall_fn *slot;
|
|
|
|
|
|
|
|
if (opcode <= MAX_HCALL_OPCODE) {
|
|
|
|
assert((opcode & 0x3) == 0);
|
2011-04-01 12:15:20 +08:00
|
|
|
|
2011-04-01 12:15:23 +08:00
|
|
|
slot = &papr_hypercall_table[opcode / 4];
|
|
|
|
} else {
|
|
|
|
assert((opcode >= KVMPPC_HCALL_BASE) && (opcode <= KVMPPC_HCALL_MAX));
|
2011-04-01 12:15:20 +08:00
|
|
|
|
2011-04-01 12:15:23 +08:00
|
|
|
slot = &kvmppc_hypercall_table[opcode - KVMPPC_HCALL_BASE];
|
|
|
|
}
|
2011-04-01 12:15:20 +08:00
|
|
|
|
2012-10-09 02:17:36 +08:00
|
|
|
assert(!(*slot));
|
2011-04-01 12:15:23 +08:00
|
|
|
*slot = fn;
|
2011-04-01 12:15:20 +08:00
|
|
|
}
|
|
|
|
|
2012-05-03 12:13:14 +08:00
|
|
|
target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
|
2011-04-01 12:15:20 +08:00
|
|
|
target_ulong *args)
|
|
|
|
{
|
2015-07-02 14:23:04 +08:00
|
|
|
sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
|
|
|
|
|
2011-04-01 12:15:20 +08:00
|
|
|
if ((opcode <= MAX_HCALL_OPCODE)
|
|
|
|
&& ((opcode & 0x3) == 0)) {
|
2011-04-01 12:15:23 +08:00
|
|
|
spapr_hcall_fn fn = papr_hypercall_table[opcode / 4];
|
|
|
|
|
|
|
|
if (fn) {
|
2012-05-03 12:23:01 +08:00
|
|
|
return fn(cpu, spapr, opcode, args);
|
2011-04-01 12:15:23 +08:00
|
|
|
}
|
|
|
|
} else if ((opcode >= KVMPPC_HCALL_BASE) &&
|
|
|
|
(opcode <= KVMPPC_HCALL_MAX)) {
|
|
|
|
spapr_hcall_fn fn = kvmppc_hypercall_table[opcode - KVMPPC_HCALL_BASE];
|
2011-04-01 12:15:20 +08:00
|
|
|
|
|
|
|
if (fn) {
|
2012-05-03 12:23:01 +08:00
|
|
|
return fn(cpu, spapr, opcode, args);
|
2011-04-01 12:15:20 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-09-01 09:29:02 +08:00
|
|
|
qemu_log_mask(LOG_UNIMP, "Unimplemented SPAPR hcall 0x" TARGET_FMT_lx "\n",
|
|
|
|
opcode);
|
2011-04-01 12:15:20 +08:00
|
|
|
return H_FUNCTION;
|
|
|
|
}
|
2011-04-01 12:15:22 +08:00
|
|
|
|
2012-02-09 22:20:55 +08:00
|
|
|
static void hypercall_register_types(void)
|
2011-04-01 12:15:22 +08:00
|
|
|
{
|
|
|
|
/* hcall-pft */
|
|
|
|
spapr_register_hypercall(H_ENTER, h_enter);
|
|
|
|
spapr_register_hypercall(H_REMOVE, h_remove);
|
|
|
|
spapr_register_hypercall(H_PROTECT, h_protect);
|
2013-02-18 13:00:32 +08:00
|
|
|
spapr_register_hypercall(H_READ, h_read);
|
2011-04-01 12:15:23 +08:00
|
|
|
|
2011-08-31 23:50:50 +08:00
|
|
|
/* hcall-bulk */
|
|
|
|
spapr_register_hypercall(H_BULK_REMOVE, h_bulk_remove);
|
|
|
|
|
2017-05-12 13:46:11 +08:00
|
|
|
/* hcall-hpt-resize */
|
|
|
|
spapr_register_hypercall(H_RESIZE_HPT_PREPARE, h_resize_hpt_prepare);
|
|
|
|
spapr_register_hypercall(H_RESIZE_HPT_COMMIT, h_resize_hpt_commit);
|
|
|
|
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
/* hcall-splpar */
|
|
|
|
spapr_register_hypercall(H_REGISTER_VPA, h_register_vpa);
|
|
|
|
spapr_register_hypercall(H_CEDE, h_cede);
|
2016-12-05 13:50:21 +08:00
|
|
|
spapr_register_hypercall(H_SIGNAL_SYS_RESET, h_signal_sys_reset);
|
Implement PAPR VPA functions for pSeries shared processor partitions
Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-04-01 12:15:33 +08:00
|
|
|
|
2016-02-11 20:47:18 +08:00
|
|
|
/* processor register resource access h-calls */
|
|
|
|
spapr_register_hypercall(H_SET_SPRG0, h_set_sprg0);
|
2016-02-11 20:47:19 +08:00
|
|
|
spapr_register_hypercall(H_SET_DABR, h_set_dabr);
|
2016-02-11 20:47:20 +08:00
|
|
|
spapr_register_hypercall(H_SET_XDABR, h_set_xdabr);
|
2016-02-18 17:15:54 +08:00
|
|
|
spapr_register_hypercall(H_PAGE_INIT, h_page_init);
|
2016-02-11 20:47:18 +08:00
|
|
|
spapr_register_hypercall(H_SET_MODE, h_set_mode);
|
|
|
|
|
2017-03-20 07:46:45 +08:00
|
|
|
/* In Memory Table MMU h-calls */
|
|
|
|
spapr_register_hypercall(H_CLEAN_SLB, h_clean_slb);
|
|
|
|
spapr_register_hypercall(H_INVALIDATE_PID, h_invalidate_pid);
|
|
|
|
spapr_register_hypercall(H_REGISTER_PROC_TBL, h_register_process_table);
|
|
|
|
|
2011-08-10 22:44:20 +08:00
|
|
|
/* "debugger" hcalls (also used by SLOF). Note: We do -not- differenciate
|
|
|
|
* here between the "CI" and the "CACHE" variants, they will use whatever
|
|
|
|
* mapping attributes qemu is using. When using KVM, the kernel will
|
|
|
|
* enforce the attributes more strongly
|
|
|
|
*/
|
|
|
|
spapr_register_hypercall(H_LOGICAL_CI_LOAD, h_logical_load);
|
|
|
|
spapr_register_hypercall(H_LOGICAL_CI_STORE, h_logical_store);
|
|
|
|
spapr_register_hypercall(H_LOGICAL_CACHE_LOAD, h_logical_load);
|
|
|
|
spapr_register_hypercall(H_LOGICAL_CACHE_STORE, h_logical_store);
|
|
|
|
spapr_register_hypercall(H_LOGICAL_ICBI, h_logical_icbi);
|
|
|
|
spapr_register_hypercall(H_LOGICAL_DCBF, h_logical_dcbf);
|
2012-06-19 04:21:37 +08:00
|
|
|
spapr_register_hypercall(KVMPPC_H_LOGICAL_MEMOP, h_logical_memop);
|
2011-08-10 22:44:20 +08:00
|
|
|
|
2011-04-01 12:15:23 +08:00
|
|
|
/* qemu/KVM-PPC specific hcalls */
|
|
|
|
spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas);
|
2013-08-19 19:04:20 +08:00
|
|
|
|
2014-05-23 10:26:54 +08:00
|
|
|
/* ibm,client-architecture-support support */
|
|
|
|
spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
|
2011-04-01 12:15:22 +08:00
|
|
|
}
|
2012-02-09 22:20:55 +08:00
|
|
|
|
|
|
|
type_init(hypercall_register_types)
|