2013-01-21 07:28:06 +08:00
|
|
|
/*
|
|
|
|
* Copyright (C) 2012 - Virtual Open Systems and Columbia University
|
|
|
|
* Author: Christoffer Dall <c.dall@virtualopensystems.com>
|
|
|
|
*
|
|
|
|
* This program is free software; you can redistribute it and/or modify
|
|
|
|
* it under the terms of the GNU General Public License, version 2, as
|
|
|
|
* published by the Free Software Foundation.
|
|
|
|
*
|
|
|
|
* This program is distributed in the hope that it will be useful,
|
|
|
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
* GNU General Public License for more details.
|
|
|
|
*
|
|
|
|
* You should have received a copy of the GNU General Public License
|
|
|
|
* along with this program; if not, write to the Free Software
|
|
|
|
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
|
|
|
|
*/
|
|
|
|
|
|
|
|
#ifndef __ARM_KVM_HOST_H__
|
|
|
|
#define __ARM_KVM_HOST_H__
|
|
|
|
|
2014-08-29 20:01:17 +08:00
|
|
|
#include <linux/types.h>
|
|
|
|
#include <linux/kvm_types.h>
|
2013-01-21 07:28:06 +08:00
|
|
|
#include <asm/kvm.h>
|
|
|
|
#include <asm/kvm_asm.h>
|
2013-01-21 07:43:58 +08:00
|
|
|
#include <asm/kvm_mmio.h>
|
KVM: ARM: World-switch implementation
Provides complete world-switch implementation to switch to other guests
running in non-secure modes. Includes Hyp exception handlers that
capture necessary exception information and stores the information on
the VCPU and KVM structures.
The following Hyp-ABI is also documented in the code:
Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
Switching to Hyp mode is done through a simple HVC #0 instruction. The
exception vector code will check that the HVC comes from VMID==0 and if
so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
- r0 contains a pointer to a HYP function
- r1, r2, and r3 contain arguments to the above function.
- The HYP function will be called with its arguments in r0, r1 and r2.
On HYP function return, we return directly to SVC.
A call to a function executing in Hyp mode is performed like the following:
<svc code>
ldr r0, =BSYM(my_hyp_fn)
ldr r1, =my_param
hvc #0 ; Call my_hyp_fn(my_param) from HYP mode
<svc code>
Otherwise, the world-switch is pretty straight-forward. All state that
can be modified by the guest is first backed up on the Hyp stack and the
VCPU values is loaded onto the hardware. State, which is not loaded, but
theoretically modifiable by the guest is protected through the
virtualiation features to generate a trap and cause software emulation.
Upon guest returns, all state is restored from hardware onto the VCPU
struct and the original state is restored from the Hyp-stack onto the
hardware.
SMP support using the VMPIDR calculated on the basis of the host MPIDR
and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.
Reuse of VMIDs has been implemented by Antonios Motakis and adapated from
a separate patch into the appropriate patches introducing the
functionality. Note that the VMIDs are stored per VM as required by the ARM
architecture reference manual.
To support VFP/NEON we trap those instructions using the HPCTR. When
we trap, we switch the FPU. After a guest exit, the VFP state is
returned to the host. When disabling access to floating point
instructions, we also mask FPEXC_EN in order to avoid the guest
receiving Undefined instruction exceptions before we have a chance to
switch back the floating point state. We are reusing vfp_hard_struct,
so we depend on VFPv3 being enabled in the host kernel, if not, we still
trap cp10 and cp11 in order to inject an undefined instruction exception
whenever the guest tries to use VFP/NEON. VFP/NEON developed by
Antionios Motakis and Rusty Russell.
Aborts that are permission faults, and not stage-1 page table walk, do
not report the faulting address in the HPFAR. We have to resolve the
IPA, and store it just like the HPFAR register on the VCPU struct. If
the IPA cannot be resolved, it means another CPU is playing with the
page tables, and we simply restart the guest. This quirk was fixed by
Marc Zyngier.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
2013-01-21 07:47:42 +08:00
|
|
|
#include <asm/fpstate.h>
|
ARM: KVM: move GIC/timer code to a common location
As KVM/arm64 is looming on the horizon, it makes sense to move some
of the common code to a single location in order to reduce duplication.
The code could live anywhere. Actually, most of KVM is already built
with a bunch of ugly ../../.. hacks in the various Makefiles, so we're
not exactly talking about style here. But maybe it is time to start
moving into a less ugly direction.
The include files must be in a "public" location, as they are accessed
from non-KVM files (arch/arm/kernel/asm-offsets.c).
For this purpose, introduce two new locations:
- virt/kvm/arm/ : x86 and ia64 already share the ioapic code in
virt/kvm, so this could be seen as a (very ugly) precedent.
- include/kvm/ : there is already an include/xen, and while the
intent is slightly different, this seems as good a location as
any
Eventually, we should probably have independant Makefiles at every
levels (just like everywhere else in the kernel), but this is just
the first step.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-14 21:31:01 +08:00
|
|
|
#include <kvm/arm_arch_timer.h>
|
2013-01-21 07:28:06 +08:00
|
|
|
|
2015-03-04 18:14:34 +08:00
|
|
|
#define __KVM_HAVE_ARCH_INTC_INITIALIZED
|
|
|
|
|
2013-02-16 03:20:07 +08:00
|
|
|
#define KVM_USER_MEM_SLOTS 32
|
2013-01-21 07:28:06 +08:00
|
|
|
#define KVM_PRIVATE_MEM_SLOTS 4
|
|
|
|
#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
|
2013-01-21 07:28:10 +08:00
|
|
|
#define KVM_HAVE_ONE_REG
|
2015-09-18 18:34:53 +08:00
|
|
|
#define KVM_HALT_POLL_NS_DEFAULT 500000
|
2013-01-21 07:28:06 +08:00
|
|
|
|
2014-04-29 13:54:16 +08:00
|
|
|
#define KVM_VCPU_MAX_FEATURES 2
|
2013-01-21 07:28:06 +08:00
|
|
|
|
ARM: KVM: move GIC/timer code to a common location
As KVM/arm64 is looming on the horizon, it makes sense to move some
of the common code to a single location in order to reduce duplication.
The code could live anywhere. Actually, most of KVM is already built
with a bunch of ugly ../../.. hacks in the various Makefiles, so we're
not exactly talking about style here. But maybe it is time to start
moving into a less ugly direction.
The include files must be in a "public" location, as they are accessed
from non-KVM files (arch/arm/kernel/asm-offsets.c).
For this purpose, introduce two new locations:
- virt/kvm/arm/ : x86 and ia64 already share the ioapic code in
virt/kvm, so this could be seen as a (very ugly) precedent.
- include/kvm/ : there is already an include/xen, and while the
intent is slightly different, this seems as good a location as
any
Eventually, we should probably have independant Makefiles at every
levels (just like everywhere else in the kernel), but this is just
the first step.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-14 21:31:01 +08:00
|
|
|
#include <kvm/arm_vgic.h>
|
2013-01-22 08:36:12 +08:00
|
|
|
|
arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS'
This patch removes config option of KVM_ARM_MAX_VCPUS,
and like other ARCHs, just choose the maximum allowed
value from hardware, and follows the reasons:
1) from distribution view, the option has to be
defined as the max allowed value because it need to
meet all kinds of virtulization applications and
need to support most of SoCs;
2) using a bigger value doesn't introduce extra memory
consumption, and the help text in Kconfig isn't accurate
because kvm_vpu structure isn't allocated until request
of creating VCPU is sent from QEMU;
3) the main effect is that the field of vcpus[] in 'struct kvm'
becomes a bit bigger(sizeof(void *) per vcpu) and need more cache
lines to hold the structure, but 'struct kvm' is one generic struct,
and it has worked well on other ARCHs already in this way. Also,
the world switch frequecy is often low, for example, it is ~2000
when running kernel building load in VM from APM xgene KVM host,
so the effect is very small, and the difference can't be observed
in my test at all.
Cc: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-09-02 14:31:21 +08:00
|
|
|
#define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
|
|
|
|
|
2013-01-21 07:28:06 +08:00
|
|
|
u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
|
2014-08-26 22:13:20 +08:00
|
|
|
int __attribute_const__ kvm_target_cpu(void);
|
2013-01-21 07:28:06 +08:00
|
|
|
int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
|
|
|
|
void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
|
|
|
|
|
|
|
|
struct kvm_arch {
|
|
|
|
/* VTTBR value associated with below pgd and vmid */
|
|
|
|
u64 vttbr;
|
|
|
|
|
2013-01-24 02:21:58 +08:00
|
|
|
/* Timer */
|
|
|
|
struct arch_timer_kvm timer;
|
|
|
|
|
2013-01-21 07:28:06 +08:00
|
|
|
/*
|
|
|
|
* Anything that is not used directly from assembly code goes
|
|
|
|
* here.
|
|
|
|
*/
|
|
|
|
|
|
|
|
/* The VMID generation used for the virt. memory system */
|
|
|
|
u64 vmid_gen;
|
|
|
|
u32 vmid;
|
|
|
|
|
|
|
|
/* Stage-2 page table */
|
|
|
|
pgd_t *pgd;
|
2013-01-22 08:36:12 +08:00
|
|
|
|
|
|
|
/* Interrupt controller */
|
|
|
|
struct vgic_dist vgic;
|
2014-06-02 22:26:01 +08:00
|
|
|
int max_vcpus;
|
2013-01-21 07:28:06 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
#define KVM_NR_MEM_OBJS 40
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We don't want allocation failures within the mmu code, so we preallocate
|
|
|
|
* enough memory for a single page fault in a cache.
|
|
|
|
*/
|
|
|
|
struct kvm_mmu_memory_cache {
|
|
|
|
int nobjs;
|
|
|
|
void *objects[KVM_NR_MEM_OBJS];
|
|
|
|
};
|
|
|
|
|
2012-09-18 02:27:09 +08:00
|
|
|
struct kvm_vcpu_fault_info {
|
|
|
|
u32 hsr; /* Hyp Syndrome Register */
|
|
|
|
u32 hxfar; /* Hyp Data/Inst. Fault Address Register */
|
|
|
|
u32 hpfar; /* Hyp IPA Fault Address Register */
|
|
|
|
u32 hyp_pc; /* PC when exception was taken from Hyp mode */
|
|
|
|
};
|
|
|
|
|
2016-01-03 19:01:49 +08:00
|
|
|
struct kvm_cpu_context {
|
2016-01-03 19:26:01 +08:00
|
|
|
struct kvm_regs gp_regs;
|
2016-01-03 19:01:49 +08:00
|
|
|
struct vfp_hard_struct vfp;
|
2016-01-03 19:26:01 +08:00
|
|
|
u32 cp15[NR_CP15_REGS];
|
2016-01-03 19:01:49 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
typedef struct kvm_cpu_context kvm_cpu_context_t;
|
2012-10-28 01:23:25 +08:00
|
|
|
|
2013-01-21 07:28:06 +08:00
|
|
|
struct kvm_vcpu_arch {
|
2016-01-03 19:01:49 +08:00
|
|
|
struct kvm_cpu_context ctxt;
|
|
|
|
|
2013-01-21 07:28:06 +08:00
|
|
|
int target; /* Processor target */
|
|
|
|
DECLARE_BITMAP(features, KVM_VCPU_MAX_FEATURES);
|
|
|
|
|
|
|
|
/* The CPU type we expose to the VM */
|
|
|
|
u32 midr;
|
|
|
|
|
2014-01-22 17:43:38 +08:00
|
|
|
/* HYP trapping configuration */
|
|
|
|
u32 hcr;
|
|
|
|
|
|
|
|
/* Interrupt related fields */
|
|
|
|
u32 irq_lines; /* IRQ and FIQ levels */
|
|
|
|
|
2013-01-21 07:28:06 +08:00
|
|
|
/* Exception Information */
|
2012-09-18 02:27:09 +08:00
|
|
|
struct kvm_vcpu_fault_info fault;
|
2013-01-21 07:28:06 +08:00
|
|
|
|
2013-04-08 23:47:19 +08:00
|
|
|
/* Host FP context */
|
|
|
|
kvm_cpu_context_t *host_cpu_context;
|
KVM: ARM: World-switch implementation
Provides complete world-switch implementation to switch to other guests
running in non-secure modes. Includes Hyp exception handlers that
capture necessary exception information and stores the information on
the VCPU and KVM structures.
The following Hyp-ABI is also documented in the code:
Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
Switching to Hyp mode is done through a simple HVC #0 instruction. The
exception vector code will check that the HVC comes from VMID==0 and if
so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
- r0 contains a pointer to a HYP function
- r1, r2, and r3 contain arguments to the above function.
- The HYP function will be called with its arguments in r0, r1 and r2.
On HYP function return, we return directly to SVC.
A call to a function executing in Hyp mode is performed like the following:
<svc code>
ldr r0, =BSYM(my_hyp_fn)
ldr r1, =my_param
hvc #0 ; Call my_hyp_fn(my_param) from HYP mode
<svc code>
Otherwise, the world-switch is pretty straight-forward. All state that
can be modified by the guest is first backed up on the Hyp stack and the
VCPU values is loaded onto the hardware. State, which is not loaded, but
theoretically modifiable by the guest is protected through the
virtualiation features to generate a trap and cause software emulation.
Upon guest returns, all state is restored from hardware onto the VCPU
struct and the original state is restored from the Hyp-stack onto the
hardware.
SMP support using the VMPIDR calculated on the basis of the host MPIDR
and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.
Reuse of VMIDs has been implemented by Antonios Motakis and adapated from
a separate patch into the appropriate patches introducing the
functionality. Note that the VMIDs are stored per VM as required by the ARM
architecture reference manual.
To support VFP/NEON we trap those instructions using the HPCTR. When
we trap, we switch the FPU. After a guest exit, the VFP state is
returned to the host. When disabling access to floating point
instructions, we also mask FPEXC_EN in order to avoid the guest
receiving Undefined instruction exceptions before we have a chance to
switch back the floating point state. We are reusing vfp_hard_struct,
so we depend on VFPv3 being enabled in the host kernel, if not, we still
trap cp10 and cp11 in order to inject an undefined instruction exception
whenever the guest tries to use VFP/NEON. VFP/NEON developed by
Antionios Motakis and Rusty Russell.
Aborts that are permission faults, and not stage-1 page table walk, do
not report the faulting address in the HPFAR. We have to resolve the
IPA, and store it just like the HPFAR register on the VCPU struct. If
the IPA cannot be resolved, it means another CPU is playing with the
page tables, and we simply restart the guest. This quirk was fixed by
Marc Zyngier.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
2013-01-21 07:47:42 +08:00
|
|
|
|
2013-01-22 08:36:12 +08:00
|
|
|
/* VGIC state */
|
|
|
|
struct vgic_cpu vgic_cpu;
|
2013-01-24 02:21:58 +08:00
|
|
|
struct arch_timer_cpu timer_cpu;
|
2013-01-22 08:36:12 +08:00
|
|
|
|
KVM: ARM: World-switch implementation
Provides complete world-switch implementation to switch to other guests
running in non-secure modes. Includes Hyp exception handlers that
capture necessary exception information and stores the information on
the VCPU and KVM structures.
The following Hyp-ABI is also documented in the code:
Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
Switching to Hyp mode is done through a simple HVC #0 instruction. The
exception vector code will check that the HVC comes from VMID==0 and if
so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
- r0 contains a pointer to a HYP function
- r1, r2, and r3 contain arguments to the above function.
- The HYP function will be called with its arguments in r0, r1 and r2.
On HYP function return, we return directly to SVC.
A call to a function executing in Hyp mode is performed like the following:
<svc code>
ldr r0, =BSYM(my_hyp_fn)
ldr r1, =my_param
hvc #0 ; Call my_hyp_fn(my_param) from HYP mode
<svc code>
Otherwise, the world-switch is pretty straight-forward. All state that
can be modified by the guest is first backed up on the Hyp stack and the
VCPU values is loaded onto the hardware. State, which is not loaded, but
theoretically modifiable by the guest is protected through the
virtualiation features to generate a trap and cause software emulation.
Upon guest returns, all state is restored from hardware onto the VCPU
struct and the original state is restored from the Hyp-stack onto the
hardware.
SMP support using the VMPIDR calculated on the basis of the host MPIDR
and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.
Reuse of VMIDs has been implemented by Antonios Motakis and adapated from
a separate patch into the appropriate patches introducing the
functionality. Note that the VMIDs are stored per VM as required by the ARM
architecture reference manual.
To support VFP/NEON we trap those instructions using the HPCTR. When
we trap, we switch the FPU. After a guest exit, the VFP state is
returned to the host. When disabling access to floating point
instructions, we also mask FPEXC_EN in order to avoid the guest
receiving Undefined instruction exceptions before we have a chance to
switch back the floating point state. We are reusing vfp_hard_struct,
so we depend on VFPv3 being enabled in the host kernel, if not, we still
trap cp10 and cp11 in order to inject an undefined instruction exception
whenever the guest tries to use VFP/NEON. VFP/NEON developed by
Antionios Motakis and Rusty Russell.
Aborts that are permission faults, and not stage-1 page table walk, do
not report the faulting address in the HPFAR. We have to resolve the
IPA, and store it just like the HPFAR register on the VCPU struct. If
the IPA cannot be resolved, it means another CPU is playing with the
page tables, and we simply restart the guest. This quirk was fixed by
Marc Zyngier.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
2013-01-21 07:47:42 +08:00
|
|
|
/*
|
|
|
|
* Anything that is not used directly from assembly code goes
|
|
|
|
* here.
|
|
|
|
*/
|
2013-01-21 07:28:09 +08:00
|
|
|
|
2015-09-26 05:41:14 +08:00
|
|
|
/* vcpu power-off state */
|
|
|
|
bool power_off;
|
2013-01-21 07:28:13 +08:00
|
|
|
|
2015-09-26 05:41:17 +08:00
|
|
|
/* Don't run the guest (internal implementation need) */
|
|
|
|
bool pause;
|
|
|
|
|
2013-01-21 07:43:58 +08:00
|
|
|
/* IO related fields */
|
|
|
|
struct kvm_decode mmio_decode;
|
|
|
|
|
2013-01-21 07:28:06 +08:00
|
|
|
/* Cache some mmu pages needed inside spinlock regions */
|
|
|
|
struct kvm_mmu_memory_cache mmu_page_cache;
|
KVM: ARM: World-switch implementation
Provides complete world-switch implementation to switch to other guests
running in non-secure modes. Includes Hyp exception handlers that
capture necessary exception information and stores the information on
the VCPU and KVM structures.
The following Hyp-ABI is also documented in the code:
Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
Switching to Hyp mode is done through a simple HVC #0 instruction. The
exception vector code will check that the HVC comes from VMID==0 and if
so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
- r0 contains a pointer to a HYP function
- r1, r2, and r3 contain arguments to the above function.
- The HYP function will be called with its arguments in r0, r1 and r2.
On HYP function return, we return directly to SVC.
A call to a function executing in Hyp mode is performed like the following:
<svc code>
ldr r0, =BSYM(my_hyp_fn)
ldr r1, =my_param
hvc #0 ; Call my_hyp_fn(my_param) from HYP mode
<svc code>
Otherwise, the world-switch is pretty straight-forward. All state that
can be modified by the guest is first backed up on the Hyp stack and the
VCPU values is loaded onto the hardware. State, which is not loaded, but
theoretically modifiable by the guest is protected through the
virtualiation features to generate a trap and cause software emulation.
Upon guest returns, all state is restored from hardware onto the VCPU
struct and the original state is restored from the Hyp-stack onto the
hardware.
SMP support using the VMPIDR calculated on the basis of the host MPIDR
and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.
Reuse of VMIDs has been implemented by Antonios Motakis and adapated from
a separate patch into the appropriate patches introducing the
functionality. Note that the VMIDs are stored per VM as required by the ARM
architecture reference manual.
To support VFP/NEON we trap those instructions using the HPCTR. When
we trap, we switch the FPU. After a guest exit, the VFP state is
returned to the host. When disabling access to floating point
instructions, we also mask FPEXC_EN in order to avoid the guest
receiving Undefined instruction exceptions before we have a chance to
switch back the floating point state. We are reusing vfp_hard_struct,
so we depend on VFPv3 being enabled in the host kernel, if not, we still
trap cp10 and cp11 in order to inject an undefined instruction exception
whenever the guest tries to use VFP/NEON. VFP/NEON developed by
Antionios Motakis and Rusty Russell.
Aborts that are permission faults, and not stage-1 page table walk, do
not report the faulting address in the HPFAR. We have to resolve the
IPA, and store it just like the HPFAR register on the VCPU struct. If
the IPA cannot be resolved, it means another CPU is playing with the
page tables, and we simply restart the guest. This quirk was fixed by
Marc Zyngier.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
2013-01-21 07:47:42 +08:00
|
|
|
|
|
|
|
/* Detect first run of a vcpu */
|
|
|
|
bool has_run_once;
|
2013-01-21 07:28:06 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
struct kvm_vm_stat {
|
|
|
|
u32 remote_tlb_flush;
|
|
|
|
};
|
|
|
|
|
|
|
|
struct kvm_vcpu_stat {
|
kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-05 01:20:58 +08:00
|
|
|
u32 halt_successful_poll;
|
2015-09-16 00:27:57 +08:00
|
|
|
u32 halt_attempted_poll;
|
2013-01-21 07:28:06 +08:00
|
|
|
u32 halt_wakeup;
|
2015-11-26 18:09:43 +08:00
|
|
|
u32 hvc_exit_stat;
|
|
|
|
u64 wfe_exit_stat;
|
|
|
|
u64 wfi_exit_stat;
|
|
|
|
u64 mmio_exit_user;
|
|
|
|
u64 mmio_exit_kernel;
|
|
|
|
u64 exits;
|
2013-01-21 07:28:06 +08:00
|
|
|
};
|
|
|
|
|
2016-01-03 19:26:01 +08:00
|
|
|
#define vcpu_cp15(v,r) (v)->arch.ctxt.cp15[r]
|
|
|
|
|
2013-09-30 16:50:05 +08:00
|
|
|
int kvm_vcpu_preferred_target(struct kvm_vcpu_init *init);
|
2013-01-21 07:28:06 +08:00
|
|
|
unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
|
|
|
|
int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
|
|
|
|
int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
|
|
|
|
int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
|
2016-01-06 20:10:58 +08:00
|
|
|
unsigned long kvm_call_hyp(void *hypfn, ...);
|
KVM: ARM: World-switch implementation
Provides complete world-switch implementation to switch to other guests
running in non-secure modes. Includes Hyp exception handlers that
capture necessary exception information and stores the information on
the VCPU and KVM structures.
The following Hyp-ABI is also documented in the code:
Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
Switching to Hyp mode is done through a simple HVC #0 instruction. The
exception vector code will check that the HVC comes from VMID==0 and if
so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
- r0 contains a pointer to a HYP function
- r1, r2, and r3 contain arguments to the above function.
- The HYP function will be called with its arguments in r0, r1 and r2.
On HYP function return, we return directly to SVC.
A call to a function executing in Hyp mode is performed like the following:
<svc code>
ldr r0, =BSYM(my_hyp_fn)
ldr r1, =my_param
hvc #0 ; Call my_hyp_fn(my_param) from HYP mode
<svc code>
Otherwise, the world-switch is pretty straight-forward. All state that
can be modified by the guest is first backed up on the Hyp stack and the
VCPU values is loaded onto the hardware. State, which is not loaded, but
theoretically modifiable by the guest is protected through the
virtualiation features to generate a trap and cause software emulation.
Upon guest returns, all state is restored from hardware onto the VCPU
struct and the original state is restored from the Hyp-stack onto the
hardware.
SMP support using the VMPIDR calculated on the basis of the host MPIDR
and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.
Reuse of VMIDs has been implemented by Antonios Motakis and adapated from
a separate patch into the appropriate patches introducing the
functionality. Note that the VMIDs are stored per VM as required by the ARM
architecture reference manual.
To support VFP/NEON we trap those instructions using the HPCTR. When
we trap, we switch the FPU. After a guest exit, the VFP state is
returned to the host. When disabling access to floating point
instructions, we also mask FPEXC_EN in order to avoid the guest
receiving Undefined instruction exceptions before we have a chance to
switch back the floating point state. We are reusing vfp_hard_struct,
so we depend on VFPv3 being enabled in the host kernel, if not, we still
trap cp10 and cp11 in order to inject an undefined instruction exception
whenever the guest tries to use VFP/NEON. VFP/NEON developed by
Antionios Motakis and Rusty Russell.
Aborts that are permission faults, and not stage-1 page table walk, do
not report the faulting address in the HPFAR. We have to resolve the
IPA, and store it just like the HPFAR register on the VCPU struct. If
the IPA cannot be resolved, it means another CPU is playing with the
page tables, and we simply restart the guest. This quirk was fixed by
Marc Zyngier.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
2013-01-21 07:47:42 +08:00
|
|
|
void force_vm_exit(const cpumask_t *mask);
|
2013-01-21 07:28:07 +08:00
|
|
|
|
|
|
|
#define KVM_ARCH_WANT_MMU_NOTIFIER
|
|
|
|
int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
|
|
|
|
int kvm_unmap_hva_range(struct kvm *kvm,
|
|
|
|
unsigned long start, unsigned long end);
|
|
|
|
void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
|
|
|
|
|
2013-01-21 07:28:10 +08:00
|
|
|
unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
|
|
|
|
int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
|
2015-03-13 02:16:51 +08:00
|
|
|
int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
|
|
|
|
int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
|
2013-01-21 07:28:10 +08:00
|
|
|
|
2013-01-21 07:28:07 +08:00
|
|
|
/* We do not have shadow page tables, hence the empty hooks */
|
2014-09-24 15:57:57 +08:00
|
|
|
static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
|
|
|
|
unsigned long address)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2013-01-22 08:36:11 +08:00
|
|
|
struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
|
|
|
|
struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
|
|
|
|
|
|
|
|
int kvm_arm_copy_coproc_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
|
|
|
|
unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu);
|
|
|
|
int kvm_arm_coproc_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
|
|
|
|
int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
|
|
|
|
|
2012-10-05 18:11:11 +08:00
|
|
|
int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
|
|
|
|
int exception_index);
|
|
|
|
|
2013-05-14 19:11:37 +08:00
|
|
|
static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
|
|
|
|
phys_addr_t pgd_ptr,
|
2012-10-05 22:10:44 +08:00
|
|
|
unsigned long hyp_stack_ptr,
|
|
|
|
unsigned long vector_ptr)
|
|
|
|
{
|
|
|
|
/*
|
ARM: KVM: switch to a dual-step HYP init code
Our HYP init code suffers from two major design issues:
- it cannot support CPU hotplug, as we tear down the idmap very early
- it cannot perform a TLB invalidation when switching from init to
runtime mappings, as pages are manipulated from PL1 exclusively
The hotplug problem mandates that we keep two sets of page tables
(boot and runtime). The TLB problem mandates that we're able to
transition from one PGD to another while in HYP, invalidating the TLBs
in the process.
To be able to do this, we need to share a page between the two page
tables. A page that will have the same VA in both configurations. All we
need is a VA that has the following properties:
- This VA can't be used to represent a kernel mapping.
- This VA will not conflict with the physical address of the kernel text
The vectors page seems to satisfy this requirement:
- The kernel never maps anything else there
- The kernel text being copied at the beginning of the physical memory,
it is unlikely to use the last 64kB (I doubt we'll ever support KVM
on a system with something like 4MB of RAM, but patches are very
welcome).
Let's call this VA the trampoline VA.
Now, we map our init page at 3 locations:
- idmap in the boot pgd
- trampoline VA in the boot pgd
- trampoline VA in the runtime pgd
The init scenario is now the following:
- We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
runtime stack, runtime vectors
- Enable the MMU with the boot pgd
- Jump to a target into the trampoline page (remember, this is the same
physical page!)
- Now switch to the runtime pgd (same VA, and still the same physical
page!)
- Invalidate TLBs
- Set stack and vectors
- Profit! (or eret, if you only care about the code).
Note that we keep the boot mapping permanently (it is not strictly an
idmap anymore) to allow for CPU hotplug in later patches.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-04-13 02:12:06 +08:00
|
|
|
* Call initialization code, and switch to the full blown HYP
|
|
|
|
* code. The init code doesn't need to preserve these
|
|
|
|
* registers as r0-r3 are already callee saved according to
|
|
|
|
* the AAPCS.
|
|
|
|
* Note that we slightly misuse the prototype by casing the
|
|
|
|
* stack pointer to a void *.
|
|
|
|
*
|
|
|
|
* We don't have enough registers to perform the full init in
|
|
|
|
* one go. Install the boot PGD first, and then install the
|
|
|
|
* runtime PGD, stack pointer and vectors. The PGDs are always
|
|
|
|
* passed as the third argument, in order to be passed into
|
|
|
|
* r2-r3 to the init code (yes, this is compliant with the
|
|
|
|
* PCS!).
|
2012-10-05 22:10:44 +08:00
|
|
|
*/
|
ARM: KVM: switch to a dual-step HYP init code
Our HYP init code suffers from two major design issues:
- it cannot support CPU hotplug, as we tear down the idmap very early
- it cannot perform a TLB invalidation when switching from init to
runtime mappings, as pages are manipulated from PL1 exclusively
The hotplug problem mandates that we keep two sets of page tables
(boot and runtime). The TLB problem mandates that we're able to
transition from one PGD to another while in HYP, invalidating the TLBs
in the process.
To be able to do this, we need to share a page between the two page
tables. A page that will have the same VA in both configurations. All we
need is a VA that has the following properties:
- This VA can't be used to represent a kernel mapping.
- This VA will not conflict with the physical address of the kernel text
The vectors page seems to satisfy this requirement:
- The kernel never maps anything else there
- The kernel text being copied at the beginning of the physical memory,
it is unlikely to use the last 64kB (I doubt we'll ever support KVM
on a system with something like 4MB of RAM, but patches are very
welcome).
Let's call this VA the trampoline VA.
Now, we map our init page at 3 locations:
- idmap in the boot pgd
- trampoline VA in the boot pgd
- trampoline VA in the runtime pgd
The init scenario is now the following:
- We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd,
runtime stack, runtime vectors
- Enable the MMU with the boot pgd
- Jump to a target into the trampoline page (remember, this is the same
physical page!)
- Now switch to the runtime pgd (same VA, and still the same physical
page!)
- Invalidate TLBs
- Set stack and vectors
- Profit! (or eret, if you only care about the code).
Note that we keep the boot mapping permanently (it is not strictly an
idmap anymore) to allow for CPU hotplug in later patches.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-04-13 02:12:06 +08:00
|
|
|
|
|
|
|
kvm_call_hyp(NULL, 0, boot_pgd_ptr);
|
|
|
|
|
|
|
|
kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
|
2012-10-05 22:10:44 +08:00
|
|
|
}
|
|
|
|
|
2016-02-02 01:54:35 +08:00
|
|
|
static inline void __cpu_init_stage2(void)
|
|
|
|
{
|
2016-02-02 03:56:31 +08:00
|
|
|
kvm_call_hyp(__init_stage2_translation);
|
2016-02-02 01:54:35 +08:00
|
|
|
}
|
|
|
|
|
2013-04-08 23:47:18 +08:00
|
|
|
static inline int kvm_arch_dev_ioctl_check_extension(long ext)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2013-03-05 11:18:00 +08:00
|
|
|
int kvm_perf_init(void);
|
|
|
|
int kvm_perf_teardown(void);
|
|
|
|
|
2015-01-16 07:58:56 +08:00
|
|
|
void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
|
|
|
|
|
2014-06-02 21:37:13 +08:00
|
|
|
struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
|
|
|
|
|
2014-08-28 21:13:03 +08:00
|
|
|
static inline void kvm_arch_hardware_disable(void) {}
|
2014-08-28 21:13:02 +08:00
|
|
|
static inline void kvm_arch_hardware_unsetup(void) {}
|
|
|
|
static inline void kvm_arch_sync_events(struct kvm *kvm) {}
|
|
|
|
static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
|
|
|
|
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
|
|
|
|
|
2015-07-08 00:29:56 +08:00
|
|
|
static inline void kvm_arm_init_debug(void) {}
|
|
|
|
static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
|
|
|
|
static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
|
2015-07-08 00:30:00 +08:00
|
|
|
static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
|
2015-07-08 00:29:56 +08:00
|
|
|
|
2013-01-21 07:28:06 +08:00
|
|
|
#endif /* __ARM_KVM_HOST_H__ */
|