Commit 1fcf7ce0c6 (arm: kvm: implement CPU PM notifier) added
support for CPU power-management, using a cpu_notifier to re-init
KVM on a CPU that entered CPU idle.
The code assumed that a CPU entering idle would actually be powered
off, loosing its state entierely, and would then need to be
reinitialized. It turns out that this is not always the case, and
some HW performs CPU PM without actually killing the core. In this
case, we try to reinitialize KVM while it is still live. It ends up
badly, as reported by Andre Przywara (using a Calxeda Midway):
[ 3.663897] Kernel panic - not syncing: unexpected prefetch abort in Hyp mode at: 0x685760
[ 3.663897] unexpected data abort in Hyp mode at: 0xc067d150
[ 3.663897] unexpected HVC/SVC trap in Hyp mode at: 0xc0901dd0
The trick here is to detect if we've been through a full re-init or
not by looking at HVBAR (VBAR_EL2 on arm64). This involves
implementing the backend for __hyp_get_vectors in the main KVM HYP
code (rather small), and checking the return value against the
default one when the CPU notifier is called on CPU_PM_EXIT.
Reported-by: Andre Przywara <osp@andrep.de>
Tested-by: Andre Przywara <osp@andrep.de>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Rob Herring <rob.herring@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull ARM updates from Russell King:
"This set includes adding support for Neon acceleration of RAID6 XOR
code from Ard Biesheuvel, cache flushing and barrier updates from Will
Deacon, and a cleanup to the ARM debug code which reduces the amount
of code by about 500 lines.
A few other cleanups, such as constifying the machine descriptors
which already shouldn't be written to, cleaning up the printing of the
L2 cache size"
* 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (55 commits)
ARM: 7826/1: debug: support debug ll on hisilicon soc
ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo
ARM: 7829/1: Add ".text.unlikely" and ".text.hot" to arm unwind tables
ARM: 7828/1: ARMv7-M: implement restart routine common to all v7-M machines
ARM: 7827/1: highbank: fix debug uart virtual address for LPAE
ARM: 7823/1: errata: workaround Cortex-A15 erratum 773022
ARM: 7806/1: allow DEBUG_UNCOMPRESS for Tegra
ARM: 7793/1: debug: use generic option for ep93xx PL10x debug port
ARM: debug: move SPEAr debug to generic PL01x code
ARM: debug: move davinci debug to generic 8250 code
ARM: debug: move keystone debug to generic 8250 code
ARM: debug: remove DEBUG_ROCKCHIP_UART
ARM: debug: provide generic option choices for 8250 and PL01x ports
ARM: debug: move PL01X debug include into arch/arm/include/debug/
ARM: debug: provide PL01x debug uart phys/virt address configuration options
ARM: debug: add support for word accesses to debug/8250.S
ARM: debug: move 8250 debug include into arch/arm/include/debug/
ARM: debug: provide 8250 debug uart phys/virt address configuration options
ARM: debug: provide 8250 debug uart register shift configuration option
ARM: debug: provide 8250 debug uart flow control configuration option
...
The panic strings are hard to read and on narrow terminals some
characters are simply truncated off the panic message.
Make is slightly prettier with a newline in the Hyp panic strings.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
When flushing the TLB at PL2 in response to remapping at stage-2 or VMID
rollover, we have a dsb instruction to ensure completion of the command
before continuing.
Since we only care about other processors for TLB invalidation, use the
inner-shareable variant of the dsb instruction instead.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Make sure we clear the exclusive monitor on all exception returns,
which otherwise could lead to lock corruptions.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
When performing a Stage-2 TLB invalidation, it is necessary to
make sure the write to the page tables is observable by all CPUs.
For this purpose, add a dsb instruction to __kvm_tlb_flush_vmid_ipa
before doing the TLB invalidation itself.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Not saving PAR is an unfortunate oversight. If the guest performs
an AT* operation and gets scheduled out before reading the result
of the translation from PAR, it could become corrupted by another
guest or the host.
Saving this register is made slightly more complicated as KVM also
uses it on the permission fault handling path, leading to an ugly
"stash and restore" sequence. Fortunately, this is already a slow
path so we don't really care. Also, Linux doesn't do any AT*
operation, so Linux guests are not impacted by this bug.
[ Slightly tweaked to use an even register as first operand to ldrd
and strd operations in interrupts_head.S - Christoffer ]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
v8 is capable of invalidating Stage-2 by IPA, but v7 is not.
Change kvm_tlb_flush_vmid() to take an IPA parameter, which is
then ignored by the invalidation code (and nuke the whole TLB
as it always did).
This allows v8 to implement a more optimized strategy.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
hyp_hvc vector offset is 0x14 and hyp_svc vector offset is 0x8.
Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
Add some the architected timer related infrastructure, and support timer
interrupt injection, which can happen as a resultof three possible
events:
- The virtual timer interrupt has fired while we were still
executing the guest
- The timer interrupt hasn't fired, but it expired while we
were doing the world switch
- A hrtimer we programmed earlier has fired
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Wire the basic framework code for VGIC support and the initial in-kernel
MMIO support code for the VGIC, used for the distributor emulation.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Provides complete world-switch implementation to switch to other guests
running in non-secure modes. Includes Hyp exception handlers that
capture necessary exception information and stores the information on
the VCPU and KVM structures.
The following Hyp-ABI is also documented in the code:
Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
Switching to Hyp mode is done through a simple HVC #0 instruction. The
exception vector code will check that the HVC comes from VMID==0 and if
so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
- r0 contains a pointer to a HYP function
- r1, r2, and r3 contain arguments to the above function.
- The HYP function will be called with its arguments in r0, r1 and r2.
On HYP function return, we return directly to SVC.
A call to a function executing in Hyp mode is performed like the following:
<svc code>
ldr r0, =BSYM(my_hyp_fn)
ldr r1, =my_param
hvc #0 ; Call my_hyp_fn(my_param) from HYP mode
<svc code>
Otherwise, the world-switch is pretty straight-forward. All state that
can be modified by the guest is first backed up on the Hyp stack and the
VCPU values is loaded onto the hardware. State, which is not loaded, but
theoretically modifiable by the guest is protected through the
virtualiation features to generate a trap and cause software emulation.
Upon guest returns, all state is restored from hardware onto the VCPU
struct and the original state is restored from the Hyp-stack onto the
hardware.
SMP support using the VMPIDR calculated on the basis of the host MPIDR
and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.
Reuse of VMIDs has been implemented by Antonios Motakis and adapated from
a separate patch into the appropriate patches introducing the
functionality. Note that the VMIDs are stored per VM as required by the ARM
architecture reference manual.
To support VFP/NEON we trap those instructions using the HPCTR. When
we trap, we switch the FPU. After a guest exit, the VFP state is
returned to the host. When disabling access to floating point
instructions, we also mask FPEXC_EN in order to avoid the guest
receiving Undefined instruction exceptions before we have a chance to
switch back the floating point state. We are reusing vfp_hard_struct,
so we depend on VFPv3 being enabled in the host kernel, if not, we still
trap cp10 and cp11 in order to inject an undefined instruction exception
whenever the guest tries to use VFP/NEON. VFP/NEON developed by
Antionios Motakis and Rusty Russell.
Aborts that are permission faults, and not stage-1 page table walk, do
not report the faulting address in the HPFAR. We have to resolve the
IPA, and store it just like the HPFAR register on the VCPU struct. If
the IPA cannot be resolved, it means another CPU is playing with the
page tables, and we simply restart the guest. This quirk was fixed by
Marc Zyngier.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
This commit introduces the framework for guest memory management
through the use of 2nd stage translation. Each VM has a pointer
to a level-1 table (the pgd field in struct kvm_arch) which is
used for the 2nd stage translations. Entries are added when handling
guest faults (later patch) and the table itself can be allocated and
freed through the following functions implemented in
arch/arm/kvm/arm_mmu.c:
- kvm_alloc_stage2_pgd(struct kvm *kvm);
- kvm_free_stage2_pgd(struct kvm *kvm);
Each entry in TLBs and caches are tagged with a VMID identifier in
addition to ASIDs. The VMIDs are assigned consecutively to VMs in the
order that VMs are executed, and caches and tlbs are invalidated when
the VMID space has been used to allow for more than 255 simultaenously
running guests.
The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is
freed in kvm_arch_destroy_vm(). Both functions are called from the main
KVM code.
We pre-allocate page table memory to be able to synchronize using a
spinlock and be called under rcu_read_lock from the MMU notifiers. We
steal the mmu_memory_cache implementation from x86 and adapt for our
specific usage.
We support MMU notifiers (thanks to Marc Zyngier) through
kvm_unmap_hva and kvm_set_spte_hva.
Finally, define kvm_phys_addr_ioremap() to map a device at a guest IPA,
which is used by VGIC support to map the virtual CPU interface registers
to the guest. This support is added by Marc Zyngier.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Sets up KVM code to handle all exceptions taken to Hyp mode.
When the kernel is booted in Hyp mode, calling an hvc instruction with r0
pointing to the new vectors, the HVBAR is changed to the the vector pointers.
This allows subsystems (like KVM here) to execute code in Hyp-mode with the
MMU disabled.
We initialize other Hyp-mode registers and enables the MMU for Hyp-mode from
the id-mapped hyp initialization code. Afterwards, the HVBAR is changed to
point to KVM Hyp vectors used to catch guest faults and to switch to Hyp mode
to perform a world-switch into a KVM guest.
Also provides memory mapping code to map required code pages, data structures,
and I/O regions accessed in Hyp mode at the same virtual address as the host
kernel virtual addresses, but which conforms to the architectural requirements
for translations in Hyp mode. This interface is added in arch/arm/kvm/arm_mmu.c
and comprises:
- create_hyp_mappings(from, to);
- create_hyp_io_mappings(from, to, phys_addr);
- free_hyp_pmds();
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Targets KVM support for Cortex A-15 processors.
Contains all the framework components, make files, header files, some
tracing functionality, and basic user space API.
Only supported core is Cortex-A15 for now.
Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>