Commit Graph

127 Commits

Author SHA1 Message Date
Miaohe Lin 7e0befd521 KVM: arm/arm64: Get rid of unused arg in cpu_init_hyp_mode()
As arg dummy is not really needed, there's no need to pass
NULL when calling cpu_init_hyp_mode(). So clean it up.

Fixes: 67f6919766 ("arm64: kvm: allows kvm cpu hotplug")
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1574320559-5662-1-git-send-email-linmiaohe@huawei.com
2019-12-06 11:41:18 +00:00
Marc Zyngier cd7056ae34 Merge remote-tracking branch 'kvmarm/misc-5.5' into kvmarm/next 2019-11-08 11:27:29 +00:00
Marc Zyngier ef2e78ddad KVM: arm64: Opportunistically turn off WFI trapping when using direct LPI injection
Just like we do for WFE trapping, it can be useful to turn off
WFI trapping when the physical CPU is not oversubscribed (that
is, the vcpu is the only runnable process on this CPU) *and*
that we're using direct injection of interrupts.

The conditions are reevaluated on each vcpu_load(), ensuring that
we don't switch to this mode on a busy system.

On a GICv4 system, this has the effect of reducing the generation
of doorbell interrupts to zero when the right conditions are
met, which is a huge improvement over the current situation
(where the doorbells are screaming if the CPU ever hits a
blocking WFI).

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Link: https://lore.kernel.org/r/20191107160412.30301-3-maz@kernel.org
2019-11-08 11:14:36 +00:00
Marc Zyngier 8e01d9a396 KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put
When the VHE code was reworked, a lot of the vgic stuff was moved around,
but the GICv4 residency code did stay untouched, meaning that we come
in and out of residency on each flush/sync, which is obviously suboptimal.

To address this, let's move things around a bit:

- Residency entry (flush) moves to vcpu_load
- Residency exit (sync) moves to vcpu_put
- On blocking (entry to WFI), we "put"
- On unblocking (exit from WFI), we "load"

Because these can nest (load/block/put/load/unblock/put, for example),
we now have per-VPE tracking of the residency state.

Additionally, vgic_v4_put gains a "need doorbell" parameter, which only
gets set to true when blocking because of a WFI. This allows a finer
control of the doorbell, which now also gets disabled as soon as
it gets signaled.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20191027144234.8395-2-maz@kernel.org
2019-10-28 16:20:58 +00:00
Marc Zyngier a4b28f5c67 Merge remote-tracking branch 'kvmarm/kvm-arm64/stolen-time' into kvmarm-master/next 2019-10-24 15:04:09 +01:00
Steven Price 8564d6372a KVM: arm64: Support stolen time reporting via shared structure
Implement the service call for configuring a shared structure between a
VCPU and the hypervisor in which the hypervisor can write the time
stolen from the VCPU's execution time by other tasks on the host.

User space allocates memory which is placed at an IPA also chosen by user
space. The hypervisor then updates the shared structure using
kvm_put_guest() to ensure single copy atomicity of the 64-bit value
reporting the stolen time in nanoseconds.

Whenever stolen time is enabled by the guest, the stolen time counter is
reset.

The stolen time itself is retrieved from the sched_info structure
maintained by the Linux scheduler code. We enable SCHEDSTATS when
selecting KVM Kconfig to ensure this value is meaningful.

Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21 19:20:28 +01:00
Christoffer Dall da345174ce KVM: arm/arm64: Allow user injection of external data aborts
In some scenarios, such as buggy guest or incorrect configuration of the
VMM and firmware description data, userspace will detect a memory access
to a portion of the IPA, which is not mapped to any MMIO region.

For this purpose, the appropriate action is to inject an external abort
to the guest.  The kernel already has functionality to inject an
external abort, but we need to wire up a signal from user space that
lets user space tell the kernel to do this.

It turns out, we already have the set event functionality which we can
perfectly reuse for this.

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21 18:59:51 +01:00
Christoffer Dall c726200dd1 KVM: arm/arm64: Allow reporting non-ISV data aborts to userspace
For a long time, if a guest accessed memory outside of a memslot using
any of the load/store instructions in the architecture which doesn't
supply decoding information in the ESR_EL2 (the ISV bit is not set), the
kernel would print the following message and terminate the VM as a
result of returning -ENOSYS to userspace:

  load/store instruction decoding not implemented

The reason behind this message is that KVM assumes that all accesses
outside a memslot is an MMIO access which should be handled by
userspace, and we originally expected to eventually implement some sort
of decoding of load/store instructions where the ISV bit was not set.

However, it turns out that many of the instructions which don't provide
decoding information on abort are not safe to use for MMIO accesses, and
the remaining few that would potentially make sense to use on MMIO
accesses, such as those with register writeback, are not used in
practice.  It also turns out that fetching an instruction from guest
memory can be a pretty horrible affair, involving stopping all CPUs on
SMP systems, handling multiple corner cases of address translation in
software, and more.  It doesn't appear likely that we'll ever implement
this in the kernel.

What is much more common is that a user has misconfigured his/her guest
and is actually not accessing an MMIO region, but just hitting some
random hole in the IPA space.  In this scenario, the error message above
is almost misleading and has led to a great deal of confusion over the
years.

It is, nevertheless, ABI to userspace, and we therefore need to
introduce a new capability that userspace explicitly enables to change
behavior.

This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV)
which does exactly that, and introduces a new exit reason to report the
event to userspace.  User space can then emulate an exception to the
guest, restart the guest, suspend the guest, or take any other
appropriate action as per the policy of the running system.

Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21 18:59:44 +01:00
Marc Zyngier 92f35b751c KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINE
While parts of the VGIC support a large number of vcpus (we
bravely allow up to 512), other parts are more limited.

One of these limits is visible in the KVM_IRQ_LINE ioctl, which
only allows 256 vcpus to be signalled when using the CPU or PPI
types. Unfortunately, we've cornered ourselves badly by allocating
all the bits in the irq field.

Since the irq_type subfield (8 bit wide) is currently only taking
the values 0, 1 and 2 (and we have been careful not to allow anything
else), let's reduce this field to only 4 bits, and allocate the
remaining 4 bits to a vcpu2_index, which acts as a multiplier:

  vcpu_id = 256 * vcpu2_index + vcpu_index

With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2)
allowing this to be discovered, it becomes possible to inject
PPIs to up to 4096 vcpus. But please just don't.

Whilst we're there, add a clarification about the use of KVM_IRQ_LINE
on arm, which is not completely conditionned by KVM_CAP_IRQCHIP.

Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-09-09 12:29:09 +01:00
Paolo Bonzini a738b5e75b KVM/arm fixes for 5.3, take #2
- Fix our system register reset so that we stop writing
   non-sensical values to them, and track which registers
   get reset instead.
 - Sync VMCR back from the GIC on WFI so that KVM has an
   exact vue of PMR.
 - Reevaluate state of HW-mapped, level triggered interrupts
   on enable.
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl1NHNIPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDITgP/irkNHkSxRPBswncMb9oYguoGOUIStxSApfD
 E/9wH5khyXBvThqPmaY65kJok9YZwd+PbqDvp6umCkWGUeIbWI4bzM2bD974La8N
 qaogJxOQubqg9r/TYf5f3EI8V67CS+E/ixK6zrz2XvqvFEmQ3OMhrRFjoFkEK1wA
 yQwV15h/4IbebI2D1pkVbi8S/s54wAc3dr20cS3KfqrLYrvEIQrCm7lIOVwKk8fo
 wjk8D+2IuF9LBsClQUOZwHG26yxcbd7FNwm4qEneR3UmtT14AgHnPJO3uQUTxM+o
 bRajO3tXx4hIe1PAG7l4RxW47+AfjhAD1kzWtmM+xN5NdIfoew/pGMf3HXLhYhYO
 9dM5YOPrF5CmZGNbSnrNPwD5mejLngNLtWBUvUKMxjtxILhADxGZIOgfM08pr+XO
 9I1gzhPpczc3IIDEmDspkQnylQBPXyu2uOORPllHmmuZwMeODQ2pyyw9leBXq7lx
 5tUybLPR2N1uxAIfkN6gEJKq0YgyPcapcVBr9vgyEFA6YKKyGZqthDMTIvm1TlGI
 3ha2lUdRjYnYVUZ+WY8RqZVhJ3mB5qgCW4GUUeDx1nm5Ttskx1ic0WGIiErD21vs
 Eo//2N9ROp+E1iJC8dCdVZjKy0ZbOJwdW9t9Y89Coo6PF6SqiXBlxrqR+GIQdKlo
 P1hdjjRI
 =pZYJ
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-for-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm fixes for 5.3, take #2

- Fix our system register reset so that we stop writing
  non-sensical values to them, and track which registers
  get reset instead.
- Sync VMCR back from the GIC on WFI so that KVM has an
  exact vue of PMR.
- Reevaluate state of HW-mapped, level triggered interrupts
  on enable.
2019-08-09 16:53:50 +02:00
Paolo Bonzini 0e1c438c44 KVM/arm fixes for 5.3
- A bunch of switch/case fall-through annotation, fixing one actual bug
 - Fix PMU reset bug
 - Add missing exception class debug strings
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl1Bzw8PHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDlXYP/ixqJzqpJetTrvpiUpmLjhp4YwjjOxqyeQvo
 bWy/EFz8bSWbTZlwAAstFDVmtGenuwaiOakChvV8GH6USYqRsYdvc/sJu0evQplJ
 JQtOzGhyv1NuM0s9wYBcstAH+YAW+gBK5YFnowreheuidK/1lo3C/EnR2DxCtNal
 gpV3qQt8qfw3ysGlpC/fDjjOYw4lDkFa6CSx9uk3/587fPBqHANRY/i87nJxmhhX
 lGeCJcOrY3cy1HhbedFwxVt4Q/ZbHf0UhTfgwvsBYw7BaWmB1ymoEOoktQcUWoKb
 LL0rBe+OxNQgRnJpn3fMEHiCAmXaI9qE4dohFOl1J3dQvCElcV/jWjkXDD1+KgzW
 S2XZGB6yxet93Fh1x6xv4i6ATJvmZeTIDUXi9KkjcDiycB9YMCDYY2ejTbQv5VUP
 V0DghGGDd3d8sY7dEjxwBakuJ6nqKixSouQaNsWuBTm7tVpEVS8yW+hqWs/IVI5b
 48SDbxaNpKvx7sAyhuWAjCFbZeIm0hd//JN3JoxazF9i9PKuqnZLbNv/ME6hmzj+
 LrETwaAbjsw5Au+ST+OdT2UiauiBm9C6Kg62qagHrKJviuK941+3hjH8aj/e0pYk
 a0DQxumiyofXPQ0pVe8ZfqlPptONz+EKyAsrOm8AjLJ+bBdRUNHLcZKYj7em7YiE
 pANc8/T+
 =kcDj
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm fixes for 5.3

- A bunch of switch/case fall-through annotation, fixing one actual bug
- Fix PMU reset bug
- Add missing exception class debug strings
2019-08-09 16:53:39 +02:00
Marc Zyngier 5eeaf10eec KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block
Since commit commit 328e566479 ("KVM: arm/arm64: vgic: Defer
touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or
its GICv2 equivalent) loaded as long as we can, only syncing it
back when we're scheduled out.

There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
which is indirectly called from kvm_vcpu_check_block(), needs to
evaluate the guest's view of ICC_PMR_EL1. At the point were we
call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever
changes to PMR is not visible in memory until we do a vcpu_put().

Things go really south if the guest does the following:

	mov x0, #0	// or any small value masking interrupts
	msr ICC_PMR_EL1, x0

	[vcpu preempted, then rescheduled, VMCR sampled]

	mov x0, #ff	// allow all interrupts
	msr ICC_PMR_EL1, x0
	wfi		// traps to EL2, so samping of VMCR

	[interrupt arrives just after WFI]

Here, the hypervisor's view of PMR is zero, while the guest has enabled
its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
interrupts are pending (despite an interrupt being received) and we'll
block for no reason. If the guest doesn't have a periodic interrupt
firing once it has blocked, it will stay there forever.

To avoid this unfortuante situation, let's resync VMCR from
kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
will observe the latest value of PMR.

This has been found by booting an arm64 Linux guest with the pseudo NMI
feature, and thus using interrupt priorities to mask interrupts instead
of the usual PSTATE masking.

Cc: stable@vger.kernel.org # 4.12
Fixes: 328e566479 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-05 15:36:46 +01:00
Paolo Bonzini 741cbbae07 KVM: remove kvm_arch_has_vcpu_debugfs()
There is no need for this function as all arches have to implement
kvm_arch_create_vcpu_debugfs() no matter what.  A #define symbol
let us actually simplify the code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-05 12:55:48 +02:00
Christoph Hellwig 2f5947dfca Documentation: move Documentation/virtual to Documentation/virt
Renaming docs seems to be en vogue at the moment, so fix on of the
grossly misnamed directories.  We usually never use "virtual" as
a shortcut for virtualization in the kernel, but always virt,
as seen in the virt/ top-level directory.  Fix up the documentation
to match that.

Fixes: ed16648eb5 ("Move kvm, uml, and lguest subdirectories under a common "virtual" directory, I.E:")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-24 10:52:11 +02:00
Zenghui Yu bca031e2c8 KVM: arm/arm64: Introduce kvm_pmu_vcpu_init() to setup PMU counter index
We use "pmc->idx" and the "chained" bitmap to determine if the pmc is
chained, in kvm_pmu_pmc_is_chained().  But idx might be uninitialized
(and random) when we doing this decision, through a KVM_ARM_VCPU_INIT
ioctl -> kvm_pmu_vcpu_reset(). And the test_bit() against this random
idx will potentially hit a KASAN BUG [1].

In general, idx is the static property of a PMU counter that is not
expected to be modified across resets, as suggested by Julien.  It
looks more reasonable if we can setup the PMU counter idx for a vcpu
in its creation time. Introduce a new function - kvm_pmu_vcpu_init()
for this basic setup. Oh, and the KASAN BUG will get fixed this way.

[1] https://www.spinics.net/lists/kvm-arm/msg36700.html

Fixes: 80f393a23b ("KVM: arm/arm64: Support chained PMU counters")
Suggested-by: Andrew Murray <andrew.murray@arm.com>
Suggested-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-07-23 09:23:44 +01:00
Paolo Bonzini a45ff5994c KVM/arm updates for 5.3
- Add support for chained PMU counters in guests
 - Improve SError handling
 - Handle Neoverse N1 erratum #1349291
 - Allow side-channel mitigation status to be migrated
 - Standardise most AArch64 system register accesses to msr_s/mrs_s
 - Fix host MPIDR corruption on 32bit
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl0kge4VHG1hcmMuenlu
 Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDYyQP/3XY5tFcLKkp/h9rnGaCXwAxhNzn
 TyF/IZEFBKFTSoDMXKLLc8KllvoPQ7aUl03heYbuayYpyKR1+LCx7lDwu1MYyEf+
 aSSuOKlbG//tLUEGp09pTRCgjs2mhhZYqOj5GF2mZ7xpovFVSNOPzTazbXDNQ7tw
 zUAs43YNg+bUMwj+SLWpBlizjrLr7T34utIr6daKJE/GSfmIrcYXhGbZqUh0zbO0
 z5LNasebws8/pHyeGI7+/yoMIKaQ8foMgywTpsRpBsx6YI+AbOLjEmCk2IBOPcEK
 pm9KkSIBZEO2CSxZKl3NQiEow/Qd/lnz2xLMCSfh4XrYoI2Th4gNcsbJpiBDWP5a
 0eZ5jSiexxKngIbM+to7jR3m0yc9RgcuzceJg3Uly7Ya0vb5RqKwOX4Ge4XP4VDT
 DzIVFdQjxDKdVIf3EvGp1cj4P7dRUU3xbZcbzyuRPEmT3vgjEnbxawmPLs3QMAl1
 31Wd2wIsPB86kSxzSMel27Vs5VgMhgyHE26zN91R745CvhDXaDKydIWjGjdVMHsB
 GuX/h2kL+ohx+N/OpZPgwsVUAGLSOQFP3pE/EcGtqc2kkfqa+bx12DKcZ3zdmJvy
 +cu5ixU8q5thPH/pZob/C3hKUY/eLy02emS34RK0Jh2sZHbQgAOtMsiqUxNHEjUm
 6TkpdWa5SRd7CtGV
 =yfCs
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm updates for 5.3

- Add support for chained PMU counters in guests
- Improve SError handling
- Handle Neoverse N1 erratum #1349291
- Allow side-channel mitigation status to be migrated
- Standardise most AArch64 system register accesses to msr_s/mrs_s
- Fix host MPIDR corruption on 32bit
2019-07-11 15:14:16 +02:00
Marc Zyngier 1e0cf16cda KVM: arm/arm64: Initialise host's MPIDRs by reading the actual register
As part of setting up the host context, we populate its
MPIDR by using cpu_logical_map(). It turns out that contrary
to arm64, cpu_logical_map() on 32bit ARM doesn't return the
*full* MPIDR, but a truncated version.

This leaves the host MPIDR slightly corrupted after the first
run of a VM, since we won't correctly restore the MPIDR on
exit. Oops.

Since we cannot trust cpu_logical_map(), let's adopt a different
strategy. We move the initialization of the host CPU context as
part of the per-CPU initialization (which, in retrospect, makes
a lot of sense), and directly read the MPIDR from the HW. This
is guaranteed to work on both arm and arm64.

Reported-by: Andre Przywara <Andre.Przywara@arm.com>
Tested-by: Andre Przywara <Andre.Przywara@arm.com>
Fixes: 32f1395519 ("arm/arm64: KVM: Statically configure the host's view of MPIDR")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-07-08 16:29:48 +01:00
Thomas Gleixner d94d71cb45 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 266
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation this program is
  distributed in the hope that it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details you should have received a copy of the gnu general
  public license along with this program if not write to the free
  software foundation 51 franklin street fifth floor boston ma 02110
  1301 usa

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 67 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141333.953658117@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05 17:30:28 +02:00
Sean Christopherson f257d6dcda KVM: Directly return result from kvm_arch_check_processor_compat()
Add a wrapper to invoke kvm_arch_check_processor_compat() so that the
boilerplate ugliness of checking virtualization support on all CPUs is
hidden from the arch specific code.  x86's implementation in particular
is quite heinous, as it unnecessarily propagates the out-param pattern
into kvm_x86_ops.

While the x86 specific issue could be resolved solely by changing
kvm_x86_ops, make the change for all architectures as returning a value
directly is prettier and technically more robust, e.g. s390 doesn't set
the out param, which could lead to subtle breakage in the (highly
unlikely) scenario where the out-param was not pre-initialized by the
caller.

Opportunistically annotate svm_check_processor_compat() with __init.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-04 19:27:32 +02:00
Thomas Huth a86cb413f4 KVM: s390: Do not report unusabled IDs via KVM_CAP_MAX_VCPU_ID
KVM_CAP_MAX_VCPU_ID is currently always reporting KVM_MAX_VCPU_ID on all
architectures. However, on s390x, the amount of usable CPUs is determined
during runtime - it is depending on the features of the machine the code
is running on. Since we are using the vcpu_id as an index into the SCA
structures that are defined by the hardware (see e.g. the sca_add_vcpu()
function), it is not only the amount of CPUs that is limited by the hard-
ware, but also the range of IDs that we can use.
Thus KVM_CAP_MAX_VCPU_ID must be determined during runtime on s390x, too.
So the handling of KVM_CAP_MAX_VCPU_ID has to be moved from the common
code into the architecture specific code, and on s390x we have to return
the same value here as for KVM_CAP_MAX_VCPUS.
This problem has been discovered with the kvm_create_max_vcpus selftest.
With this change applied, the selftest now passes on s390x, too.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20190523164309.13345-9-thuth@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2019-05-28 15:52:19 +02:00
Linus Torvalds 0ef0fd3515 * ARM: support for SVE and Pointer Authentication in guests, PMU improvements
* POWER: support for direct access to the POWER9 XIVE interrupt controller,
 memory and performance optimizations.
 
 * x86: support for accessing memory not backed by struct page, fixes and refactoring
 
 * Generic: dirty page tracking improvements
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJc3qV/AAoJEL/70l94x66Dn3QH/jX1Bn0P/RZAIt4w0SySklSg
 PqxUKDyBQqB9vN9Qeb9jWXAKPH2CtM3+up/rz7oRnBWp7qA6vXcC/R/QJYAvzdXE
 nklsR/oYCsflR1KdlVYuDvvPCPP2fLBU5zfN83OsaBQ8fNRkm3gN+N5XQ2SbXbLy
 Mo9tybS4otY201UAC96e8N0ipwwyCRpDneQpLcl+F5nH3RBt63cVbs04O+70MXn7
 eT4I+8K3+Go7LATzT8hglD21D/7uvE31qQb6yr5L33IfhU4GB51RZzBXTNaAdY8n
 hT1rMrRkAMAFWYZPQDfoMadjWU3i5DIfstKjDxOr9oTfuOEp5Z+GvJwvVnUDg1I=
 =D0+p
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - support for SVE and Pointer Authentication in guests
   - PMU improvements

  POWER:
   - support for direct access to the POWER9 XIVE interrupt controller
   - memory and performance optimizations

  x86:
   - support for accessing memory not backed by struct page
   - fixes and refactoring

  Generic:
   - dirty page tracking improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (155 commits)
  kvm: fix compilation on aarch64
  Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU"
  kvm: x86: Fix L1TF mitigation for shadow MMU
  KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible
  KVM: PPC: Book3S: Remove useless checks in 'release' method of KVM device
  KVM: PPC: Book3S HV: XIVE: Fix spelling mistake "acessing" -> "accessing"
  KVM: PPC: Book3S HV: Make sure to load LPID for radix VCPUs
  kvm: nVMX: Set nested_run_pending in vmx_set_nested_state after checks complete
  tests: kvm: Add tests for KVM_SET_NESTED_STATE
  KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting new state
  tests: kvm: Add tests for KVM_CAP_MAX_VCPUS and KVM_CAP_MAX_CPU_ID
  tests: kvm: Add tests to .gitignore
  KVM: Introduce KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
  KVM: Fix kvm_clear_dirty_log_protect off-by-(minus-)one
  KVM: Fix the bitmap range to copy during clear dirty
  KVM: arm64: Fix ptrauth ID register masking logic
  KVM: x86: use direct accessors for RIP and RSP
  KVM: VMX: Use accessors for GPRs outside of dedicated caching logic
  KVM: x86: Omit caching logic for always-available GPRs
  kvm, x86: Properly check whether a pfn is an MMIO or not
  ...
2019-05-17 10:33:30 -07:00
Paolo Bonzini dd53f6102c KVM/arm updates for 5.2
- guest SVE support
 - guest Pointer Authentication support
 - Better discrimination of perf counters between host and guests
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAlzMM9kVHG1hcmMuenlu
 Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDEp8P/iqZvvZlLdlnWQwluWh237c28kAo
 zELO0L7Wl+OJ66v2hzM+NPBi5kv/9pSv7AoKNLv3398YmKFt0n7yUB+MHi0BC9xi
 ZEp4etCOiVcqcWWeDiAXLdR9OQlb7IDBDc56s4V9HQgK3sEb4u8aEJIy/nDBVniv
 GVLMh1EOsrviIYso6UVxI1X7lPQevpCS0kv9/llhhzEj8QDxnQThjDuW3wrAyhQi
 F9XNVjAMW8rft7vvok9cxT4v+TR1HgUajquoSrjXuonWHgKnC9tSH/dHILNK8Zij
 5OApojGlZQrXIa5Sk3JOhGahVVY9Y+ewsw58J5bJxd0/xrKXnWk/Lann7NE+UcBf
 RJMHfanIO/+JJRzHhagejK7pqnYXD1PWBwF8z3Hefs1IVw4eBvPBGuhIULJ6+eSP
 +3JCwiOiwshG43gZlGmHcgvhPdeX4r/BlopWV9+0X/gAjcU1+3+ZG6J3jeAcC1Kx
 i481dSzlZ7Ar7VWDCk7WgcmDvUwHXtxq0HbqzQjPBO04kkakjdPZZrZIX3+Qhlem
 GpkPVb2z5h5KTk9Fx03ZXxPVdiOQh1UmNC8jlsYZPWcJVTLkySs7HWXZJe+WTs4Z
 NLuen/eA4/NCon+UA6XdIG5Ddn/J39UuF1lCApHPHn576rwz+HmqpcN59XiU6y4h
 XHIxzajFcXNpn802
 =fjph
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-for-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm updates for 5.2

- guest SVE support
- guest Pointer Authentication support
- Better discrimination of perf counters between host and guests

Conflicts:
	include/uapi/linux/kvm.h
2019-05-15 23:41:43 +02:00
Andrew Jones 811328fc32 KVM: arm/arm64: Ensure vcpu target is unset on reset failure
A failed KVM_ARM_VCPU_INIT should not set the vcpu target,
as the vcpu target is used by kvm_vcpu_initialized() to
determine if other vcpu ioctls may proceed. We need to set
the target before calling kvm_reset_vcpu(), but if that call
fails, we should then unset it and clear the feature bitmap
while we're at it.

Signed-off-by: Andrew Jones <drjones@redhat.com>
[maz: Simplified patch, completed commit message]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-25 11:50:31 +01:00
Andrew Murray 435e53fb5e arm64: KVM: Enable VHE support for :G/:H perf event modifiers
With VHE different exception levels are used between the host (EL2) and
guest (EL1) with a shared exception level for userpace (EL0). We can take
advantage of this and use the PMU's exception level filtering to avoid
enabling/disabling counters in the world-switch code. Instead we just
modify the counter type to include or exclude EL0 at vcpu_{load,put} time.

We also ensure that trapped PMU system register writes do not re-enable
EL0 when reconfiguring the backing perf events.

This approach completely avoids blackout windows seen with !VHE.

Suggested-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-24 15:46:26 +01:00
Andrew Murray 630a16854d arm64: KVM: Encapsulate kvm_cpu_context in kvm_host_data
The virt/arm core allocates a kvm_cpu_context_t percpu, at present this is
a typedef to kvm_cpu_context and is used to store host cpu context. The
kvm_cpu_context structure is also used elsewhere to hold vcpu context.
In order to use the percpu to hold additional future host information we
encapsulate kvm_cpu_context in a new structure and rename the typedef and
percpu to match.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-24 15:35:24 +01:00
Mark Rutland 384b40caa8 KVM: arm/arm64: Context-switch ptrauth registers
When pointer authentication is supported, a guest may wish to use it.
This patch adds the necessary KVM infrastructure for this to work, with
a semi-lazy context switch of the pointer auth state.

Pointer authentication feature is only enabled when VHE is built
in the kernel and present in the CPU implementation so only VHE code
paths are modified.

When we schedule a vcpu, we disable guest usage of pointer
authentication instructions and accesses to the keys. While these are
disabled, we avoid context-switching the keys. When we trap the guest
trying to use pointer authentication functionality, we change to eagerly
context-switching the keys, and enable the feature. The next time the
vcpu is scheduled out/in, we start again. However the host key save is
optimized and implemented inside ptrauth instruction/register access
trap.

Pointer authentication consists of address authentication and generic
authentication, and CPUs in a system might have varied support for
either. Where support for either feature is not uniform, it is hidden
from guests via ID register emulation, as a result of the cpufeature
framework in the host.

Unfortunately, address authentication and generic authentication cannot
be trapped separately, as the architecture provides a single EL2 trap
covering both. If we wish to expose one without the other, we cannot
prevent a (badly-written) guest from intermittently using a feature
which is not uniformly supported (when scheduled on a physical CPU which
supports the relevant feature). Hence, this patch expects both type of
authentication to be present in a cpu.

This switch of key is done from guest enter/exit assembly as preparation
for the upcoming in-kernel pointer authentication support. Hence, these
key switching routines are not implemented in C code as they may cause
pointer authentication key signing error in some situations.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
[Only VHE, key switch in full assembly, vcpu_has_ptrauth checks
, save host key in ptrauth exception trap]
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
[maz: various fixups]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-24 15:30:40 +01:00
Dave Martin a3be836df7 KVM: arm/arm64: Demote kvm_arm_init_arch_resources() to just set up SVE
The introduction of kvm_arm_init_arch_resources() looks like
premature factoring, since nothing else uses this hook yet and it
is not clear what will use it in the future.

For now, let's not pretend that this is a general thing:

This patch simply renames the function to kvm_arm_init_sve(),
retaining the arm stub version under the new name.

Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-18 17:14:01 +01:00
Paolo Bonzini c110ae578c kvm: move KVM_CAP_NR_MEMSLOTS to common code
All architectures except MIPS were defining it in the same way,
and memory slots are handled entirely by common code so there
is no point in keeping the definition per-architecture.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16 15:39:08 +02:00
Dave Martin 7dd32a0d01 KVM: arm/arm64: Add KVM_ARM_VCPU_FINALIZE ioctl
Some aspects of vcpu configuration may be too complex to be
completed inside KVM_ARM_VCPU_INIT.  Thus, there may be a
requirement for userspace to do some additional configuration
before various other ioctls will work in a consistent way.

In particular this will be the case for SVE, where userspace will
need to negotiate the set of vector lengths to be made available to
the guest before the vcpu becomes fully usable.

In order to provide an explicit way for userspace to confirm that
it has finished setting up a particular vcpu feature, this patch
adds a new ioctl KVM_ARM_VCPU_FINALIZE.

When userspace has opted into a feature that requires finalization,
typically by means of a feature flag passed to KVM_ARM_VCPU_INIT, a
matching call to KVM_ARM_VCPU_FINALIZE is now required before
KVM_RUN or KVM_GET_REG_LIST is allowed.  Individual features may
impose additional restrictions where appropriate.

No existing vcpu features are affected by this, so current
userspace implementations will continue to work exactly as before,
with no need to issue KVM_ARM_VCPU_FINALIZE.

As implemented in this patch, KVM_ARM_VCPU_FINALIZE is currently a
placeholder: no finalizable features exist yet, so ioctl is not
required and will always yield EINVAL.  Subsequent patches will add
the finalization logic to make use of this ioctl for SVE.

No functional change for existing userspace.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-03-29 14:41:54 +00:00
Dave Martin 0f062bfe36 KVM: arm/arm64: Add hook for arch-specific KVM initialisation
This patch adds a kvm_arm_init_arch_resources() hook to perform
subarch-specific initialisation when starting up KVM.

This will be used in a subsequent patch for global SVE-related
setup on arm64.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-03-29 14:41:54 +00:00
Linus Torvalds 636deed6c0 ARM: some cleanups, direct physical timer assignment, cache sanitization
for 32-bit guests
 
 s390: interrupt cleanup, introduction of the Guest Information Block,
 preparation for processor subfunctions in cpu models
 
 PPC: bug fixes and improvements, especially related to machine checks
 and protection keys
 
 x86: many, many cleanups, including removing a bunch of MMU code for
 unnecessary optimizations; plus AVIC fixes.
 
 Generic: memcg accounting
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJci+7XAAoJEL/70l94x66DUMkIAKvEefhceySHYiTpfefjLjIC
 16RewgHa+9CO4Oo5iXiWd90fKxtXLXmxDQOS4VGzN0rxvLGRw/fyXIxL1MDOkaAO
 l8SLSNuewY4XBUgISL3PMz123r18DAGOuy9mEcYU/IMesYD2F+wy5lJ17HIGq6X2
 RpoF1p3qO1jfkPTKOob6Ixd4H5beJNPKpdth7LY3PJaVhDxgouj32fxnLnATVSnN
 gENQ10fnt8BCjshRYW6Z2/9bF15JCkUFR1xdBW2/xh1oj+kvPqqqk2bEN1eVQzUy
 2hT/XkwtpthqjSbX8NNavWRSFnOnbMLTRKQyIXmFVsM5VoSrwtiGsCFzBgcT++I=
 =XIzU
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - some cleanups
   - direct physical timer assignment
   - cache sanitization for 32-bit guests

  s390:
   - interrupt cleanup
   - introduction of the Guest Information Block
   - preparation for processor subfunctions in cpu models

  PPC:
   - bug fixes and improvements, especially related to machine checks
     and protection keys

  x86:
   - many, many cleanups, including removing a bunch of MMU code for
     unnecessary optimizations
   - AVIC fixes

  Generic:
   - memcg accounting"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits)
  kvm: vmx: fix formatting of a comment
  KVM: doc: Document the life cycle of a VM and its resources
  MAINTAINERS: Add KVM selftests to existing KVM entry
  Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()"
  KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()
  KVM: PPC: Fix compilation when KVM is not enabled
  KVM: Minor cleanups for kvm_main.c
  KVM: s390: add debug logging for cpu model subfunctions
  KVM: s390: implement subfunction processor calls
  arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2
  KVM: arm/arm64: Remove unused timer variable
  KVM: PPC: Book3S: Improve KVM reference counting
  KVM: PPC: Book3S HV: Fix build failure without IOMMU support
  Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()"
  x86: kvmguest: use TSC clocksource if invariant TSC is exposed
  KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start
  KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter
  KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns
  KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes()
  KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children
  ...
2019-03-15 15:00:28 -07:00
Colin Ian King a37f0c3c46 KVM: arm/arm64: fix spelling mistake: "auxilary" -> "auxiliary"
There is a spelling mistake in a kvm_err error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-02-19 21:05:54 +00:00
Christoffer Dall accb99bcd0 KVM: arm/arm64: Simplify bg_timer programming
Instead of calling into kvm_timer_[un]schedule from the main kvm
blocking path, test if the VCPU is on the wait queue from the load/put
path and perform the background timer setup/cancel in this path.

This has the distinct advantage that we no longer race between load/put
and schedule/unschedule and programming and canceling of the bg_timer
always happens when the timer state is not loaded.

Note that we must now remove the checks in kvm_timer_blocking that do
not schedule a background timer if one of the timers can fire, because
we no longer have a guarantee that kvm_vcpu_check_block() will be called
before kvm_timer_blocking.

Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-02-19 21:05:36 +00:00
Christoffer Dall e329fb75d5 KVM: arm/arm64: Factor out VMID into struct kvm_vmid
In preparation for nested virtualization where we are going to have more
than a single VMID per VM, let's factor out the VMID data into a
separate VMID data structure and change the VMID allocator to operate on
this new structure instead of using a struct kvm.

This also means that udate_vttbr now becomes update_vmid, and that the
vttbr itself is generated on the fly based on the stage 2 page table
base address and the vmid.

We cache the physical address of the pgd when allocating the pgd to
avoid doing the calculation on every entry to the guest and to avoid
calling into potentially non-hyp-mapped code from hyp/EL2.

If we wanted to merge the VMID allocator with the arm64 ASID allocator
at some point in the future, it should actually become easier to do that
after this patch.

Note that to avoid mapping the kvm_vmid_bits variable into hyp, we
simply forego the masking of the vmid value in kvm_get_vttbr and rely on
update_vmid to always assign a valid vmid value (within the supported
range).

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
[maz: minor cleanups]
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-02-19 21:05:35 +00:00
Marc Zyngier 32f1395519 arm/arm64: KVM: Statically configure the host's view of MPIDR
We currently eagerly save/restore MPIDR. It turns out to be
slightly pointless:
- On the host, this value is known as soon as we're scheduled on a
  physical CPU
- In the guest, this value cannot change, as it is set by KVM
  (and this is a read-only register)

The result of the above is that we can perfectly avoid the eager
saving of MPIDR_EL1, and only keep the restore. We just have
to setup the host contexts appropriately at boot time.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2019-02-19 21:05:35 +00:00
Marc Zyngier 7aa8d14641 arm/arm64: KVM: Introduce kvm_call_hyp_ret()
Until now, we haven't differentiated between HYP calls that
have a return value and those who don't. As we're about to
change this, introduce kvm_call_hyp_ret(), and change all
call sites that actually make use of a return value.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2019-02-19 21:05:24 +00:00
Marc Zyngier 358b28f09f arm/arm64: KVM: Allow a VCPU to fully reset itself
The current kvm_psci_vcpu_on implementation will directly try to
manipulate the state of the VCPU to reset it.  However, since this is
not done on the thread that runs the VCPU, we can end up in a strangely
corrupted state when the source and target VCPUs are running at the same
time.

Fix this by factoring out all reset logic from the PSCI implementation
and forwarding the required information along with a request to the
target VCPU.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2019-02-07 11:44:13 +00:00
Linus Torvalds 42b00f122c * ARM: selftests improvements, large PUD support for HugeTLB,
single-stepping fixes, improved tracing, various timer and vGIC
 fixes
 
 * x86: Processor Tracing virtualization, STIBP support, some correctness fixes,
 refactorings and splitting of vmx.c, use the Hyper-V range TLB flush hypercall,
 reduce order of vcpu struct, WBNOINVD support, do not use -ftrace for __noclone
 functions, nested guest support for PAUSE filtering on AMD, more Hyper-V
 enlightenments (direct mode for synthetic timers)
 
 * PPC: nested VFIO
 
 * s390: bugfixes only this time
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJcH0vFAAoJEL/70l94x66Dw/wH/2FZp1YOM5OgiJzgqnXyDbyf
 dNEfWo472MtNiLsuf+ZAfJojVIu9cv7wtBfXNzW+75XZDfh/J88geHWNSiZDm3Fe
 aM4MOnGG0yF3hQrRQyEHe4IFhGFNERax8Ccv+OL44md9CjYrIrsGkRD08qwb+gNh
 P8T/3wJEKwUcVHA/1VHEIM8MlirxNENc78p6JKd/C7zb0emjGavdIpWFUMr3SNfs
 CemabhJUuwOYtwjRInyx1y34FzYwW3Ejuc9a9UoZ+COahUfkuxHE8u+EQS7vLVF6
 2VGVu5SA0PqgmLlGhHthxLqVgQYo+dB22cRnsLtXlUChtVAq8q9uu5sKzvqEzuE=
 =b4Jx
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - selftests improvements
   - large PUD support for HugeTLB
   - single-stepping fixes
   - improved tracing
   - various timer and vGIC fixes

  x86:
   - Processor Tracing virtualization
   - STIBP support
   - some correctness fixes
   - refactorings and splitting of vmx.c
   - use the Hyper-V range TLB flush hypercall
   - reduce order of vcpu struct
   - WBNOINVD support
   - do not use -ftrace for __noclone functions
   - nested guest support for PAUSE filtering on AMD
   - more Hyper-V enlightenments (direct mode for synthetic timers)

  PPC:
   -  nested VFIO

  s390:
   - bugfixes only this time"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
  KVM: x86: Add CPUID support for new instruction WBNOINVD
  kvm: selftests: ucall: fix exit mmio address guessing
  Revert "compiler-gcc: disable -ftracer for __noclone functions"
  KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines
  KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs
  KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup
  MAINTAINERS: Add arch/x86/kvm sub-directories to existing KVM/x86 entry
  KVM/x86: Use SVM assembly instruction mnemonics instead of .byte streams
  KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()
  KVM/MMU: Flush tlb directly in kvm_set_pte_rmapp()
  KVM/MMU: Move tlb flush in kvm_set_pte_rmapp() to kvm_mmu_notifier_change_pte()
  KVM: Make kvm_set_spte_hva() return int
  KVM: Replace old tlb flush function with new one to flush a specified range.
  KVM/MMU: Add tlb flush with range helper function
  KVM/VMX: Add hv tlb range flush support
  x86/hyper-v: Add HvFlushGuestAddressList hypercall support
  KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops
  KVM: x86: Disable Intel PT when VMXON in L1 guest
  KVM: x86: Set intercept for Intel PT MSRs read/write
  KVM: x86: Implement Intel PT MSRs read/write emulation
  ...
2018-12-26 11:46:28 -08:00
Paolo Bonzini 8c5e14f438 KVM/arm updates for 4.21
- Large PUD support for HugeTLB
 - Single-stepping fixes
 - Improved tracing
 - Various timer and vgic fixups
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAlwahf0VHG1hcmMuenlu
 Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDl3MQAKTJ2+vA1vCln2OiKJLZ0TzsSVFB
 EXiJfQ6QghD+BHeXw/XU/4X8sD8NjzIP833RvmAgQ/Gm2BpEY/Fj4CmTKaoA5wfJ
 UMEvLUFGWb19d0hbf7AllSXg3FvkpAMVof7zfKIyI7tHem6sWHmyXDiXzEfpX2un
 bS3x8OBbdVhHcjCvgc1U6Jbii0KUR8Ac5PJBJny1PWkKHFe8NYf/cX+Ii//FMdCm
 7zihQAFOpksVOI7y9wYwpmMeI52vDwesergqBBJXkklsAFAda56a2NuoG6oim3BJ
 FH/cavGGfrwcdN6Dh5tkvubfxIL5sKF57ZW0Jpy7MPK7u2Zzr7ZvRBHdvYqE+kp3
 +jieKr6t1MVnpYfOOZRZgnTqio3Cp++2GzZr283IH0WjDTnN7hhEWbU6/o8DHSge
 H/nDyxSycbUZtrGVAOm6oPoy4hNElvW8S71+rLqXVc46aKs3YheNg5MqkLawRA0q
 5U9Lw5Um/IvcjfM8DESpmYnugZV8FkzEcMZ3SQjQRYafXdjq2V2NjSMtl2+dyeDh
 KthCujhK0F1KBgw7FocNOwh2M7q6mIjw93HrX30CcT6cu2q+0apty+tjXZapP+dc
 l7Tad8iFGzAGvW0i3yNWADXhMGk721YrGmptWZh4M9B8CZr2pPzuB4nUPDMeyMYl
 XlgIgVGv24MKDjnW
 =SiUI
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-for-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm updates for 4.21

- Large PUD support for HugeTLB
- Single-stepping fixes
- Improved tracing
- Various timer and vgic fixups
2018-12-19 20:33:55 +01:00
Christoffer Dall fb544d1ca6 KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less
We recently addressed a VMID generation race by introducing a read/write
lock around accesses and updates to the vmid generation values.

However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but
does so without taking the read lock.

As far as I can tell, this can lead to the same kind of race:

  VM 0, VCPU 0			VM 0, VCPU 1
  ------------			------------
  update_vttbr (vmid 254)
  				update_vttbr (vmid 1) // roll over
				read_lock(kvm_vmid_lock);
				force_vm_exit()
  local_irq_disable
  need_new_vmid_gen == false //because vmid gen matches

  enter_guest (vmid 254)
  				kvm_arch.vttbr = <PGD>:<VMID 1>
				read_unlock(kvm_vmid_lock);

  				enter_guest (vmid 1)

Which results in running two VCPUs in the same VM with different VMIDs
and (even worse) other VCPUs from other VMs could now allocate clashing
VMID 254 from the new generation as long as VCPU 0 is not exiting.

Attempt to solve this by making sure vttbr is updated before another CPU
can observe the updated VMID generation.

Cc: stable@vger.kernel.org
Fixes: f0cf47d939 "KVM: arm/arm64: Close VMID generation race"
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18 15:14:45 +00:00
Mark Rutland bd7d95cafb arm64: KVM: Consistently advance singlestep when emulating instructions
When we emulate a guest instruction, we don't advance the hardware
singlestep state machine, and thus the guest will receive a software
step exception after a next instruction which is not emulated by the
host.

We bodge around this in an ad-hoc fashion. Sometimes we explicitly check
whether userspace requested a single step, and fake a debug exception
from within the kernel. Other times, we advance the HW singlestep state
rely on the HW to generate the exception for us. Thus, the observed step
behaviour differs for host and guest.

Let's make this simpler and consistent by always advancing the HW
singlestep state machine when we skip an instruction. Thus we can rely
on the hardware to generate the singlestep exception for us, and never
need to explicitly check for an active-pending step, nor do we need to
fake a debug exception from the guest.

Cc: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18 14:11:37 +00:00
Paolo Bonzini 2a31b9db15 kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG.  First, and less important,
it can take kvm->mmu_lock for an extended period of time.  Second, its user
can actually see many false positives in some cases.  The latter is due
to a benign race like this:

  1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
     them.
  2. The guest modifies the pages, causing them to be marked ditry.
  3. Userspace actually copies the pages.
  4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
     they were not written to since (3).

This is especially a problem for large guests, where the time between
(1) and (3) can be substantial.  This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns.  Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page.  The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:19 +01:00
Paolo Bonzini 8fe65a8299 kvm: rename last argument to kvm_get_dirty_log_protect
When manual dirty log reprotect will be enabled, kvm_get_dirty_log_protect's
pointer argument will always be false on exit, because no TLB flush is needed
until the manual re-protection operation.  Rename it from "is_dirty" to "flush",
which more accurately tells the caller what they have to do with it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:18 +01:00
Marc Zyngier 33e5f4e509 KVM: arm64: Rework detection of SVE, !VHE systems
An SVE system is so far the only case where we mandate VHE. As we're
starting to grow this requirements, let's slightly rework the way we
deal with that situation, allowing for easy extension of this check.

Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10 11:57:52 +00:00
Linus Torvalds 0d1e8b8d2b KVM updates for v4.20
ARM:
  - Improved guest IPA space support (32 to 52 bits)
 
  - RAS event delivery for 32bit
 
  - PMU fixes
 
  - Guest entry hardening
 
  - Various cleanups
 
  - Port of dirty_log_test selftest
 
 PPC:
  - Nested HV KVM support for radix guests on POWER9.  The performance is
    much better than with PR KVM.  Migration and arbitrary level of
    nesting is supported.
 
  - Disable nested HV-KVM on early POWER9 chips that need a particular hardware
    bug workaround
 
  - One VM per core mode to prevent potential data leaks
 
  - PCI pass-through optimization
 
  - merge ppc-kvm topic branch and kvm-ppc-fixes to get a better base
 
 s390:
  - Initial version of AP crypto virtualization via vfio-mdev
 
  - Improvement for vfio-ap
 
  - Set the host program identifier
 
  - Optimize page table locking
 
 x86:
  - Enable nested virtualization by default
 
  - Implement Hyper-V IPI hypercalls
 
  - Improve #PF and #DB handling
 
  - Allow guests to use Enlightened VMCS
 
  - Add migration selftests for VMCS and Enlightened VMCS
 
  - Allow coalesced PIO accesses
 
  - Add an option to perform nested VMCS host state consistency check
    through hardware
 
  - Automatic tuning of lapic_timer_advance_ns
 
  - Many fixes, minor improvements, and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJb0FINAAoJEED/6hsPKofoI60IAJRS3vOAQ9Fav8cJsO1oBHcX
 3+NexfnBke1bzrjIR3SUcHKGZbdnVPNZc+Q4JjIbPpPmmOMU5jc9BC1dmd5f4Vzh
 BMnQ0yCvgFv3A3fy/Icx1Z8NJppxosdmqdQLrQrNo8aD3cjnqY2yQixdXrAfzLzw
 XEgKdIFCCz8oVN/C9TT4wwJn6l9OE7BM5bMKGFy5VNXzMu7t64UDOLbbjZxNgi1g
 teYvfVGdt5mH0N7b2GPPWRbJmgnz5ygVVpVNQUEFrdKZoCm6r5u9d19N+RRXAwan
 ZYFj10W2T8pJOUf3tryev4V33X7MRQitfJBo4tP5hZfi9uRX89np5zP1CFE7AtY=
 =yEPW
 -----END PGP SIGNATURE-----

Merge tag 'kvm-4.20-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Radim Krčmář:
 "ARM:
   - Improved guest IPA space support (32 to 52 bits)

   - RAS event delivery for 32bit

   - PMU fixes

   - Guest entry hardening

   - Various cleanups

   - Port of dirty_log_test selftest

  PPC:
   - Nested HV KVM support for radix guests on POWER9. The performance
     is much better than with PR KVM. Migration and arbitrary level of
     nesting is supported.

   - Disable nested HV-KVM on early POWER9 chips that need a particular
     hardware bug workaround

   - One VM per core mode to prevent potential data leaks

   - PCI pass-through optimization

   - merge ppc-kvm topic branch and kvm-ppc-fixes to get a better base

  s390:
   - Initial version of AP crypto virtualization via vfio-mdev

   - Improvement for vfio-ap

   - Set the host program identifier

   - Optimize page table locking

  x86:
   - Enable nested virtualization by default

   - Implement Hyper-V IPI hypercalls

   - Improve #PF and #DB handling

   - Allow guests to use Enlightened VMCS

   - Add migration selftests for VMCS and Enlightened VMCS

   - Allow coalesced PIO accesses

   - Add an option to perform nested VMCS host state consistency check
     through hardware

   - Automatic tuning of lapic_timer_advance_ns

   - Many fixes, minor improvements, and cleanups"

* tag 'kvm-4.20-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
  KVM/nVMX: Do not validate that posted_intr_desc_addr is page aligned
  Revert "kvm: x86: optimize dr6 restore"
  KVM: PPC: Optimize clearing TCEs for sparse tables
  x86/kvm/nVMX: tweak shadow fields
  selftests/kvm: add missing executables to .gitignore
  KVM: arm64: Safety check PSTATE when entering guest and handle IL
  KVM: PPC: Book3S HV: Don't use streamlined entry path on early POWER9 chips
  arm/arm64: KVM: Enable 32 bits kvm vcpu events support
  arm/arm64: KVM: Rename function kvm_arch_dev_ioctl_check_extension()
  KVM: arm64: Fix caching of host MDCR_EL2 value
  KVM: VMX: enable nested virtualization by default
  KVM/x86: Use 32bit xor to clear registers in svm.c
  kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD
  kvm: vmx: Defer setting of DR6 until #DB delivery
  kvm: x86: Defer setting of CR2 until #PF delivery
  kvm: x86: Add payload operands to kvm_multiple_exception
  kvm: x86: Add exception payload fields to kvm_vcpu_events
  kvm: x86: Add has_payload and payload to kvm_queued_exception
  KVM: Documentation: Fix omission in struct kvm_vcpu_events
  KVM: selftests: add Enlightened VMCS test
  ...
2018-10-25 17:57:35 -07:00
Dongjiu Geng 58bf437ff6 arm/arm64: KVM: Enable 32 bits kvm vcpu events support
The commit 539aee0edb ("KVM: arm64: Share the parts of
get/set events useful to 32bit") shares the get/set events
helper for arm64 and arm32, but forgot to share the cap
extension code.

User space will check whether KVM supports vcpu events by
checking the KVM_CAP_VCPU_EVENTS extension

Acked-by: James Morse <james.morse@arm.com>
Reviewed-by : Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-18 10:14:03 +01:00
Dongjiu Geng 375bdd3b5d arm/arm64: KVM: Rename function kvm_arch_dev_ioctl_check_extension()
Rename kvm_arch_dev_ioctl_check_extension() to
kvm_arch_vm_ioctl_check_extension(), because it does
not have any relationship with device.

Renaming this function can make code readable.

Cc: James Morse <james.morse@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-18 10:12:53 +01:00
Mark Rutland da5a3ce66b KVM: arm64: Fix caching of host MDCR_EL2 value
At boot time, KVM stashes the host MDCR_EL2 value, but only does this
when the kernel is not running in hyp mode (i.e. is non-VHE). In these
cases, the stashed value of MDCR_EL2.HPMN happens to be zero, which can
lead to CONSTRAINED UNPREDICTABLE behaviour.

Since we use this value to derive the MDCR_EL2 value when switching
to/from a guest, after a guest have been run, the performance counters
do not behave as expected. This has been observed to result in accesses
via PMXEVTYPER_EL0 and PMXEVCNTR_EL0 not affecting the relevant
counters, resulting in events not being counted. In these cases, only
the fixed-purpose cycle counter appears to work as expected.

Fix this by always stashing the host MDCR_EL2 value, regardless of VHE.

Cc: Christopher Dall <christoffer.dall@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Fixes: 1e947bad0b ("arm64: KVM: Skip HYP setup when already running in HYP")
Tested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-17 18:32:46 +01:00
Marc Zyngier 9d47bb0d9e KVM: arm64: Drop __cpu_init_stage2 on the VHE path
__cpu_init_stage2 doesn't do anything anymore on arm64, and is
totally non-sensical if running VHE (as VHE is 64bit only).

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-03 11:48:30 +01:00
Marc Zyngier bca607ebc7 KVM: arm/arm64: Rename kvm_arm_config_vm to kvm_arm_setup_stage2
VM tends to be a very overloaded term in KVM, so let's keep it
to describe the virtual machine. For the virtual memory setup,
let's use the "stage2" suffix.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-10-03 11:45:29 +01:00