Scatter-gather lists passed to UpdateCapsule() should be cleaned
from the D-cache to ensure that they are visible to the CPU after a
warm reboot before the MMU is enabled. On ARM and arm64 systems, this
implies a D-cache clean by virtual address to the point of coherency.
However, due to the fact that the firmware itself is not able to map
physical addresses back to virtual addresses when running under the OS,
this must be done by the caller.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
With protected nVHE hyp code interception host's PSCI SMCs, the host
starts seeing new CPUs boot in EL1 instead of EL2. The kernel logic
that keeps track of the boot mode needs to be adjusted.
Add a static key enabled if KVM protected mode initialization is
successful.
When the key is enabled, is_hyp_mode_available continues to report
`true` because its users either treat it as a check whether KVM will be
/ was initialized, or whether stub HVCs can be made (eg. hibernate).
is_hyp_mode_mismatched is changed to report `false` when the key is
enabled. That's because all cores' modes matched at the point of KVM
init and KVM will not allow cores not present at init to boot. That
said, the function is never used after KVM is initialized.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-27-dbrazdil@google.com
While protected KVM is installed, start trapping all host SMCs.
For now these are simply forwarded to EL3, except PSCI
CPU_ON/CPU_SUSPEND/SYSTEM_SUSPEND which are intercepted and the
hypervisor installed on newly booted cores.
Create new constant HCR_HOST_NVHE_PROTECTED_FLAGS with the new set of HCR
flags to use while the nVHE vector is installed when the kernel was
booted with the protected flag enabled. Switch back to the default HCR
flags when switching back to the stub vector.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-26-dbrazdil@google.com
Add a handler of PSCI SMCs in nVHE hyp code. The handler is initialized
with the version used by the host's PSCI driver and the function IDs it
was configured with. If the SMC function ID matches one of the
configured PSCI calls (for v0.1) or falls into the PSCI function ID
range (for v0.2+), the SMC is handled by the PSCI handler. For now, all
SMCs return PSCI_RET_NOT_SUPPORTED.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-17-dbrazdil@google.com
When compiling with __KVM_NVHE_HYPERVISOR__, redefine per_cpu_offset()
to __hyp_per_cpu_offset() which looks up the base of the nVHE per-CPU
region of the given cpu and computes its offset from the
.hyp.data..percpu section.
This enables use of per_cpu_ptr() helpers in nVHE hyp code. Until now
only this_cpu_ptr() was supported by setting TPIDR_EL2.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-14-dbrazdil@google.com
Add rules for renaming the .data..ro_after_init ELF section in KVM nVHE
object files to .hyp.data..ro_after_init, linking it into the kernel
and mapping it in hyp at runtime.
The section is RW to the host, then mapped RO in hyp. The expectation is
that the host populates the variables in the section and they are never
changed by hyp afterwards.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-13-dbrazdil@google.com
MAIR_EL2 and TCR_EL2 are currently initialized from their _EL1 values.
This will not work once KVM starts intercepting PSCI ON/SUSPEND SMCs
and initializing EL2 state before EL1 state.
Obtain the EL1 values during KVM init and store them in the init params
struct. The struct will stay in memory and can be used when booting new
cores.
Take the opportunity to move copying the T0SZ value from idmap_t0sz in
KVM init rather than in .hyp.idmap.text. This avoids the need for the
idmap_t0sz symbol alias.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-12-dbrazdil@google.com
Once we start initializing KVM on newly booted cores before the rest of
the kernel, parameters to __do_hyp_init will need to be provided by EL2
rather than EL1. At that point it will not be possible to pass its three
arguments directly because PSCI_CPU_ON only supports one context
argument.
Refactor __do_hyp_init to accept its parameters in a struct. This
prepares the code for KVM booting cores as well as removes any limits on
the number of __do_hyp_init arguments.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-11-dbrazdil@google.com
KVM precomputes the hyp VA of __kvm_hyp_host_vector, essentially a
constant (minus ASLR), before passing it to __kvm_hyp_init.
Now that we have alternatives for converting kimg VA to hyp VA, replace
this with computing the constant inside __kvm_hyp_init, thus removing
the need for an argument.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-10-dbrazdil@google.com
When a CPU is booted in EL2, the kernel checks for VHE support and
initializes the CPU core accordingly. For nVHE it also installs the stub
vectors and drops down to EL1.
Once KVM gains the ability to boot cores without going through the
kernel entry point, it will need to initialize the CPU the same way.
Extract the relevant bits of el2_setup into an init_el2_state macro
with an argument specifying whether to initialize for VHE or nVHE.
The following ifdefs are removed:
* CONFIG_ARM_GIC_V3 - always selected on arm64
* CONFIG_COMPAT - hstr_el2 can be set even without 32-bit support
No functional change intended. Size of el2_setup increased by
148 bytes due to duplication.
Signed-off-by: David Brazdil <dbrazdil@google.com>
[maz: reworked to fit the new PSTATE initial setup code]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-9-dbrazdil@google.com
CPU index should never be negative. Change the signature of
(set_)cpu_logical_map to take an unsigned int.
This still works even if the users treat the CPU index as an int,
and will allow the hypervisor's implementation to check that the index
is valid with a single upper-bound check.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-8-dbrazdil@google.com
Expose the boolean value whether the system is running with KVM in
protected mode (nVHE + kernel param). CPU capability was selected over
a global variable to allow use in alternatives.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-3-dbrazdil@google.com
Add an early parameter that allows users to select the mode of operation
for KVM/arm64.
For now, the only supported value is "protected". By passing this flag
users opt into the hypervisor placing additional restrictions on the
host kernel. These allow the hypervisor to spawn guests whose state is
kept private from the host. Restrictions will include stage-2 address
translation to prevent host from accessing guest memory, filtering its
SMC calls, etc.
Without this parameter, the default behaviour remains selecting VHE/nVHE
based on hardware support and CONFIG_ARM64_VHE.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-2-dbrazdil@google.com
ARM64 has non-pagetable aligned large page support with PTE_CONT, when
this bit is set the page is part of a super-page. Match the hugetlb
code and support these super pages for PTE and PMD levels.
This enables PERF_SAMPLE_{DATA,CODE}_PAGE_SIZE to report accurate
pagetable leaf sizes.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20201126125747.GG2414@hirez.programming.kicks-ass.net
- Fix numerous issues with instrumentation and exception entry
- Fix hideous typo in unused register field definition
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl/HvbwQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNFTMB/oD0ucfP6CH65w+7Sbsv2L8FABfYzSrA9gP
f1cmeh1+MyRN4Nbx2ves5wcRGoX1CgZ8KFAmLXG6yyn7UDA/q27CTELknwobhOft
tQIPB2hFDW9qq3VBXFReL3aoXLnWUiRL3nBxQFt7LG1Xor/ivEb1ZFht351UklDh
u1P6NVptpjXFuGPvdqxkHo2WzT0QHI57MRuc1l7I1FRo4dV1nKSlwohu0Ydii4q9
8oLhx77Ga1SWK80IztNmpo7CSMP/FLGDwbUE3vAaftUJx5CBt+lYR1CeWNACSEvy
22y7CkJWKGQccG62oHI7zQaZm1+fum70ndP5dDlfQW/BcCaz8vRH
=KSDc
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"I'm sad to say that we've got an unusually large arm64 fixes pull for
rc7 which addresses numerous significant instrumentation issues with
our entry code.
Without these patches, lockdep is hopelessly unreliable in some
configurations [1,2] and syzkaller is therefore not a lot of use
because it's so noisy.
Although much of this has always been broken, it appears to have been
exposed more readily by other changes such as 044d0d6de9 ("lockdep:
Only trace IRQ edges") and general lockdep improvements around IRQ
tracing and NMIs.
Fixing this properly required moving much of the instrumentation hooks
from our entry assembly into C, which Mark has been working on for the
last few weeks. We're not quite ready to move to the recently added
generic functions yet, but the code here has been deliberately written
to mimic that closely so we can look at cleaning things up once we
have a bit more breathing room.
Having said all that, the second version of these patches was posted
last week and I pushed it into our CI (kernelci and cki) along with a
commit which forced on PROVE_LOCKING, NOHZ_FULL and
CONTEXT_TRACKING_FORCE. The result? We found a real bug in the
md/raid10 code [3].
Oh, and there's also a really silly typo patch that's unrelated.
Summary:
- Fix numerous issues with instrumentation and exception entry
- Fix hideous typo in unused register field definition"
[1] https://lore.kernel.org/r/CACT4Y+aAzoJ48Mh1wNYD17pJqyEcDnrxGfApir=-j171TnQXhw@mail.gmail.com
[2] https://lore.kernel.org/r/20201119193819.GA2601289@elver.google.com
[3] https://lore.kernel.org/r/94c76d5e-466a-bc5f-e6c2-a11b65c39f83@redhat.com
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mte: Fix typo in macro definition
arm64: entry: fix EL1 debug transitions
arm64: entry: fix NMI {user, kernel}->kernel transitions
arm64: entry: fix non-NMI kernel<->kernel transitions
arm64: ptrace: prepare for EL1 irq/rcu tracking
arm64: entry: fix non-NMI user<->kernel transitions
arm64: entry: move el1 irq/nmi logic to C
arm64: entry: prepare ret_to_user for function call
arm64: entry: move enter_from_user_mode to entry-common.c
arm64: entry: mark entry code as noinstr
arm64: mark idle code as noinstr
arm64: syscall: exit userspace before unmasking exceptions
Now that arm64 no longer uses UAO, remove the vestigal feature detection
code and Kconfig text.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-13-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Some code (e.g. futex) needs to make privileged accesses to userspace
memory, and uses uaccess_{enable,disable}_privileged() in order to
permit this. All other uaccess primitives use LDTR/STTR, and never need
to toggle PAN.
Remove the redundant PAN toggling.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-12-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that set_fs() is gone, addr_limit_user_check() is redundant. Remove
the checks and associated thread flag.
To ensure that _TIF_WORK_MASK can be used as an immediate value in an
AND instruction (as it is in `ret_to_user`), TIF_MTE_ASYNC_FAULT is
renumbered to keep the constituent bits of _TIF_WORK_MASK contiguous.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-11-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that the uaccess primitives dont take addr_limit into account, we
have no need to manipulate this via set_fs() and get_fs(). Remove
support for these, along with some infrastructure this renders
redundant.
We no longer need to flip UAO to access kernel memory under KERNEL_DS,
and head.S unconditionally clears UAO for all kernel configurations via
an ERET in init_kernel_el. Thus, we don't need to dynamically flip UAO,
nor do we need to context-switch it. However, we still need to adjust
PAN during SDEI entry.
Masking of __user pointers no longer needs to use the dynamic value of
addr_limit, and can use a constant derived from the maximum possible
userspace task size. A new TASK_SIZE_MAX constant is introduced for
this, which is also used by core code. In configurations supporting
52-bit VAs, this may include a region of unusable VA space above a
48-bit TTBR0 limit, but never includes any portion of TTBR1.
Note that TASK_SIZE_MAX is an exclusive limit, while USER_DS and
KERNEL_DS were inclusive limits, and is converted to a mask by
subtracting one.
As the SDEI entry code repurposes the otherwise unnecessary
pt_regs::orig_addr_limit field to store the TTBR1 of the interrupted
context, for now we rename that to pt_regs::sdei_ttbr1. In future we can
consider factoring that out.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: James Morse <james.morse@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-10-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now the uaccess primitives use LDTR/STTR unconditionally, the
uao_{ldp,stp,user_alternative} asm macros are misnamed, and have a
redundant argument. Let's remove the redundant argument and rename these
to user_{ldp,stp,ldst} respectively to clean this up.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Robin Murohy <robin.murphy@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch separates arm64's user and kernel memory access primitives
into distinct routines, adding new __{get,put}_kernel_nofault() helpers
to access kernel memory, upon which core code builds larger copy
routines.
The kernel access routines (using LDR/STR) are not affected by PAN (when
legitimately accessing kernel memory), nor are they affected by UAO.
Switching to KERNEL_DS may set UAO, but this does not adversely affect
the kernel access routines.
The user access routines (using LDTR/STTR) are not affected by PAN (when
legitimately accessing user memory), but are affected by UAO. As these
are only legitimate to use under USER_DS with UAO clear, this should not
be problematic.
Routines performing atomics to user memory (futex and deprecated
instruction emulation) still need to transiently clear PAN, and these
are left as-is. These are never used on kernel memory.
Subsequent patches will refactor the uaccess helpers to remove redundant
code, and will also remove the redundant PAN/UAO manipulation.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-8-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As a step towards implementing __{get,put}_kernel_nofault(), this patch
splits most user-memory specific logic out of __{get,put}_user(), with
the memory access and fault handling in new __{raw_get,put}_mem()
helpers.
For now the LDR/LDTR patching is left within the *get_mem() helpers, and
will be removed in a subsequent patch.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We currently have many uaccess_*{enable,disable}*() variants, which
subsequent patches will cut down as part of removing set_fs() and
friends. Once this simplification is made, most uaccess routines will
only need to ensure that the user page tables are mapped in TTBR0, as is
currently dealt with by uaccess_ttbr0_{enable,disable}().
The existing uaccess_{enable,disable}() routines ensure that user page
tables are mapped in TTBR0, and also disable PAN protections, which is
necessary to be able to use atomics on user memory, but also permit
unrelated privileged accesses to access user memory.
As preparatory step, let's rename uaccess_{enable,disable}() to
uaccess_{enable,disable}_privileged(), highlighting this caveat and
discouraging wider misuse. Subsequent patches can reuse the
uaccess_{enable,disable}() naming for the common case of ensuring the
user page tables are mapped in TTBR0.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In preparation for removing addr_limit and set_fs() we must decouple the
SDEI PAN/UAO manipulation from the uaccess code, and explicitly
reinitialize these as required.
SDEI enters the kernel with a non-architectural exception, and prior to
the most recent revision of the specification (ARM DEN 0054B), PSTATE
bits (e.g. PAN, UAO) are not manipulated in the same way as for
architectural exceptions. Notably, older versions of the spec can be
read ambiguously as to whether PSTATE bits are inherited unchanged from
the interrupted context or whether they are generated from scratch, with
TF-A doing the latter.
We have three cases to consider:
1) The existing TF-A implementation of SDEI will clear PAN and clear UAO
(along with other bits in PSTATE) when delivering an SDEI exception.
2) In theory, implementations of SDEI prior to revision B could inherit
PAN and UAO (along with other bits in PSTATE) unchanged from the
interrupted context. However, in practice such implementations do not
exist.
3) Going forward, new implementations of SDEI must clear UAO, and
depending on SCTLR_ELx.SPAN must either inherit or set PAN.
As we can ignore (2) we can assume that upon SDEI entry, UAO is always
clear, though PAN may be clear, inherited, or set per SCTLR_ELx.SPAN.
Therefore, we must explicitly initialize PAN, but do not need to do
anything for UAO.
Considering what we need to do:
* When set_fs() is removed, force_uaccess_begin() will have no HW
side-effects. As this only clears UAO, which we can assume has already
been cleared upon entry, this is not a problem. We do not need to add
code to manipulate UAO explicitly.
* PAN may be cleared upon entry (in case 1 above), so where a kernel is
built to use PAN and this is supported by all CPUs, the kernel must
set PAN upon entry to ensure expected behaviour.
* PAN may be inherited from the interrupted context (in case 3 above),
and so where a kernel is not built to use PAN or where PAN support is
not uniform across CPUs, the kernel must clear PAN to ensure expected
behaviour.
This patch reworks the SDEI code accordingly, explicitly setting PAN to
the expected state in all cases. To cater for the cases where the kernel
does not use PAN or this is not uniformly supported by hardware we add a
new cpu_has_pan() helper which can be used regardless of whether the
kernel is built to use PAN.
The existing system_uses_ttbr0_pan() is redefined in terms of
system_uses_hw_pan() both for clarity and as a minor optimization when
HW PAN is not selected.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As with SCTLR_ELx and other control registers, some PSTATE bits are
UNKNOWN out-of-reset, and we may not be able to rely on hardware or
firmware to initialize them to our liking prior to entry to the kernel,
e.g. in the primary/secondary boot paths and return from idle/suspend.
It would be more robust (and easier to reason about) if we consistently
initialized PSTATE to a default value, as we do with control registers.
This will ensure that the kernel is not adversely affected by bits it is
not aware of, e.g. when support for a feature such as PAN/UAO is
disabled.
This patch ensures that PSTATE is consistently initialized at boot time
via an ERET. This is not intended to relax the existing requirements
(e.g. DAIF bits must still be set prior to entering the kernel). For
features detected dynamically (which may require system-wide support),
it is still necessary to subsequently modify PSTATE.
As ERET is not always a Context Synchronization Event, an ISB is placed
before each exception return to ensure updates to control registers have
taken effect. This handles the kernel being entered with SCTLR_ELx.EOS
clear (or any future control bits being in an UNKNOWN state).
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201113124937.20574-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Let's make SCTLR_ELx initialization a bit clearer by using meaningful
names for the initialization values, following the same scheme for
SCTLR_EL1 and SCTLR_EL2.
These definitions will be used more widely in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201113124937.20574-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
To make callsites easier to read, add trivial C wrappers for the
SET_PSTATE_*() helpers, and convert trivial uses over to these. The new
wrappers will be used further in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201113124937.20574-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
If we get a FSC_PERM fault, just using (logging_active && writable) to
determine calling kvm_pgtable_stage2_map(). There will be two more cases
we should consider.
(1) After logging_active is configged back to false from true. When we
get a FSC_PERM fault with write_fault and adjustment of hugepage is needed,
we should merge tables back to a block entry. This case is ignored by still
calling kvm_pgtable_stage2_relax_perms(), which will lead to an endless
loop and guest panic due to soft lockup.
(2) We use (FSC_PERM && logging_active && writable) to determine
collapsing a block entry into a table by calling kvm_pgtable_stage2_map().
But sometimes we may only need to relax permissions when trying to write
to a page other than a block.
In this condition,using kvm_pgtable_stage2_relax_perms() will be fine.
The ISS filed bit[1:0] in ESR_EL2 regesiter indicates the stage2 lookup
level at which a D-abort or I-abort occurred. By comparing granule of
the fault lookup level with vma_pagesize, we can strictly distinguish
conditions of calling kvm_pgtable_stage2_relax_perms() or
kvm_pgtable_stage2_map(), and the above two cases will be well considered.
Suggested-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201201201034.116760-4-wangyanan55@huawei.com
UL in the definition of SYS_TFSR_EL1_TF1 was misspelled causing
compilation issues when trying to implement in kernel MTE async
mode.
Fix the macro correcting the typo.
Note: MTE async mode will be introduced with a future series.
Fixes: c058b1c4a5 ("arm64: mte: system register definitions")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20201130170709.22309-1-vincenzo.frascino@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Cores that predate the introduction of ID_AA64PFR0_EL1.CSV3 to
the ARMv8 architecture have this field set to 0, even of some of
them are not affected by the vulnerability.
The kernel maintains a list of unaffected cores (A53, A55 and a few
others) so that it doesn't impose an expensive mitigation uncessarily.
As we do for CSV2, let's expose the CSV3 property to guests that run
on HW that is effectively not vulnerable. This can be reset to zero
by writing to the ID register from userspace, ensuring that VMs can
be migrated despite the new property being set.
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Exceptions which can be taken at (almost) any time are consdiered to be
NMIs. On arm64 that includes:
* SDEI events
* GICv3 Pseudo-NMIs
* Kernel stack overflows
* Unexpected/unhandled exceptions
... but currently debug exceptions (BRKs, breakpoints, watchpoints,
single-step) are not considered NMIs.
As these can be taken at any time, kernel features (lockdep, RCU,
ftrace) may not be in a consistent kernel state. For example, we may
take an NMI from the idle code or partway through an entry/exit path.
While nmi_enter() and nmi_exit() handle most of this state, notably they
don't save/restore the lockdep state across an NMI being taken and
handled. When interrupts are enabled and an NMI is taken, lockdep may
see interrupts become disabled within the NMI code, but not see
interrupts become enabled when returning from the NMI, leaving lockdep
believing interrupts are disabled when they are actually disabled.
The x86 code handles this in idtentry_{enter,exit}_nmi(), which will
shortly be moved to the generic entry code. As we can't use either yet,
we copy the x86 approach in arm64-specific helpers. All the NMI
entrypoints are marked as noinstr to prevent any instrumentation
handling code being invoked before the state has been corrected.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-11-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
There are periods in kernel mode when RCU is not watching and/or the
scheduler tick is disabled, but we can still take exceptions such as
interrupts. The arm64 exception handlers do not account for this, and
it's possible that RCU is not watching while an exception handler runs.
The x86/generic entry code handles this by ensuring that all (non-NMI)
kernel exception handlers call irqentry_enter() and irqentry_exit(),
which handle RCU, lockdep, and IRQ flag tracing. We can't yet move to
the generic entry code, and already hadnle the user<->kernel transitions
elsewhere, so we add new kernel<->kernel transition helpers alog the
lines of the generic entry code.
Since we now track interrupts becoming masked when an exception is
taken, local_daif_inherit() is modified to track interrupts becoming
re-enabled when the original context is inherited. To balance the
entry/exit paths, each handler masks all DAIF exceptions before
exit_to_kernel_mode().
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-10-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Exceptions from EL1 may be taken when RCU isn't watching (e.g. in idle
sequences), or when the lockdep hardirqs transiently out-of-sync with
the hardware state (e.g. in the middle of local_irq_enable()). To
correctly handle these cases, we'll need to save/restore this state
across some exceptions taken from EL1.
A series of subsequent patches will update EL1 exception handlers to
handle this. In preparation for this, and to avoid dependencies between
those patches, this patch adds two new fields to struct pt_regs so that
exception handlers can track this state.
Note that this is placed in pt_regs as some entry/exit sequences such as
el1_irq are invoked from assembly, which makes it very difficult to add
a separate structure as with the irqentry_state used by x86. We can
separate this once more of the exception logic is moved to C. While the
fields only need to be bool, they are both made u64 to keep pt_regs
16-byte aligned.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-9-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
When built with PROVE_LOCKING, NO_HZ_FULL, and CONTEXT_TRACKING_FORCE
will WARN() at boot time that interrupts are enabled when we call
context_tracking_user_enter(), despite the DAIF flags indicating that
IRQs are masked.
The problem is that we're not tracking IRQ flag changes accurately, and
so lockdep believes interrupts are enabled when they are not (and
vice-versa). We can shuffle things so to make this more accurate. For
kernel->user transitions there are a number of constraints we need to
consider:
1) When we call __context_tracking_user_enter() HW IRQs must be disabled
and lockdep must be up-to-date with this.
2) Userspace should be treated as having IRQs enabled from the PoV of
both lockdep and tracing.
3) As context_tracking_user_enter() stops RCU from watching, we cannot
use RCU after calling it.
4) IRQ flag tracing and lockdep have state that must be manipulated
before RCU is disabled.
... with similar constraints applying for user->kernel transitions, with
the ordering reversed.
The generic entry code has enter_from_user_mode() and
exit_to_user_mode() helpers to handle this. We can't use those directly,
so we add arm64 copies for now (without the instrumentation markers
which aren't used on arm64). These replace the existing user_exit() and
user_exit_irqoff() calls spread throughout handlers, and the exception
unmasking is left as-is.
Note that:
* The accounting for debug exceptions from userspace now happens in
el0_dbg() and ret_to_user(), so this is removed from
debug_exception_enter() and debug_exception_exit(). As
user_exit_irqoff() wakes RCU, the userspace-specific check is removed.
* The accounting for syscalls now happens in el0_svc(),
el0_svc_compat(), and ret_to_user(), so this is removed from
el0_svc_common(). This does not adversely affect the workaround for
erratum 1463225, as this does not depend on any of the state tracking.
* In ret_to_user() we mask interrupts with local_daif_mask(), and so we
need to inform lockdep and tracing. Here a trace_hardirqs_off() is
sufficient and safe as we have not yet exited kernel context and RCU
is usable.
* As PROVE_LOCKING selects TRACE_IRQFLAGS, the ifdeferry in entry.S only
needs to check for the latter.
* EL0 SError handling will be dealt with in a subsequent patch, as this
needs to be treated as an NMI.
Prior to this patch, booting an appropriately-configured kernel would
result in spats as below:
| DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
| WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5280 check_flags.part.54+0x1dc/0x1f0
| Modules linked in:
| CPU: 2 PID: 1 Comm: init Not tainted 5.10.0-rc3 #3
| Hardware name: linux,dummy-virt (DT)
| pstate: 804003c5 (Nzcv DAIF +PAN -UAO -TCO BTYPE=--)
| pc : check_flags.part.54+0x1dc/0x1f0
| lr : check_flags.part.54+0x1dc/0x1f0
| sp : ffff80001003bd80
| x29: ffff80001003bd80 x28: ffff66ce801e0000
| x27: 00000000ffffffff x26: 00000000000003c0
| x25: 0000000000000000 x24: ffffc31842527258
| x23: ffffc31842491368 x22: ffffc3184282d000
| x21: 0000000000000000 x20: 0000000000000001
| x19: ffffc318432ce000 x18: 0080000000000000
| x17: 0000000000000000 x16: ffffc31840f18a78
| x15: 0000000000000001 x14: ffffc3184285c810
| x13: 0000000000000001 x12: 0000000000000000
| x11: ffffc318415857a0 x10: ffffc318406614c0
| x9 : ffffc318415857a0 x8 : ffffc31841f1d000
| x7 : 647261685f706564 x6 : ffffc3183ff7c66c
| x5 : ffff66ce801e0000 x4 : 0000000000000000
| x3 : ffffc3183fe00000 x2 : ffffc31841500000
| x1 : e956dc24146b3500 x0 : 0000000000000000
| Call trace:
| check_flags.part.54+0x1dc/0x1f0
| lock_is_held_type+0x10c/0x188
| rcu_read_lock_sched_held+0x70/0x98
| __context_tracking_enter+0x310/0x350
| context_tracking_enter.part.3+0x5c/0xc8
| context_tracking_user_enter+0x6c/0x80
| finish_ret_to_user+0x2c/0x13cr
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-8-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
In preparation for reworking the EL1 irq/nmi entry code, move the
existing logic to C. We no longer need the asm_nmi_enter() and
asm_nmi_exit() wrappers, so these are removed. The new C functions are
marked noinstr, which prevents compiler instrumentation and runtime
probing.
In subsequent patches we'll want the new C helpers to be called in all
cases, so we don't bother wrapping the calls with ifdeferry. Even when
the new C functions are stubs the trivial calls are unlikely to have a
measurable impact on the IRQ or NMI paths anyway.
Prototypes are added to <asm/exception.h> as otherwise (in some
configurations) GCC will complain about the lack of a forward
declaration. We already do this for existing function, e.g.
enter_from_user_mode().
The new helpers are marked as noinstr (which prevents all
instrumentation, tracing, and kprobes). Otherwise, there should be no
functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-7-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Our Meltdown mitigation state isn't exposed outside of cpufeature.c,
contrary to the rest of the Spectre mitigation state. As we are going
to use it in KVM, expose a arm64_get_meltdown_state() helper which
returns the same possible values as arm64_get_spectre_v?_state().
Signed-off-by: Marc Zyngier <maz@kernel.org>
__extended_idmap_trampoline() was removed a long time ago by
3421e9d88d ("arm64: KVM: Simplify HYP init/teardown") so remove the
unused function prototype.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201118194402.2892-4-will@kernel.org
kvm_arch_vm_ioctl_check_extension() is only called from
kvm_vm_ioctl_check_extension(), so we can inline it and remove the extra
function.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201118194402.2892-3-will@kernel.org
'struct kvm_arch_memory_slot' isn't part of the user ABI, so move it out
of the uapi/ headers in case we start using it in future and accidentally
back ourselves into a corner.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201118194402.2892-2-will@kernel.org
- Fix kerneldoc warnings generated by ACPI IORT code
- Fix pte_accessible() so that access flag is ignored
- Fix missing header #include
- Fix loss of software dirty bit across pte_wrprotect() when HW DBM is enabled
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl/A4moQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNKhtB/4kdY/Do6h8ivkVNy1oId9mWSHrk9HwoEmS
wFvyTIuyhaS2NlNW2/9V662e+wWhzv0ULkDu6aSVsnuw88b1yz1X2ytuh/t8beMJ
PZGWO9UUuVuf+YjcaI7ygppXHILQWdau8eyOVMTciMOYYZRdIqex8WPApxpATRO4
7TphGUuo0IH0GGwq4u6JhPrT0Ihr9txWnvqnM8+fP/y3IbFncH+GKjxZq0k0BwcK
1fkIINwRFHmvkPouPyRPzE6DBudIlHPMGS68CXJV5/T4+JQ5bjc/bmAm8AYVVgip
xSUeKg/x7AMF/8vho0pnvxAqh0FF6c370EjiKBk6AIGNJhjdROxt
=e2/i
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The main changes are relating to our handling of access/dirty bits,
where our low-level page-table helpers could lead to stale young
mappings and loss of the dirty bit in some cases (the latter has not
been observed in practice, but could happen when clearing "soft-dirty"
if we enabled that). These were posted as part of a larger series, but
the rest of that is less urgent and needs a v2 which I'll get to
shortly.
In other news, we've now got a set of fixes to resolve the
lockdep/tracing problems that have been plaguing us for a while, but
they're still a bit "fresh" and I plan to send them to you next week
after we've got some more confidence in them (although initial CI
results look good).
Summary:
- Fix kerneldoc warnings generated by ACPI IORT code
- Fix pte_accessible() so that access flag is ignored
- Fix missing header #include
- Fix loss of software dirty bit across pte_wrprotect() when HW DBM
is enabled"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect()
arm64: pgtable: Fix pte_accessible()
ACPI/IORT: Fix doc warnings in iort.c
arm64/fpsimd: add <asm/insn.h> to <asm/kprobes.h> to fix fpsimd build
There are a number of places where we check for the KVM_ARM_VCPU_PMU_V3
feature. Wrap this check into a new kvm_vcpu_has_pmu(), and use
it at the existing locations.
No functional change.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Move the setting of SSBS directly into the HVC handler, using
the C helpers rather than the inline asssembly code.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Directly using the kimage_voffset variable is fine for now, but
will become more problematic as we start distrusting EL1.
Instead, patch the kimage_voffset into the HYP text, ensuring
we don't have to load an untrusted value later on.
Signed-off-by: Marc Zyngier <maz@kernel.org>
With hardware dirty bit management, calling pte_wrprotect() on a writable,
dirty PTE will lose the dirty state and return a read-only, clean entry.
Move the logic from ptep_set_wrprotect() into pte_wrprotect() to ensure that
the dirty bit is preserved for writable entries, as this is required for
soft-dirty bit management if we enable it in the future.
Cc: <stable@vger.kernel.org>
Fixes: 2f4b829c62 ("arm64: Add support for hardware updates of the access and dirty pte bits")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20201120143557.6715-3-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
pte_accessible() is used by ptep_clear_flush() to figure out whether TLB
invalidation is necessary when unmapping pages for reclaim. Although our
implementation is correct according to the architecture, returning true
only for valid, young ptes in the absence of racing page-table
modifications, this is in fact flawed due to lazy invalidation of old
ptes in ptep_clear_flush_young() where we elide the expensive DSB
instruction for completing the TLB invalidation.
Rather than penalise the aging path, adjust pte_accessible() to return
true for any valid pte, even if the access flag is cleared.
Cc: <stable@vger.kernel.org>
Fixes: 76c714be0e ("arm64: pgtable: implement pte_accessible()")
Reported-by: Yu Zhao <yuzhao@google.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20201120143557.6715-2-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Adding <asm/exception.h> brought in <asm/kprobes.h> which uses
<asm/probes.h>, which uses 'pstate_check_t' so the latter needs to
#include <asm/insn.h> for this typedef.
Fixes this build error:
In file included from arch/arm64/include/asm/kprobes.h:24,
from arch/arm64/include/asm/exception.h:11,
from arch/arm64/kernel/fpsimd.c:35:
arch/arm64/include/asm/probes.h:16:2: error: unknown type name 'pstate_check_t'
16 | pstate_check_t *pstate_cc;
Fixes: c6b90d5cf6 ("arm64/fpsimd: Fix missing-prototypes in fpsimd.c")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Tian Tao <tiantao6@hisilicon.com>
Link: https://lore.kernel.org/r/20201123044510.9942-1-rdunlap@infradead.org
Signed-off-by: Will Deacon <will@kernel.org>
irq_cpustat_t is exactly the same as the asm-generic one. Define
ack_bad_irq so the generic header does not emit the generic version of it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201113141733.392015387@linutronix.de
To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for arm64.
Signed-off-by: Kees Cook <keescook@chromium.org>
The hyp vectors entry corresponding to HYP_VECTOR_DIRECT (i.e. when
neither Spectre-v2 nor Spectre-v3a are present) is unused, as we can
simply dispatch straight to __kvm_hyp_vector in this case.
Remove the redundant vector, and massage the logic for resolving a slot
to a vectors entry.
Reported-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201113113847.21619-11-will@kernel.org
The spectre-v3a mitigation is split between cpu_errata.c and spectre.c,
with the former handling detection of the problem and the latter handling
enabling of the workaround.
Move the detection logic alongside the enabling logic, like we do for the
other spectre mitigations.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20201113113847.21619-10-will@kernel.org
Since ARM64_HARDEN_EL2_VECTORS is really a mitigation for Spectre-v3a,
rename it accordingly for consistency with the v2 and v4 mitigation.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20201113113847.21619-9-will@kernel.org
The EL2 vectors installed when a guest is running point at one of the
following configurations for a given CPU:
- Straight at __kvm_hyp_vector
- A trampoline containing an SMC sequence to mitigate Spectre-v2 and
then a direct branch to __kvm_hyp_vector
- A dynamically-allocated trampoline which has an indirect branch to
__kvm_hyp_vector
- A dynamically-allocated trampoline containing an SMC sequence to
mitigate Spectre-v2 and then an indirect branch to __kvm_hyp_vector
The indirect branches mean that VA randomization at EL2 isn't trivially
bypassable using Spectre-v3a (where the vector base is readable by the
guest).
Rather than populate these vectors dynamically, configure everything
statically and use an enumerated type to identify the vector "slot"
corresponding to one of the configurations above. This both simplifies
the code, but also makes it much easier to implement at EL2 later on.
Signed-off-by: Will Deacon <will@kernel.org>
[maz: fixed double call to kvm_init_vector_slots() on nVHE]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20201113113847.21619-8-will@kernel.org
The BP hardening helpers are an integral part of the Spectre-v2
mitigation, so move them into asm/spectre.h and inline the
arm64_get_bp_hardening_data() function at the same time.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20201113113847.21619-6-will@kernel.org
Branch predictor hardening of the hyp vectors is partially driven by a
couple of global variables ('__kvm_bp_vect_base' and
'__kvm_harden_el2_vector_slot'). However, these are only used within a
single compilation unit, so internalise them there instead.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20201113113847.21619-5-will@kernel.org
kvm_get_hyp_vector() has only one caller, so move it out of kvm_mmu.h
and inline it into a new function, cpu_set_hyp_vector(), for setting
the vector.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20201113113847.21619-4-will@kernel.org
without two-dimensional paging (EPT/NPT).
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl+xQ54UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMzeQf+JP9NpXgeB7dhiODhmO5SyLdw0u9j
kVOM6+kHcEvG6o0yU1uUZr2ZPh9vIAwIjXi8Luiodcazdp6jvxvJ32CeMYJz2lel
y+3Gjp3WS2+FExOjBephBztaMHLihlWQt3E0EKuCc7StyfMhaZooiTRMpvrmiLWe
HQ/epM9oLMyrCqG9MKkvTwH0lDyB5CprV1BNt6YyKjt7d5swEqC75A6lOXnmdAah
utgx1agSIVQPv6vDF9HLaQaoelHT7ucudx+zIkvOAmoQ56AJMPfCr0+Af3ZVW+f/
I5tXVfBhoOV3BVSIsJS7Px0HcZt7siVtl6ISZZos8ox85S4ysjWm2vXFcQ==
=MiOr
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Fixes for ARM and x86, the latter especially for old processors
without two-dimensional paging (EPT/NPT)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in use
KVM: SVM: Update cr3_lm_rsvd_bits for AMD SEV guests
KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch
KVM: x86: clflushopt should be treated as a no-op by emulation
KVM: arm64: Handle SCXTNUM_ELx traps
KVM: arm64: Unify trap handlers injecting an UNDEF
KVM: arm64: Allow setting of ID_AA64PFR0_EL1.CSV2 from userspace
If Activity Monitors (AMUs) are present, two of the counters can be used
to implement support for CPPC's (Collaborative Processor Performance
Control) delivered and reference performance monitoring functionality
using FFH (Functional Fixed Hardware).
Given that counters for a certain CPU can only be read from that CPU,
while FFH operations can be called from any CPU for any of the CPUs, use
smp_call_function_single() to provide the requested values.
Therefore, depending on the register addresses, the following values
are returned:
- 0x0 (DeliveredPerformanceCounterRegister): AMU core counter
- 0x1 (ReferencePerformanceCounterRegister): AMU constant counter
The use of Activity Monitors is hidden behind the generic
cpu_read_{corecnt,constcnt}() functions.
Read functionality for these two registers represents the only current
FFH support for CPPC. Read operations for other register values or write
operation for all registers are unsupported. Therefore, keep CPPC's FFH
unsupported if no CPUs have valid AMU frequency counters. For this
purpose, the get_cpu_with_amu_feat() is introduced.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201106125334.21570-4-ionela.voinescu@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In preparation for other uses of Activity Monitors (AMU) cycle counters,
place counter read functionality in generic functions that can reused:
read_corecnt() and read_constcnt().
As a result, implement update_freq_counters_refs() to replace
init_cpu_freq_invariance_counters() and both initialise and update
the per-cpu reference variables.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201106125334.21570-2-ionela.voinescu@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- Spectre/Meltdown safelisting for some Qualcomm KRYO cores
- Fix RCU splat when failing to online a CPU due to a feature mismatch
- Fix a recently introduced sparse warning in kexec()
- Fix handling of CPU erratum 1418040 for late CPUs
- Ensure hot-added memory falls within linear-mapped region
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl+ubogQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNPD7B/9i5ao44AEJwjz0a68S/jD7kUD7i3xVkCNN
Y8i/i9mx44IAcf8pmyQh3ngaFlJuF2C6oC/SQFiDbmVeGeZXLXvXV7uGAqXG5Xjm
O2Svgr1ry176JWpsB7MNnZwzAatQffdkDjbjQCcUnUIKYcLvge8H2fICljujGcfQ
094vNmT9VerTWRbWDti3Ck/ug+sanVHuzk5BWdKx3jamjeTqo+sBZK/wgBr6UoYQ
mT3BFX42kLIGg+AzwXRDPlzkJymjYgQDbSwGsvny8qKdOEJbAUwWXYZ5sTs9J/gU
E9PT3VJI7BYtTd1uPEWkD645U3arfx3Pf2JcJlbkEp86qx4CUF9s
=T6k4
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
- Spectre/Meltdown safelisting for some Qualcomm KRYO cores
- Fix RCU splat when failing to online a CPU due to a feature mismatch
- Fix a recently introduced sparse warning in kexec()
- Fix handling of CPU erratum 1418040 for late CPUs
- Ensure hot-added memory falls within linear-mapped region
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: cpu_errata: Apply Erratum 845719 to KRYO2XX Silver
arm64: proton-pack: Add KRYO2XX silver CPUs to spectre-v2 safe-list
arm64: kpti: Add KRYO2XX gold/silver CPU cores to kpti safelist
arm64: Add MIDR value for KRYO2XX gold/silver CPU cores
arm64/mm: Validate hotplug range before creating linear mapping
arm64: smp: Tell RCU about CPUs that fail to come online
arm64: psci: Avoid printing in cpu_psci_cpu_die()
arm64: kexec_file: Fix sparse warning
arm64: errata: Fix handling of 1418040 with late CPU onlining
Add MIDR value for KRYO2XX gold (big) and silver (LITTLE)
CPU cores which are used in Qualcomm Technologies, Inc.
SoCs. This will be used to identify and apply errata
which are applicable for these CPU cores.
Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Link: https://lore.kernel.org/r/20201104232218.198800-2-konrad.dybcio@somainline.org
Signed-off-by: Will Deacon <will@kernel.org>
As the kernel never sets HCR_EL2.EnSCXT, accesses to SCXTNUM_ELx
will trap to EL2. Let's handle that as gracefully as possible
by injecting an UNDEF exception into the guest. This is consistent
with the guest's view of ID_AA64PFR0_EL1.CSV2 being at most 1.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201110141308.451654-4-maz@kernel.org
We now expose ID_AA64PFR0_EL1.CSV2=1 to guests running on hosts
that are immune to Spectre-v2, but that don't have this field set,
most likely because they predate the specification.
However, this prevents the migration of guests that have started on
a host the doesn't fake this CSV2 setting to one that does, as KVM
rejects the write to ID_AA64PFR0_EL2 on the grounds that it isn't
what is already there.
In order to fix this, allow userspace to set this field as long as
this doesn't result in a promising more than what is already there
(setting CSV2 to 0 is acceptable, but setting it to 1 when it is
already set to 0 isn't).
Fixes: e1026237f9 ("KVM: arm64: Set CSV2 for guests on hardware unaffected by Spectre-v2")
Reported-by: Peng Liang <liangpeng10@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201110141308.451654-2-maz@kernel.org
Commit 8c96400d6a simplified the page-to-virt and virt-to-page
conversions, based on the assumption that struct page is always 64
bytes in size, in which case we can use a single signed shift to
perform the conversion (provided that the vmemmap array is placed
appropriately in the kernel VA space)
Unfortunately, this assumption turns out not to hold, and so we need
to revert part of this commit, and go back to an affine transformation.
Given that all the quantities involved are compile time constants,
this should not make any practical difference.
Fixes: 8c96400d6a ("arm64: mm: make vmemmap region a projection of the linear region")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20201110180511.29083-1-ardb@kernel.org
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Depending on configuration options and specific code paths, we either
use the empty_zero_page or the configuration-dependent reserved_ttbr0
as a reserved value for TTBR{0,1}_EL1.
To simplify this code, let's always allocate and use the same
reserved_pg_dir, replacing reserved_ttbr0. Note that this is allocated
(and hence pre-zeroed), and is also marked as read-only in the kernel
Image mapping.
Keeping this separate from the empty_zero_page potentially helps with
robustness as the empty_zero_page is used in a number of cases where a
failure to map it read-only could allow it to become corrupted.
The (presently unused) swapper_pg_end symbol is also removed, and
comments are added wherever we rely on the offsets between the
pre-allocated pg_dirs to keep these cases easily identifiable.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201103102229.8542-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The kprobe_step_ctx (kcb->ss_ctx) has ss_pending and match_addr, but
those are redundant because those can be replaced by KPROBE_HIT_SS and
&cur_kprobe->ainsn.api.insn[1] respectively.
To simplify the code, remove the kprobe_step_ctx.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201103134900.337243-2-jean-philippe@linaro.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In a surprising turn of events, it transpires that CPU capabilities
configured as ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE are never set as the
result of late-onlining. Therefore our handling of erratum 1418040 does
not get activated if it is not required by any of the boot CPUs, even
though we allow late-onlining of an affected CPU.
In order to get things working again, replace the cpus_have_const_cap()
invocation with an explicit check for the current CPU using
this_cpu_has_cap().
Cc: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201106114952.10032-1-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
kvm_coproc.h used to serve as a compatibility layer for the files
shared between the 32 and 64 bit ports.
Another one bites the dust...
Signed-off-by: Marc Zyngier <maz@kernel.org>
Similarly to what has been done on the cp15 front, repaint the
debug registers to use their AArch64 counterparts. This results
in some simplification as we can remove the 32bit-specific
accessors.
Signed-off-by: Marc Zyngier <maz@kernel.org>
The use of the AArch32-specific accessors have always been a bit
annoying on 64bit, and it is time for a change.
Let's move the AArch32 exception injection over to the AArch64 encoding,
which requires us to split the two halves of FAR_EL1 into DFAR and IFAR.
This enables us to drop the preempt_disable() games on VHE, and to kill
the last user of the vcpu_cp15() macro.
Signed-off-by: Marc Zyngier <maz@kernel.org>
ARMv8.2 introduced TTBCR2, which shares TCR_EL1 with TTBCR.
Gracefully handle traps to this register when HCR_EL2.TVM is set.
Cc: stable@vger.kernel.org
Reported-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
The only use of the register mapping code was for the sake of the LR
mapping, which we trivially solved in a previous patch. Get rid of
the whole thing now.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Move the AArch32 exception injection code back into the inject_fault.c
file, removing the need for a few non-static functions now that AArch32
host support is a thing of the past.
Signed-off-by: Marc Zyngier <maz@kernel.org>
The SPSR setting code is now completely unused, including that dealing
with banked AArch32 SPSRs. Cleanup time.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Move the AArch64 exception injection code from EL1 to HYP, leaving
only the ESR_EL1 updates to EL1. In order to come with the differences
between VHE and nVHE, two set of system register accessors are provided.
SPSR, ELR, PC and PSTATE are now completely handled in the hypervisor.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Add the basic infrastructure to describe injection of exceptions
into a guest. So far, nothing uses this code path.
Signed-off-by: Marc Zyngier <maz@kernel.org>
As we are about to need to access system registers from the HYP
code based on their internal encoding, move the direct sysreg
accessors to a common include file, with a VHE-specific guard.
No functionnal change.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
In an effort to remove the vcpu PC manipulations from EL1 on nVHE
systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
to increment PC post emulation is now signalled via a flag in the
vcpu structure.
Signed-off-by: Marc Zyngier <maz@kernel.org>
There is no need to feed the result of kvm_vcpu_trap_il_is32bit()
to kvm_skip_instr(), as only AArch32 has a variable length ISA, and
this helper can equally be called from kvm_skip_instr32(), reducing
the complexity at all the call sites.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
When building with LTO, there is an increased risk of the compiler
converting an address dependency headed by a READ_ONCE() invocation
into a control dependency and consequently allowing for harmful
reordering by the CPU.
Ensure that such transformations are harmless by overriding the generic
READ_ONCE() definition with one that provides acquire semantics when
building with LTO.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Armv8.3 introduced the LDAPR instruction, which provides weaker memory
ordering semantics than LDARi (RCpc vs RCsc). Generally, we provide an
RCsc implementation when implementing the Linux memory model, but LDAPR
can be used as a useful alternative to dependency ordering, particularly
when the compiler is capable of breaking the dependencies.
Since LDAPR is not available on all CPUs, add a cpufeature to detect it at
runtime and allow the instruction to be used with alternative code
patching.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
asm/alternative.h contains both the macros needed to use alternatives,
as well the type definitions and function prototypes for applying them.
Split the header in two, so that alternatives can be used from core
header files such as linux/compiler.h without the risk of circular
includes
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The uao_* alternative asm macros are only used by the uaccess assembly
routines in arch/arm64/lib/, where they are included indirectly via
asm-uaccess.h. Since they're specific to the uaccess assembly (and will
lose the alternatives in subsequent patches), let's move them into
asm-uaccess.h.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
[will: update #include in mte.S to pull in uao asm macros]
Signed-off-by: Will Deacon <will@kernel.org>
Tidy up the way the top of the kernel VA space is organized, by mirroring
the 256 MB region we have below the vmalloc space, and populating it top
down with the PCI I/O space, some guard regions, and the fixmap region.
The latter region is itself populated top down, and today only covers
about 4 MB, and so 224 MB is ample, and no guard region is therefore
required.
The resulting layout is identical between 48-bit/4k and 52-bit/64k
configurations.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Link: https://lore.kernel.org/r/20201008153602.9467-5-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that we have reverted the introduction of the vmemmap struct page
pointer and the separate physvirt_offset, we can simplify things further,
and place the vmemmap region in the VA space in such a way that virtual
to page translations and vice versa can be implemented using a single
arithmetic shift.
One happy coincidence resulting from this is that the 48-bit/4k and
52-bit/64k configurations (which are assumed to be the two most
prevalent) end up with the same placement of the vmemmap region. In
a subsequent patch, we will take advantage of this, and unify the
memory maps even more.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Link: https://lore.kernel.org/r/20201008153602.9467-4-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For historical reasons, the arm64 kernel VA space is configured as two
equally sized halves, i.e., on a 48-bit VA build, the VA space is split
into a 47-bit vmalloc region and a 47-bit linear region.
When support for 52-bit virtual addressing was added, this equal split
was kept, resulting in a substantial waste of virtual address space in
the linear region:
48-bit VA 52-bit VA
0xffff_ffff_ffff_ffff +-------------+ +-------------+
| vmalloc | | vmalloc |
0xffff_8000_0000_0000 +-------------+ _PAGE_END(48) +-------------+
| linear | : :
0xffff_0000_0000_0000 +-------------+ : :
: : : :
: : : :
: : : :
: : : currently :
: unusable : : :
: : : unused :
: by : : :
: : : :
: hardware : : :
: : : :
0xfff8_0000_0000_0000 : : _PAGE_END(52) +-------------+
: : | |
: : | |
: : | |
: : | |
: : | |
: unusable : | |
: : | linear |
: by : | |
: : | region |
: hardware : | |
: : | |
: : | |
: : | |
: : | |
: : | |
: : | |
0xfff0_0000_0000_0000 +-------------+ PAGE_OFFSET +-------------+
As illustrated above, the 52-bit VA kernel uses 47 bits for the vmalloc
space (as before), to ensure that a single 64k granule kernel image can
support any 64k granule capable system, regardless of whether it supports
the 52-bit virtual addressing extension. However, due to the fact that
the VA space is still split in equal halves, the linear region is only
2^51 bytes in size, wasting almost half of the 52-bit VA space.
Let's fix this, by abandoning the equal split, and simply assigning all
VA space outside of the vmalloc region to the linear region.
The KASAN shadow region is reconfigured so that it ends at the start of
the vmalloc region, and grows downwards. That way, the arrangement of
the vmalloc space (which contains kernel mappings, modules, BPF region,
the vmemmap array etc) is identical between non-KASAN and KASAN builds,
which aids debugging.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Link: https://lore.kernel.org/r/20201008153602.9467-3-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
KVM/arm64 is so far unable to deal with function pointers, as the compiler
will generate the kernel's runtime VA, and not the linear mapping address,
meaning that kern_hyp_va() will give the wrong result.
We so far have been able to use PC-relative addressing, but that's not
always easy to use, and prevents the implementation of things like
the mapping of an index to a pointer.
To allow this, provide a new helper that computes the required
translation from the kernel image to the HYP VA space.
Signed-off-by: Marc Zyngier <maz@kernel.org>