2012-03-05 19:49:26 +08:00
|
|
|
/*
|
|
|
|
* Based on arch/arm/kernel/asm-offsets.c
|
|
|
|
*
|
|
|
|
* Copyright (C) 1995-2003 Russell King
|
|
|
|
* 2001-2002 Keith Owens
|
|
|
|
* Copyright (C) 2012 ARM Ltd.
|
|
|
|
*
|
|
|
|
* This program is free software; you can redistribute it and/or modify
|
|
|
|
* it under the terms of the GNU General Public License version 2 as
|
|
|
|
* published by the Free Software Foundation.
|
|
|
|
*
|
|
|
|
* This program is distributed in the hope that it will be useful,
|
|
|
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
* GNU General Public License for more details.
|
|
|
|
*
|
|
|
|
* You should have received a copy of the GNU General Public License
|
|
|
|
* along with this program. If not, see <http://www.gnu.org/licenses/>.
|
|
|
|
*/
|
|
|
|
|
arm64: kernel: Add arch-specific SDEI entry code and CPU masking
The Software Delegated Exception Interface (SDEI) is an ARM standard
for registering callbacks from the platform firmware into the OS.
This is typically used to implement RAS notifications.
Such notifications enter the kernel at the registered entry-point
with the register values of the interrupted CPU context. Because this
is not a CPU exception, it cannot reuse the existing entry code.
(crucially we don't implicitly know which exception level we interrupted),
Add the entry point to entry.S to set us up for calling into C code. If
the event interrupted code that had interrupts masked, we always return
to that location. Otherwise we pretend this was an IRQ, and use SDEI's
complete_and_resume call to return to vbar_el1 + offset.
This allows the kernel to deliver signals to user space processes. For
KVM this triggers the world switch, a quick spin round vcpu_run, then
back into the guest, unless there are pending signals.
Add sdei_mask_local_cpu() calls to the smp_send_stop() code, this covers
the panic() code-path, which doesn't invoke cpuhotplug notifiers.
Because we can interrupt entry-from/exit-to another EL, we can't trust the
value in sp_el0 or x29, even if we interrupted the kernel, in this case
the code in entry.S will save/restore sp_el0 and use the value in
__entry_task.
When we have VMAP stacks we can interrupt the stack-overflow test, which
stirs x0 into sp, meaning we have to have our own VMAP stacks. For now
these are allocated when we probe the interface. Future patches will add
refcounting hooks to allow the arch code to allocate them lazily.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-08 23:38:12 +08:00
|
|
|
#include <linux/arm_sdei.h>
|
2012-03-05 19:49:26 +08:00
|
|
|
#include <linux/sched.h>
|
|
|
|
#include <linux/mm.h>
|
|
|
|
#include <linux/dma-mapping.h>
|
2013-07-04 20:34:32 +08:00
|
|
|
#include <linux/kvm_host.h>
|
2018-03-29 21:13:23 +08:00
|
|
|
#include <linux/preempt.h>
|
2016-04-28 00:47:12 +08:00
|
|
|
#include <linux/suspend.h>
|
2016-09-09 21:07:16 +08:00
|
|
|
#include <asm/cpufeature.h>
|
2017-11-14 22:14:17 +08:00
|
|
|
#include <asm/fixmap.h>
|
2012-03-05 19:49:26 +08:00
|
|
|
#include <asm/thread_info.h>
|
|
|
|
#include <asm/memory.h>
|
arm64: kernel: cpu_{suspend/resume} implementation
Kernel subsystems like CPU idle and suspend to RAM require a generic
mechanism to suspend a processor, save its context and put it into
a quiescent state. The cpu_{suspend}/{resume} implementation provides
such a framework through a kernel interface allowing to save/restore
registers, flush the context to DRAM and suspend/resume to/from
low-power states where processor context may be lost.
The CPU suspend implementation relies on the suspend protocol registered
in CPU operations to carry out a suspend request after context is
saved and flushed to DRAM. The cpu_suspend interface:
int cpu_suspend(unsigned long arg);
allows to pass an opaque parameter that is handed over to the suspend CPU
operations back-end so that it can take action according to the
semantics attached to it. The arg parameter allows suspend to RAM and CPU
idle drivers to communicate to suspend protocol back-ends; it requires
standardization so that the interface can be reused seamlessly across
systems, paving the way for generic drivers.
Context memory is allocated on the stack, whose address is stashed in a
per-cpu variable to keep track of it and passed to core functions that
save/restore the registers required by the architecture.
Even though, upon successful execution, the cpu_suspend function shuts
down the suspending processor, the warm boot resume mechanism, based
on the cpu_resume function, makes the resume path operate as a
cpu_suspend function return, so that cpu_suspend can be treated as a C
function by the caller, which simplifies coding the PM drivers that rely
on the cpu_suspend API.
Upon context save, the minimal amount of memory is flushed to DRAM so
that it can be retrieved when the MMU is off and caches are not searched.
The suspend CPU operation, depending on the required operations (eg CPU vs
Cluster shutdown) is in charge of flushing the cache hierarchy either
implicitly (by calling firmware implementations like PSCI) or explicitly
by executing the required cache maintainance functions.
Debug exceptions are disabled during cpu_{suspend}/{resume} operations
so that debug registers can be saved and restored properly preventing
preemption from debug agents enabled in the kernel.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2013-07-22 19:22:13 +08:00
|
|
|
#include <asm/smp_plat.h>
|
|
|
|
#include <asm/suspend.h>
|
2012-03-05 19:49:26 +08:00
|
|
|
#include <asm/vdso_datapage.h>
|
|
|
|
#include <linux/kbuild.h>
|
2016-01-04 22:44:32 +08:00
|
|
|
#include <linux/arm-smccc.h>
|
2012-03-05 19:49:26 +08:00
|
|
|
|
|
|
|
int main(void)
|
|
|
|
{
|
|
|
|
DEFINE(TSK_ACTIVE_MM, offsetof(struct task_struct, active_mm));
|
|
|
|
BLANK();
|
arm64: split thread_info from task stack
This patch moves arm64's struct thread_info from the task stack into
task_struct. This protects thread_info from corruption in the case of
stack overflows, and makes its address harder to determine if stack
addresses are leaked, making a number of attacks more difficult. Precise
detection and handling of overflow is left for subsequent patches.
Largely, this involves changing code to store the task_struct in sp_el0,
and acquire the thread_info from the task struct. Core code now
implements current_thread_info(), and as noted in <linux/sched.h> this
relies on offsetof(task_struct, thread_info) == 0, enforced by core
code.
This change means that the 'tsk' register used in entry.S now points to
a task_struct, rather than a thread_info as it used to. To make this
clear, the TI_* field offsets are renamed to TSK_TI_*, with asm-offsets
appropriately updated to account for the structural change.
Userspace clobbers sp_el0, and we can no longer restore this from the
stack. Instead, the current task is cached in a per-cpu variable that we
can safely access from early assembly as interrupts are disabled (and we
are thus not preemptible).
Both secondary entry and idle are updated to stash the sp and task
pointer separately.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-11-04 04:23:13 +08:00
|
|
|
DEFINE(TSK_TI_FLAGS, offsetof(struct task_struct, thread_info.flags));
|
|
|
|
DEFINE(TSK_TI_PREEMPT, offsetof(struct task_struct, thread_info.preempt_count));
|
|
|
|
DEFINE(TSK_TI_ADDR_LIMIT, offsetof(struct task_struct, thread_info.addr_limit));
|
2016-07-01 23:53:00 +08:00
|
|
|
#ifdef CONFIG_ARM64_SW_TTBR0_PAN
|
|
|
|
DEFINE(TSK_TI_TTBR0, offsetof(struct task_struct, thread_info.ttbr0));
|
|
|
|
#endif
|
arm64: split thread_info from task stack
This patch moves arm64's struct thread_info from the task stack into
task_struct. This protects thread_info from corruption in the case of
stack overflows, and makes its address harder to determine if stack
addresses are leaked, making a number of attacks more difficult. Precise
detection and handling of overflow is left for subsequent patches.
Largely, this involves changing code to store the task_struct in sp_el0,
and acquire the thread_info from the task struct. Core code now
implements current_thread_info(), and as noted in <linux/sched.h> this
relies on offsetof(task_struct, thread_info) == 0, enforced by core
code.
This change means that the 'tsk' register used in entry.S now points to
a task_struct, rather than a thread_info as it used to. To make this
clear, the TI_* field offsets are renamed to TSK_TI_*, with asm-offsets
appropriately updated to account for the structural change.
Userspace clobbers sp_el0, and we can no longer restore this from the
stack. Instead, the current task is cached in a per-cpu variable that we
can safely access from early assembly as interrupts are disabled (and we
are thus not preemptible).
Both secondary entry and idle are updated to stash the sp and task
pointer separately.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-11-04 04:23:13 +08:00
|
|
|
DEFINE(TSK_STACK, offsetof(struct task_struct, stack));
|
2018-12-12 20:08:44 +08:00
|
|
|
#ifdef CONFIG_STACKPROTECTOR
|
|
|
|
DEFINE(TSK_STACK_CANARY, offsetof(struct task_struct, stack_canary));
|
|
|
|
#endif
|
2012-03-05 19:49:26 +08:00
|
|
|
BLANK();
|
|
|
|
DEFINE(THREAD_CPU_CONTEXT, offsetof(struct task_struct, thread.cpu_context));
|
|
|
|
BLANK();
|
|
|
|
DEFINE(S_X0, offsetof(struct pt_regs, regs[0]));
|
|
|
|
DEFINE(S_X2, offsetof(struct pt_regs, regs[2]));
|
|
|
|
DEFINE(S_X4, offsetof(struct pt_regs, regs[4]));
|
|
|
|
DEFINE(S_X6, offsetof(struct pt_regs, regs[6]));
|
2016-07-09 00:35:52 +08:00
|
|
|
DEFINE(S_X8, offsetof(struct pt_regs, regs[8]));
|
|
|
|
DEFINE(S_X10, offsetof(struct pt_regs, regs[10]));
|
|
|
|
DEFINE(S_X12, offsetof(struct pt_regs, regs[12]));
|
|
|
|
DEFINE(S_X14, offsetof(struct pt_regs, regs[14]));
|
|
|
|
DEFINE(S_X16, offsetof(struct pt_regs, regs[16]));
|
|
|
|
DEFINE(S_X18, offsetof(struct pt_regs, regs[18]));
|
|
|
|
DEFINE(S_X20, offsetof(struct pt_regs, regs[20]));
|
|
|
|
DEFINE(S_X22, offsetof(struct pt_regs, regs[22]));
|
|
|
|
DEFINE(S_X24, offsetof(struct pt_regs, regs[24]));
|
|
|
|
DEFINE(S_X26, offsetof(struct pt_regs, regs[26]));
|
|
|
|
DEFINE(S_X28, offsetof(struct pt_regs, regs[28]));
|
2012-03-05 19:49:26 +08:00
|
|
|
DEFINE(S_LR, offsetof(struct pt_regs, regs[30]));
|
|
|
|
DEFINE(S_SP, offsetof(struct pt_regs, sp));
|
|
|
|
DEFINE(S_PSTATE, offsetof(struct pt_regs, pstate));
|
|
|
|
DEFINE(S_PC, offsetof(struct pt_regs, pc));
|
|
|
|
DEFINE(S_SYSCALLNO, offsetof(struct pt_regs, syscallno));
|
2016-06-21 01:28:01 +08:00
|
|
|
DEFINE(S_ORIG_ADDR_LIMIT, offsetof(struct pt_regs, orig_addr_limit));
|
2019-01-31 22:58:46 +08:00
|
|
|
DEFINE(S_PMR_SAVE, offsetof(struct pt_regs, pmr_save));
|
arm64: unwind: reference pt_regs via embedded stack frame
As it turns out, the unwind code is slightly broken, and probably has
been for a while. The problem is in the dumping of the exception stack,
which is intended to dump the contents of the pt_regs struct at each
level in the call stack where an exception was taken and routed to a
routine marked as __exception (which means its stack frame is right
below the pt_regs struct on the stack).
'Right below the pt_regs struct' is ill defined, though: the unwind
code assigns 'frame pointer + 0x10' to the .sp member of the stackframe
struct at each level, and dump_backtrace() happily dereferences that as
the pt_regs pointer when encountering an __exception routine. However,
the actual size of the stack frame created by this routine (which could
be one of many __exception routines we have in the kernel) is not known,
and so frame.sp is pretty useless to figure out where struct pt_regs
really is.
So it seems the only way to ensure that we can find our struct pt_regs
when walking the stack frames is to put it at a known fixed offset of
the stack frame pointer that is passed to such __exception routines.
The simplest way to do that is to put it inside pt_regs itself, which is
the main change implemented by this patch. As a bonus, doing this allows
us to get rid of a fair amount of cruft related to walking from one stack
to the other, which is especially nice since we intend to introduce yet
another stack for overflow handling once we add support for vmapped
stacks. It also fixes an inconsistency where we only add a stack frame
pointing to ELR_EL1 if we are executing from the IRQ stack but not when
we are executing from the task stack.
To consistly identify exceptions regs even in the presence of exceptions
taken from entry code, we must check whether the next frame was created
by entry text, rather than whether the current frame was crated by
exception text.
To avoid backtracing using PCs that fall in the idmap, or are controlled
by userspace, we must explcitly zero the FP and LR in startup paths, and
must ensure that the frame embedded in pt_regs is zeroed upon entry from
EL0. To avoid these NULL entries showin in the backtrace, unwind_frame()
is updated to avoid them.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[Mark: compare current frame against .entry.text, avoid bogus PCs]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
2017-07-23 01:45:33 +08:00
|
|
|
DEFINE(S_STACKFRAME, offsetof(struct pt_regs, stackframe));
|
2012-03-05 19:49:26 +08:00
|
|
|
DEFINE(S_FRAME_SIZE, sizeof(struct pt_regs));
|
|
|
|
BLANK();
|
2015-10-07 01:46:24 +08:00
|
|
|
DEFINE(MM_CONTEXT_ID, offsetof(struct mm_struct, context.id.counter));
|
2012-03-05 19:49:26 +08:00
|
|
|
BLANK();
|
|
|
|
DEFINE(VMA_VM_MM, offsetof(struct vm_area_struct, vm_mm));
|
|
|
|
DEFINE(VMA_VM_FLAGS, offsetof(struct vm_area_struct, vm_flags));
|
|
|
|
BLANK();
|
|
|
|
DEFINE(VM_EXEC, VM_EXEC);
|
|
|
|
BLANK();
|
|
|
|
DEFINE(PAGE_SZ, PAGE_SIZE);
|
|
|
|
BLANK();
|
|
|
|
DEFINE(DMA_TO_DEVICE, DMA_TO_DEVICE);
|
|
|
|
DEFINE(DMA_FROM_DEVICE, DMA_FROM_DEVICE);
|
|
|
|
BLANK();
|
2018-03-29 21:13:23 +08:00
|
|
|
DEFINE(PREEMPT_DISABLE_OFFSET, PREEMPT_DISABLE_OFFSET);
|
|
|
|
BLANK();
|
2012-03-05 19:49:26 +08:00
|
|
|
DEFINE(CLOCK_REALTIME, CLOCK_REALTIME);
|
|
|
|
DEFINE(CLOCK_MONOTONIC, CLOCK_MONOTONIC);
|
arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO
So far the arm64 clock_gettime() vDSO implementation only supported
the following clocks, falling back to the syscall for the others:
- CLOCK_REALTIME{,_COARSE}
- CLOCK_MONOTONIC{,_COARSE}
This patch adds support for the CLOCK_MONOTONIC_RAW clock, taking
advantage of the recent refactoring of the vDSO time functions. Like
the non-_COARSE clocks, this only works when the "arch_sys_counter"
clocksource is in use (allowing us to read the current time from the
virtual counter register), otherwise we also have to fall back to the
syscall.
Most of the data is shared with CLOCK_MONOTONIC, and the algorithm is
similar. The reference implementation in kernel/time/timekeeping.c
shows that:
- CLOCK_MONOTONIC = tk->wall_to_monotonic + tk->xtime_sec +
timekeeping_get_ns(&tk->tkr_mono)
- CLOCK_MONOTONIC_RAW = tk->raw_time + timekeeping_get_ns(&tk->tkr_raw)
- tkr_mono and tkr_raw are identical (in particular, same
clocksource), except these members:
* mult (only mono's multiplier is NTP-adjusted)
* xtime_nsec (always 0 for raw)
Therefore, tk->raw_time and tkr_raw->mult are now also stored in the
vDSO data page.
Cc: Ali Saidi <ali.saidi@arm.com>
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-12 18:24:00 +08:00
|
|
|
DEFINE(CLOCK_MONOTONIC_RAW, CLOCK_MONOTONIC_RAW);
|
2012-03-05 19:49:26 +08:00
|
|
|
DEFINE(CLOCK_REALTIME_RES, MONOTONIC_RES_NSEC);
|
|
|
|
DEFINE(CLOCK_REALTIME_COARSE, CLOCK_REALTIME_COARSE);
|
|
|
|
DEFINE(CLOCK_MONOTONIC_COARSE,CLOCK_MONOTONIC_COARSE);
|
|
|
|
DEFINE(CLOCK_COARSE_RES, LOW_RES_NSEC);
|
|
|
|
DEFINE(NSEC_PER_SEC, NSEC_PER_SEC);
|
|
|
|
BLANK();
|
|
|
|
DEFINE(VDSO_CS_CYCLE_LAST, offsetof(struct vdso_data, cs_cycle_last));
|
arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO
So far the arm64 clock_gettime() vDSO implementation only supported
the following clocks, falling back to the syscall for the others:
- CLOCK_REALTIME{,_COARSE}
- CLOCK_MONOTONIC{,_COARSE}
This patch adds support for the CLOCK_MONOTONIC_RAW clock, taking
advantage of the recent refactoring of the vDSO time functions. Like
the non-_COARSE clocks, this only works when the "arch_sys_counter"
clocksource is in use (allowing us to read the current time from the
virtual counter register), otherwise we also have to fall back to the
syscall.
Most of the data is shared with CLOCK_MONOTONIC, and the algorithm is
similar. The reference implementation in kernel/time/timekeeping.c
shows that:
- CLOCK_MONOTONIC = tk->wall_to_monotonic + tk->xtime_sec +
timekeeping_get_ns(&tk->tkr_mono)
- CLOCK_MONOTONIC_RAW = tk->raw_time + timekeeping_get_ns(&tk->tkr_raw)
- tkr_mono and tkr_raw are identical (in particular, same
clocksource), except these members:
* mult (only mono's multiplier is NTP-adjusted)
* xtime_nsec (always 0 for raw)
Therefore, tk->raw_time and tkr_raw->mult are now also stored in the
vDSO data page.
Cc: Ali Saidi <ali.saidi@arm.com>
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-12 18:24:00 +08:00
|
|
|
DEFINE(VDSO_RAW_TIME_SEC, offsetof(struct vdso_data, raw_time_sec));
|
2012-03-05 19:49:26 +08:00
|
|
|
DEFINE(VDSO_XTIME_CLK_SEC, offsetof(struct vdso_data, xtime_clock_sec));
|
|
|
|
DEFINE(VDSO_XTIME_CRS_SEC, offsetof(struct vdso_data, xtime_coarse_sec));
|
|
|
|
DEFINE(VDSO_XTIME_CRS_NSEC, offsetof(struct vdso_data, xtime_coarse_nsec));
|
|
|
|
DEFINE(VDSO_WTM_CLK_SEC, offsetof(struct vdso_data, wtm_clock_sec));
|
|
|
|
DEFINE(VDSO_TB_SEQ_COUNT, offsetof(struct vdso_data, tb_seq_count));
|
arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO
So far the arm64 clock_gettime() vDSO implementation only supported
the following clocks, falling back to the syscall for the others:
- CLOCK_REALTIME{,_COARSE}
- CLOCK_MONOTONIC{,_COARSE}
This patch adds support for the CLOCK_MONOTONIC_RAW clock, taking
advantage of the recent refactoring of the vDSO time functions. Like
the non-_COARSE clocks, this only works when the "arch_sys_counter"
clocksource is in use (allowing us to read the current time from the
virtual counter register), otherwise we also have to fall back to the
syscall.
Most of the data is shared with CLOCK_MONOTONIC, and the algorithm is
similar. The reference implementation in kernel/time/timekeeping.c
shows that:
- CLOCK_MONOTONIC = tk->wall_to_monotonic + tk->xtime_sec +
timekeeping_get_ns(&tk->tkr_mono)
- CLOCK_MONOTONIC_RAW = tk->raw_time + timekeeping_get_ns(&tk->tkr_raw)
- tkr_mono and tkr_raw are identical (in particular, same
clocksource), except these members:
* mult (only mono's multiplier is NTP-adjusted)
* xtime_nsec (always 0 for raw)
Therefore, tk->raw_time and tkr_raw->mult are now also stored in the
vDSO data page.
Cc: Ali Saidi <ali.saidi@arm.com>
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-12 18:24:00 +08:00
|
|
|
DEFINE(VDSO_CS_MONO_MULT, offsetof(struct vdso_data, cs_mono_mult));
|
2012-03-05 19:49:26 +08:00
|
|
|
DEFINE(VDSO_CS_SHIFT, offsetof(struct vdso_data, cs_shift));
|
|
|
|
DEFINE(VDSO_TZ_MINWEST, offsetof(struct vdso_data, tz_minuteswest));
|
|
|
|
DEFINE(VDSO_USE_SYSCALL, offsetof(struct vdso_data, use_syscall));
|
|
|
|
BLANK();
|
|
|
|
DEFINE(TVAL_TV_SEC, offsetof(struct timeval, tv_sec));
|
|
|
|
DEFINE(TSPEC_TV_SEC, offsetof(struct timespec, tv_sec));
|
|
|
|
BLANK();
|
|
|
|
DEFINE(TZ_MINWEST, offsetof(struct timezone, tz_minuteswest));
|
|
|
|
DEFINE(TZ_DSTTIME, offsetof(struct timezone, tz_dsttime));
|
2012-12-11 00:40:18 +08:00
|
|
|
BLANK();
|
2016-02-23 18:31:42 +08:00
|
|
|
DEFINE(CPU_BOOT_STACK, offsetof(struct secondary_data, stack));
|
arm64: split thread_info from task stack
This patch moves arm64's struct thread_info from the task stack into
task_struct. This protects thread_info from corruption in the case of
stack overflows, and makes its address harder to determine if stack
addresses are leaked, making a number of attacks more difficult. Precise
detection and handling of overflow is left for subsequent patches.
Largely, this involves changing code to store the task_struct in sp_el0,
and acquire the thread_info from the task struct. Core code now
implements current_thread_info(), and as noted in <linux/sched.h> this
relies on offsetof(task_struct, thread_info) == 0, enforced by core
code.
This change means that the 'tsk' register used in entry.S now points to
a task_struct, rather than a thread_info as it used to. To make this
clear, the TI_* field offsets are renamed to TSK_TI_*, with asm-offsets
appropriately updated to account for the structural change.
Userspace clobbers sp_el0, and we can no longer restore this from the
stack. Instead, the current task is cached in a per-cpu variable that we
can safely access from early assembly as interrupts are disabled (and we
are thus not preemptible).
Both secondary entry and idle are updated to stash the sp and task
pointer separately.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-11-04 04:23:13 +08:00
|
|
|
DEFINE(CPU_BOOT_TASK, offsetof(struct secondary_data, task));
|
2016-02-23 18:31:42 +08:00
|
|
|
BLANK();
|
2012-12-11 00:40:18 +08:00
|
|
|
#ifdef CONFIG_KVM_ARM_HOST
|
|
|
|
DEFINE(VCPU_CONTEXT, offsetof(struct kvm_vcpu, arch.ctxt));
|
KVM: arm64: Handle RAS SErrors from EL2 on guest exit
We expect to have firmware-first handling of RAS SErrors, with errors
notified via an APEI method. For systems without firmware-first, add
some minimal handling to KVM.
There are two ways KVM can take an SError due to a guest, either may be a
RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.
The current SError from EL2 code unmasks SError and tries to fence any
pending SError into a single instruction window. It then leaves SError
unmasked.
With the v8.2 RAS Extensions we may take an SError for a 'corrected'
error, but KVM is only able to handle SError from EL2 if they occur
during this single instruction window...
The RAS Extensions give us a new instruction to synchronise and
consume SErrors. The RAS Extensions document (ARM DDI0587),
'2.4.1 ESB and Unrecoverable errors' describes ESB as synchronising
SError interrupts generated by 'instructions, translation table walks,
hardware updates to the translation tables, and instruction fetches on
the same PE'. This makes ESB equivalent to KVMs existing
'dsb, mrs-daifclr, isb' sequence.
Use the alternatives to synchronise and consume any SError using ESB
instead of unmasking and taking the SError. Set ARM_EXIT_WITH_SERROR_BIT
in the exit_code so that we can restart the vcpu if it turns out this
SError has no impact on the vcpu.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16 03:39:05 +08:00
|
|
|
DEFINE(VCPU_FAULT_DISR, offsetof(struct kvm_vcpu, arch.fault.disr_el1));
|
2018-05-29 20:11:17 +08:00
|
|
|
DEFINE(VCPU_WORKAROUND_FLAGS, offsetof(struct kvm_vcpu, arch.workaround_flags));
|
KVM: arm/arm64: Context-switch ptrauth registers
When pointer authentication is supported, a guest may wish to use it.
This patch adds the necessary KVM infrastructure for this to work, with
a semi-lazy context switch of the pointer auth state.
Pointer authentication feature is only enabled when VHE is built
in the kernel and present in the CPU implementation so only VHE code
paths are modified.
When we schedule a vcpu, we disable guest usage of pointer
authentication instructions and accesses to the keys. While these are
disabled, we avoid context-switching the keys. When we trap the guest
trying to use pointer authentication functionality, we change to eagerly
context-switching the keys, and enable the feature. The next time the
vcpu is scheduled out/in, we start again. However the host key save is
optimized and implemented inside ptrauth instruction/register access
trap.
Pointer authentication consists of address authentication and generic
authentication, and CPUs in a system might have varied support for
either. Where support for either feature is not uniform, it is hidden
from guests via ID register emulation, as a result of the cpufeature
framework in the host.
Unfortunately, address authentication and generic authentication cannot
be trapped separately, as the architecture provides a single EL2 trap
covering both. If we wish to expose one without the other, we cannot
prevent a (badly-written) guest from intermittently using a feature
which is not uniformly supported (when scheduled on a physical CPU which
supports the relevant feature). Hence, this patch expects both type of
authentication to be present in a cpu.
This switch of key is done from guest enter/exit assembly as preparation
for the upcoming in-kernel pointer authentication support. Hence, these
key switching routines are not implemented in C code as they may cause
pointer authentication key signing error in some situations.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
[Only VHE, key switch in full assembly, vcpu_has_ptrauth checks
, save host key in ptrauth exception trap]
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
[maz: various fixups]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-23 12:42:35 +08:00
|
|
|
DEFINE(VCPU_HCR_EL2, offsetof(struct kvm_vcpu, arch.hcr_el2));
|
2012-12-11 00:40:18 +08:00
|
|
|
DEFINE(CPU_GP_REGS, offsetof(struct kvm_cpu_context, gp_regs));
|
KVM: arm/arm64: Context-switch ptrauth registers
When pointer authentication is supported, a guest may wish to use it.
This patch adds the necessary KVM infrastructure for this to work, with
a semi-lazy context switch of the pointer auth state.
Pointer authentication feature is only enabled when VHE is built
in the kernel and present in the CPU implementation so only VHE code
paths are modified.
When we schedule a vcpu, we disable guest usage of pointer
authentication instructions and accesses to the keys. While these are
disabled, we avoid context-switching the keys. When we trap the guest
trying to use pointer authentication functionality, we change to eagerly
context-switching the keys, and enable the feature. The next time the
vcpu is scheduled out/in, we start again. However the host key save is
optimized and implemented inside ptrauth instruction/register access
trap.
Pointer authentication consists of address authentication and generic
authentication, and CPUs in a system might have varied support for
either. Where support for either feature is not uniform, it is hidden
from guests via ID register emulation, as a result of the cpufeature
framework in the host.
Unfortunately, address authentication and generic authentication cannot
be trapped separately, as the architecture provides a single EL2 trap
covering both. If we wish to expose one without the other, we cannot
prevent a (badly-written) guest from intermittently using a feature
which is not uniformly supported (when scheduled on a physical CPU which
supports the relevant feature). Hence, this patch expects both type of
authentication to be present in a cpu.
This switch of key is done from guest enter/exit assembly as preparation
for the upcoming in-kernel pointer authentication support. Hence, these
key switching routines are not implemented in C code as they may cause
pointer authentication key signing error in some situations.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
[Only VHE, key switch in full assembly, vcpu_has_ptrauth checks
, save host key in ptrauth exception trap]
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
[maz: various fixups]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-23 12:42:35 +08:00
|
|
|
DEFINE(CPU_APIAKEYLO_EL1, offsetof(struct kvm_cpu_context, sys_regs[APIAKEYLO_EL1]));
|
|
|
|
DEFINE(CPU_APIBKEYLO_EL1, offsetof(struct kvm_cpu_context, sys_regs[APIBKEYLO_EL1]));
|
|
|
|
DEFINE(CPU_APDAKEYLO_EL1, offsetof(struct kvm_cpu_context, sys_regs[APDAKEYLO_EL1]));
|
|
|
|
DEFINE(CPU_APDBKEYLO_EL1, offsetof(struct kvm_cpu_context, sys_regs[APDBKEYLO_EL1]));
|
|
|
|
DEFINE(CPU_APGAKEYLO_EL1, offsetof(struct kvm_cpu_context, sys_regs[APGAKEYLO_EL1]));
|
2012-12-11 00:40:18 +08:00
|
|
|
DEFINE(CPU_USER_PT_REGS, offsetof(struct kvm_regs, regs));
|
2017-10-08 23:01:56 +08:00
|
|
|
DEFINE(HOST_CONTEXT_VCPU, offsetof(struct kvm_cpu_context, __hyp_running_vcpu));
|
arm64: kernel: cpu_{suspend/resume} implementation
Kernel subsystems like CPU idle and suspend to RAM require a generic
mechanism to suspend a processor, save its context and put it into
a quiescent state. The cpu_{suspend}/{resume} implementation provides
such a framework through a kernel interface allowing to save/restore
registers, flush the context to DRAM and suspend/resume to/from
low-power states where processor context may be lost.
The CPU suspend implementation relies on the suspend protocol registered
in CPU operations to carry out a suspend request after context is
saved and flushed to DRAM. The cpu_suspend interface:
int cpu_suspend(unsigned long arg);
allows to pass an opaque parameter that is handed over to the suspend CPU
operations back-end so that it can take action according to the
semantics attached to it. The arg parameter allows suspend to RAM and CPU
idle drivers to communicate to suspend protocol back-ends; it requires
standardization so that the interface can be reused seamlessly across
systems, paving the way for generic drivers.
Context memory is allocated on the stack, whose address is stashed in a
per-cpu variable to keep track of it and passed to core functions that
save/restore the registers required by the architecture.
Even though, upon successful execution, the cpu_suspend function shuts
down the suspending processor, the warm boot resume mechanism, based
on the cpu_resume function, makes the resume path operate as a
cpu_suspend function return, so that cpu_suspend can be treated as a C
function by the caller, which simplifies coding the PM drivers that rely
on the cpu_suspend API.
Upon context save, the minimal amount of memory is flushed to DRAM so
that it can be retrieved when the MMU is off and caches are not searched.
The suspend CPU operation, depending on the required operations (eg CPU vs
Cluster shutdown) is in charge of flushing the cache hierarchy either
implicitly (by calling firmware implementations like PSCI) or explicitly
by executing the required cache maintainance functions.
Debug exceptions are disabled during cpu_{suspend}/{resume} operations
so that debug registers can be saved and restored properly preventing
preemption from debug agents enabled in the kernel.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2013-07-22 19:22:13 +08:00
|
|
|
#endif
|
2015-01-27 02:33:44 +08:00
|
|
|
#ifdef CONFIG_CPU_PM
|
arm64: kernel: cpu_{suspend/resume} implementation
Kernel subsystems like CPU idle and suspend to RAM require a generic
mechanism to suspend a processor, save its context and put it into
a quiescent state. The cpu_{suspend}/{resume} implementation provides
such a framework through a kernel interface allowing to save/restore
registers, flush the context to DRAM and suspend/resume to/from
low-power states where processor context may be lost.
The CPU suspend implementation relies on the suspend protocol registered
in CPU operations to carry out a suspend request after context is
saved and flushed to DRAM. The cpu_suspend interface:
int cpu_suspend(unsigned long arg);
allows to pass an opaque parameter that is handed over to the suspend CPU
operations back-end so that it can take action according to the
semantics attached to it. The arg parameter allows suspend to RAM and CPU
idle drivers to communicate to suspend protocol back-ends; it requires
standardization so that the interface can be reused seamlessly across
systems, paving the way for generic drivers.
Context memory is allocated on the stack, whose address is stashed in a
per-cpu variable to keep track of it and passed to core functions that
save/restore the registers required by the architecture.
Even though, upon successful execution, the cpu_suspend function shuts
down the suspending processor, the warm boot resume mechanism, based
on the cpu_resume function, makes the resume path operate as a
cpu_suspend function return, so that cpu_suspend can be treated as a C
function by the caller, which simplifies coding the PM drivers that rely
on the cpu_suspend API.
Upon context save, the minimal amount of memory is flushed to DRAM so
that it can be retrieved when the MMU is off and caches are not searched.
The suspend CPU operation, depending on the required operations (eg CPU vs
Cluster shutdown) is in charge of flushing the cache hierarchy either
implicitly (by calling firmware implementations like PSCI) or explicitly
by executing the required cache maintainance functions.
Debug exceptions are disabled during cpu_{suspend}/{resume} operations
so that debug registers can be saved and restored properly preventing
preemption from debug agents enabled in the kernel.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2013-07-22 19:22:13 +08:00
|
|
|
DEFINE(CPU_CTX_SP, offsetof(struct cpu_suspend_ctx, sp));
|
|
|
|
DEFINE(MPIDR_HASH_MASK, offsetof(struct mpidr_hash, mask));
|
|
|
|
DEFINE(MPIDR_HASH_SHIFTS, offsetof(struct mpidr_hash, shift_aff));
|
2016-04-28 00:47:06 +08:00
|
|
|
DEFINE(SLEEP_STACK_DATA_SYSTEM_REGS, offsetof(struct sleep_stack_data, system_regs));
|
|
|
|
DEFINE(SLEEP_STACK_DATA_CALLEE_REGS, offsetof(struct sleep_stack_data, callee_saved_regs));
|
2012-12-11 00:40:18 +08:00
|
|
|
#endif
|
2017-02-02 01:28:27 +08:00
|
|
|
DEFINE(ARM_SMCCC_RES_X0_OFFS, offsetof(struct arm_smccc_res, a0));
|
|
|
|
DEFINE(ARM_SMCCC_RES_X2_OFFS, offsetof(struct arm_smccc_res, a2));
|
|
|
|
DEFINE(ARM_SMCCC_QUIRK_ID_OFFS, offsetof(struct arm_smccc_quirk, id));
|
|
|
|
DEFINE(ARM_SMCCC_QUIRK_STATE_OFFS, offsetof(struct arm_smccc_quirk, state));
|
2016-04-28 00:47:12 +08:00
|
|
|
BLANK();
|
|
|
|
DEFINE(HIBERN_PBE_ORIG, offsetof(struct pbe, orig_address));
|
|
|
|
DEFINE(HIBERN_PBE_ADDR, offsetof(struct pbe, address));
|
|
|
|
DEFINE(HIBERN_PBE_NEXT, offsetof(struct pbe, next));
|
2016-09-09 21:07:16 +08:00
|
|
|
DEFINE(ARM64_FTR_SYSVAL, offsetof(struct arm64_ftr_reg, sys_val));
|
2017-11-14 22:14:17 +08:00
|
|
|
BLANK();
|
|
|
|
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
|
|
|
|
DEFINE(TRAMP_VALIAS, TRAMP_VALIAS);
|
arm64: kernel: Add arch-specific SDEI entry code and CPU masking
The Software Delegated Exception Interface (SDEI) is an ARM standard
for registering callbacks from the platform firmware into the OS.
This is typically used to implement RAS notifications.
Such notifications enter the kernel at the registered entry-point
with the register values of the interrupted CPU context. Because this
is not a CPU exception, it cannot reuse the existing entry code.
(crucially we don't implicitly know which exception level we interrupted),
Add the entry point to entry.S to set us up for calling into C code. If
the event interrupted code that had interrupts masked, we always return
to that location. Otherwise we pretend this was an IRQ, and use SDEI's
complete_and_resume call to return to vbar_el1 + offset.
This allows the kernel to deliver signals to user space processes. For
KVM this triggers the world switch, a quick spin round vcpu_run, then
back into the guest, unless there are pending signals.
Add sdei_mask_local_cpu() calls to the smp_send_stop() code, this covers
the panic() code-path, which doesn't invoke cpuhotplug notifiers.
Because we can interrupt entry-from/exit-to another EL, we can't trust the
value in sp_el0 or x29, even if we interrupted the kernel, in this case
the code in entry.S will save/restore sp_el0 and use the value in
__entry_task.
When we have VMAP stacks we can interrupt the stack-overflow test, which
stirs x0 into sp, meaning we have to have our own VMAP stacks. For now
these are allocated when we probe the interface. Future patches will add
refcounting hooks to allow the arch code to allocate them lazily.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-08 23:38:12 +08:00
|
|
|
#endif
|
|
|
|
#ifdef CONFIG_ARM_SDE_INTERFACE
|
|
|
|
DEFINE(SDEI_EVENT_INTREGS, offsetof(struct sdei_registered_event, interrupted_regs));
|
|
|
|
DEFINE(SDEI_EVENT_PRIORITY, offsetof(struct sdei_registered_event, priority));
|
2017-11-14 22:14:17 +08:00
|
|
|
#endif
|
2012-03-05 19:49:26 +08:00
|
|
|
return 0;
|
|
|
|
}
|