In preparation for enabling the stackleak plugin on arm64,
we need a way to get the bounds of the current stack. Extend
on_accessible_stack to get this information.
Acked-by: Alexander Popov <alex.popov@linux.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
[will: folded in fix for allmodconfig build breakage w/ sdei]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Kconfig reports a warning on x86 builds after the ARM64 dependency
was added.
drivers/acpi/Kconfig:6:error: recursive dependency detected!
drivers/acpi/Kconfig:6: symbol ACPI depends on EFI
This rephrases the dependency to keep the ARM64 details out of the
shared Kconfig file, so Kconfig no longer gets confused by it.
For consistency, all three architectures that support ACPI now
select ARCH_SUPPORTS_ACPI in exactly the configuration in which
they allow it. We still need the 'default x86', as each one
wants a different default: default-y on x86, default-n on arm64,
and always-y on ia64.
Fixes: 5bcd44083a ("drivers: acpi: add dependency of EFI for arm64")
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This is a fix against the issue that crash dump kernel may hang up
during booting, which can happen on any ACPI-based system with "ACPI
Reclaim Memory."
(kernel messages after panic kicked off kdump)
(snip...)
Bye!
(snip...)
ACPI: Core revision 20170728
pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
Internal error: Oops: 96000021 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
task: ffff000008d05180 task.stack: ffff000008cc0000
PC is at acpi_ns_lookup+0x25c/0x3c0
LR is at acpi_ds_load1_begin_op+0xa4/0x294
(snip...)
Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
Call trace:
(snip...)
[<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
[<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
[<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
[<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
[<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
[<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
[<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
[<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
[<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
[<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
[<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
[<ffff000008badc20>] acpi_early_init+0x9c/0xd0
[<ffff000008b70d50>] start_kernel+0x3b4/0x43c
Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
---[ end trace c46ed37f9651c58e ]---
Kernel panic - not syncing: Fatal exception
Rebooting in 10 seconds..
(diagnosis)
* This fault is a data abort, alignment fault (ESR=0x96000021)
during reading out ACPI table.
* Initial ACPI tables are normally stored in system ram and marked as
"ACPI Reclaim memory" by the firmware.
* After the commit f56ab9a5b7 ("efi/arm: Don't mark ACPI reclaim
memory as MEMBLOCK_NOMAP"), those regions are differently handled
as they are "memblock-reserved", without NOMAP bit.
* So they are now excluded from device tree's "usable-memory-range"
which kexec-tools determines based on a current view of /proc/iomem.
* When crash dump kernel boots up, it tries to accesses ACPI tables by
mapping them with ioremap(), not ioremap_cache(), in acpi_os_ioremap()
since they are no longer part of mapped system ram.
* Given that ACPI accessor/helper functions are compiled in without
unaligned access support (ACPI_MISALIGNMENT_NOT_SUPPORTED),
any unaligned access to ACPI tables can cause a fatal panic.
With this patch, acpi_os_ioremap() always honors memory attribute
information provided by the firmware (EFI) and retaining cacheability
allows the kernel safe access to ACPI tables.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by and Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
There has been some confusion around what is necessary to prevent kexec
overwriting important memory regions. memblock: reserve, or nomap?
Only memblock nomap regions are reported via /proc/iomem, kexec's
user-space doesn't know about memblock_reserve()d regions.
Until commit f56ab9a5b7 ("efi/arm: Don't mark ACPI reclaim memory
as MEMBLOCK_NOMAP") the ACPI tables were nomap, now they are reserved
and thus possible for kexec to overwrite with the new kernel or initrd.
But this was always broken, as the UEFI memory map is also reserved
and not marked as nomap.
Exporting both nomap and reserved memblock types is a nuisance as
they live in different memblock structures which we can't walk at
the same time.
Take a second walk over memblock.reserved and add new 'reserved'
subnodes for the memblock_reserved() regions that aren't already
described by the existing code. (e.g. Kernel Code)
We use reserve_region_with_split() to find the gaps in existing named
regions. This handles the gap between 'kernel code' and 'kernel data'
which is memblock_reserve()d, but already partially described by
request_standard_resources(). e.g.:
| 80000000-dfffffff : System RAM
| 80080000-80ffffff : Kernel code
| 81000000-8158ffff : reserved
| 81590000-8237efff : Kernel data
| a0000000-dfffffff : Crash kernel
| e00f0000-f949ffff : System RAM
reserve_region_with_split needs kzalloc() which isn't available when
request_standard_resources() is called, use an initcall.
Reported-by: Bhupesh Sharma <bhsharma@redhat.com>
Reported-by: Tyler Baicar <tbaicar@codeaurora.org>
Suggested-by: Akashi Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: James Morse <james.morse@arm.com>
Fixes: d28f6df130 ("arm64/kexec: Add core kexec support")
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
CC: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Not all toolchains have the baremetal elf targets, RedHat/Fedora ones
in particular. So, probe for whether it's available and use the previous
(linux) targets if it isn't.
Reported-by: Laura Abbott <labbott@redhat.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Paul Kocialkowski <contact@paulk.fr>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
It's possible for userspace to control idx. Sanitize idx when using it
as an array index, to inhibit the potential spectre-v1 write gadget.
Found by smatch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
syscall_trace_{enter,exit} are only called from C code, so drop the
asmlinkage qualifier from their definitions.
Signed-off-by: Will Deacon <will.deacon@arm.com>
To minimize the risk of userspace-controlled values being used under
speculation, this patch adds pt_regs based syscall wrappers for arm64,
which pass the minimum set of required userspace values to syscall
implementations. For each syscall, a wrapper which takes a pt_regs
argument is automatically generated, and this extracts the arguments
before calling the "real" syscall implementation.
Each syscall has three functions generated:
* __do_<compat_>sys_<name> is the "real" syscall implementation, with
the expected prototype.
* __se_<compat_>sys_<name> is the sign-extension/narrowing wrapper,
inherited from common code. This takes a series of long parameters,
casting each to the requisite types required by the "real" syscall
implementation in __do_<compat_>sys_<name>.
This wrapper *may* not be necessary on arm64 given the AAPCS rules on
unused register bits, but it seemed safer to keep the wrapper for now.
* __arm64_<compat_>_sys_<name> takes a struct pt_regs pointer, and
extracts *only* the relevant register values, passing these on to the
__se_<compat_>sys_<name> wrapper.
The syscall invocation code is updated to handle the calling convention
required by __arm64_<compat_>_sys_<name>, and passes a single struct
pt_regs pointer.
The compiler can fold the syscall implementation and its wrappers, such
that the overhead of this approach is minimized.
Note that we play games with sys_ni_syscall(). It can't be defined with
SYSCALL_DEFINE0() because we must avoid the possibility of error
injection. Additionally, there are a couple of locations where we need
to call it from C code, and we don't (currently) have a
ksys_ni_syscall(). While it has no wrapper, passing in a redundant
pt_regs pointer is benign per the AAPCS.
When ARCH_HAS_SYSCALL_WRAPPER is selected, no prototype is defines for
sys_ni_syscall(). Since we need to treat it differently for in-kernel
calls and the syscall tables, the prototype is defined as-required.
The wrappers are largely the same as their x86 counterparts, but
simplified as we don't have a variety of compat calling conventions that
require separate stubs. Unlike x86, we have some zero-argument compat
syscalls, and must define COMPAT_SYSCALL_DEFINE0() to ensure that these
are also given an __arm64_compat_sys_ prefix.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In preparation for converting to pt_regs syscall wrappers, convert our
existing compat wrappers to C. This will allow the pt_regs wrappers to
be automatically generated, and will allow for the compat register
manipulation to be folded in with the pt_regs accesses.
To avoid confusion with the upcoming pt_regs wrappers and existing
compat wrappers provided by core code, the C wrappers are renamed to
compat_sys_aarch32_<syscall>.
With the assembly wrappers gone, we can get rid of entry32.S and the
associated boilerplate.
Note that these must call the ksys_* syscall entry points, as the usual
sys_* entry points will be modified to take a single pt_regs pointer
argument.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We don't currently annotate our mmap implementation as a syscall, as we
need to do to use pt_regs syscall wrappers.
Let's mark it as a real syscall.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We don't currently annotate our various sigreturn functions as syscalls,
as we need to do to use pt_regs syscall wrappers.
Let's mark them as real syscalls.
For compat_sys_sigreturn and compat_sys_rt_sigreturn, this changes the
return type from int to long, matching the prototypes in sys32.c.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
With pt_regs syscall wrappers, the calling convention for
sys_personality() will change. Use ksys_personality(), which is
functionally equivalent.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Our syscall tables are aligned to 4096 bytes, which allowed their
addresses to be generated with a single adrp in entry.S. This has the
unfortunate property of wasting space in .rodata for the necessary
padding.
Now that the address is generated by C code, we can rely on the compiler
to do the right thing, and drop the alignemnt.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We can zero GPRs x0 - x29 upon entry from EL0 to make it harder for
userspace to control values consumed by speculative gadgets.
We don't blat x30, since this is stashed much later, and we'll blat it
before invoking C code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that all of the syscall logic works on the saved pt_regs, apply_ssbd
can safely corrupt x0-x3 in the entry paths, and we no longer need to
restore them. So let's remove the logic doing so.
With that logic gone, we can fold the branch target into the macro, so
that callers need not deal with this. GAS provides \@, which provides a
unique value per macro invocation, which we can use to create a unique
label.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that syscalls are invoked with pt_regs, we no longer need to ensure
that the argument regsiters are live in the entry assembly, and it's
fine to not restore them after context_tracking_user_exit() has
corrupted them.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that the syscall invocation logic is in C, we can migrate the rest
of the syscall entry logic over, so that the entry assembly needn't look
at the register values at all.
The SVE reset across syscall logic now unconditionally clears TIF_SVE,
but sve_user_disable() will only write back to CPACR_EL1 when SVE is
actually enabled.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently syscall tracing is a tricky assembly state machine, which can
be rather difficult to follow, and even harder to modify. Before we
start fiddling with it for pt_regs syscalls, let's convert it to C.
This is not intended to have any functional change.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
As a first step towards invoking syscalls with a pt_regs argument,
convert the raw syscall invocation logic to C. We end up with a bit more
register shuffling, but the unified invocation logic means we can unify
the tracing paths, too.
Previously, assembly had to open-code calls to ni_sys() when the system
call number was out-of-bounds for the relevant syscall table. This case
is now handled by invoke_syscall(), and the assembly no longer need to
handle this case explicitly. This allows the tracing paths to be
simplified and unified, as we no longer need the __ni_sys_trace path and
the __sys_trace_return label.
This only converts the invocation of the syscall. The rest of the
syscall triage and tracing is left in assembly for now, and will be
converted in subsequent patches.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In preparation for invoking arbitrary syscalls from C code, let's define
a type for an arbitrary syscall, matching the parameter passing rules of
the AAPCS.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The arm64 sigreturn* syscall handlers are non-standard. Rather than
taking a number of user parameters in registers as per the AAPCS,
they expect the pt_regs as their sole argument.
To make this work, we override the syscall definitions to invoke
wrappers written in assembly, which mov the SP into x0, and branch to
their respective C functions.
On other architectures (such as x86), the sigreturn* functions take no
argument and instead use current_pt_regs() to acquire the user
registers. This requires less boilerplate code, and allows for other
features such as interposing C code in this path.
This patch takes the same approach for arm64.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tentatively-reviewed-by: Dave Martin <dave.martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In subsequent patches, we'll want to make use of sve_user_enable() and
sve_user_disable() outside of kernel/fpsimd.c. Let's move these to
<asm/fpsimd.h> where we can make use of them.
To avoid ifdeffery in sequences like:
if (system_supports_sve() && some_condition)
sve_user_disable();
... empty stubs are provided when support for SVE is not enabled. Note
that system_supports_sve() contains as IS_ENABLED(CONFIG_ARM64_SVE), so
the sve_user_disable() call should be optimized away entirely when
CONFIG_ARM64_SVE is not selected.
To ensure that this is the case, the stub definitions contain a
BUILD_BUG(), as we do for other stubs for which calls should always be
optimized away when the relevant config option is not selected.
At the same time, the include list of <asm/fpsimd.h> is sorted while
adding <asm/sysreg.h>.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that we have sysreg_clear_set(), we can use this instead of
change_cpacr().
Note that the order of the set and clear arguments differs between
change_cpacr() and sysreg_clear_set(), so these are flipped as part of
the conversion. Also, sve_user_enable() redundantly clears
CPACR_EL1_ZEN_EL0EN before setting it; this is removed for clarity.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that we have sysreg_clear_set(), we can consistently use this
instead of config_sctlr_el1().
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently we assert that the SCTLR_EL{1,2}_{SET,CLEAR} bits are
self-consistent with an assertion in config_sctlr_el1(). This is a bit
unusual, since config_sctlr_el1() doesn't make use of these definitions,
and is far away from the definitions themselves.
We can use the CPP #error directive to have equivalent assertions in
<asm/sysreg.h>, next to the definitions of the set/clear bits, which is
a bit clearer and simpler.
At the same time, lets fill in the upper 32 bits for both registers in
their respective RES0 definitions. This could be a little nicer with
GENMASK_ULL(63, 32), but this currently lives in <linux/bitops.h>, which
cannot safely be included from assembly, as <asm/sysreg.h> can.
Note the when the preprocessor evaluates an expression for an #if
directive, all signed or unsigned values are treated as intmax_t or
uintmax_t respectively. To avoid ambiguity, we define explicitly define
the mask of all 64 bits.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In do_notify_resume, we manipulate thread_flags as a 32-bit unsigned
int, whereas thread_info::flags is a 64-bit unsigned long, and elsewhere
(e.g. in the entry assembly) we manipulate the flags as a 64-bit
quantity.
For consistency, and to avoid problems if we end up with more than 32
flags, let's make do_notify_resume take the flags as a 64-bit unsigned
long.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This reverts commit 7e7df71fd5.
When unwinding out of the IRQ stack and onto the interrupted EL1 stack,
we cannot rely on the frame pointer being strictly increasing, as this
could terminate the backtrace early depending on how the stacks have
been allocated.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Implement calls to rseq_signal_deliver, rseq_handle_notify_resume
and rseq_syscall so that we can select HAVE_RSEQ on arm64.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Building without NUMA but with FLATMEM results in a link error
because mem_map[] is not available:
aarch64-linux-ld -EB -maarch64elfb --no-undefined -X -pie -shared -Bsymbolic --no-apply-dynamic-relocs --build-id -o .tmp_vmlinux1 -T ./arch/arm64/kernel/vmlinux.lds --whole-archive built-in.a --no-whole-archive --start-group arch/arm64/lib/lib.a lib/lib.a --end-group
init/do_mounts.o: In function `mount_block_root':
do_mounts.c:(.init.text+0x1e8): undefined reference to `mem_map'
arch/arm64/kernel/vdso.o: In function `vdso_init':
vdso.c:(.init.text+0xb4): undefined reference to `mem_map'
This uses the same trick as the other architectures, making flatmem
depend on !NUMA to avoid the broken configuration.
Fixes: e7d4bac428 ("arm64: add ARM64-specific support for flatmem")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Current ACPI ARM64 NUMA initialization code in
acpi_numa_gicc_affinity_init()
carries out NUMA nodes creation and cpu<->node mappings at the same time
in the arch backend so that a single SRAT walk is needed to parse both
pieces of information. This implies that the cpu<->node mappings must
be stashed in an array (sized NR_CPUS) so that SMP code can later use
the stashed values to avoid another SRAT table walk to set-up the early
cpu<->node mappings.
If the kernel is configured with a NR_CPUS value less than the actual
processor entries in the SRAT (and MADT), the logic in
acpi_numa_gicc_affinity_init() is broken in that the cpu<->node mapping
is only carried out (and stashed for future use) only for a number of
SRAT entries up to NR_CPUS, which do not necessarily correspond to the
possible cpus detected at SMP initialization in
acpi_map_gic_cpu_interface() (ie MADT and SRAT processor entries order
is not enforced), which leaves the kernel with broken cpu<->node
mappings.
Furthermore, given the current ACPI NUMA code parsing logic in
acpi_numa_gicc_affinity_init(), PXM domains for CPUs that are not parsed
because they exceed NR_CPUS entries are not mapped to NUMA nodes (ie the
PXM corresponding node is not created in the kernel) leaving the system
with a broken NUMA topology.
Rework the ACPI ARM64 NUMA initialization process so that the NUMA
nodes creation and cpu<->node mappings are decoupled. cpu<->node
mappings are moved to SMP initialization code (where they are needed),
at the cost of an extra SRAT walk so that ACPI NUMA mappings can be
batched before being applied, fixing current parsing pitfalls.
Acked-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: John Garry <john.garry@huawei.com>
Fixes: d8b47fca8c ("arm64, ACPI, NUMA: NUMA support based on SRAT and
SLIT")
Link: http://lkml.kernel.org/r/1527768879-88161-2-git-send-email-xiexiuqi@huawei.com
Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Punit Agrawal <punit.agrawal@arm.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Flatmem is useful in reducing kernel memory usage.
One usecase is in kdump kernel. We are able to save
~14M by moving to flatmem scheme.
Cc: xe-kernel@external.cisco.com
Cc: Nikunj Kela <nkela@cisco.com>
Signed-off-by: Nikunj Kela <nkela@cisco.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
lkdtm calls flush_icache_range(), which results in an out-of-line call
to __flush_icache_range(), which is not exported to modules.
Export the symbol to modules to fix this build breakage.
Fixes: 3b8c9f1cdf ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Commit 37c3ec2d81 ("arm64: topology: divorce MC scheduling domain from
core_siblings") selected the smallest of LLC, socket siblings, and NUMA
node siblings to ensure that the sched domain we build for the MC layer
isn't larger than the DIE above it or it's shrunk to the socket or NUMA
node if LLC exist acrosis NUMA node/chiplets.
Commit acd32e52e4e0 ("arm64: topology: Avoid checking numa mask for
scheduler MC selection") reverted the NUMA siblings checks since the
CPU topology masks weren't updated on hotplug at that time.
This patch re-introduces numa mask check as the CPU and NUMA topology
is now updated in hotplug paths. Effectively, this patch does the
partial revert of commit acd32e52e4e0.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Similar to core_sibling and thread_sibling, it's better to align and
rename llc_siblings to llc_sibling.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We already repopulate the information on CPU hotplug-in, so we can safely
remove the CPU topology and NUMA cpumap information during CPU hotplug
out operation. This will help to provide the correct cpumask for
scheduler domains.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
It's incorrect to iterate over all the possible CPUs to update the
sibling masks when any CPU is hotplugged in. In case the topology
siblings masks of the CPU is removed when is it hotplugged out, we
end up updating those masks when one of it's sibling is powered up
again. This will provide inconsistent view.
Further, since the CPU calling update_sibling_masks is yet to be set
online, there's no need to compare itself with each online CPU when
updating the siblings masks.
This patch restricts updation of sibling masks only for CPUs that are
already online. It also the drops the unnecessary cpuid check.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch adds support to remove all the CPU topology information using
clear_cpu_topology and also resetting the sibling information on other
sibling CPUs. This will be used in cpu_disable so that all the topology
sibling information is removed on CPU hotplug out.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently numa_clear_node removes both cpu information from the NUMA
node cpumap as well as the NUMA node id from the cpu. Similarly
numa_store_cpu_info updates both percpu nodeid and NUMA cpumap.
However we need to retain the numa node id for the cpu and only remove
the cpu information from the numa node cpumap during CPU hotplug out.
The same can be extended for hotplugging in the CPU.
This patch separates out numa_{add,remove}_cpu from numa_clear_node and
numa_store_cpu_info.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently reset_cpu_topology clears all the CPU topology information
and resets to default values. However we may need to just clear the
information when we hotplug out the CPU. In preparation to add the
support the same, let's refactor reset_cpu_topology to just reset
the information and move clearing out the topology information to
clear_cpu_topology.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The ERRATA_MIDR_REV_RANGE macro assigns ARM64_CPUCAP_LOCAL_CPU_ERRATUM
to the '.type' field of the 'struct arm64_cpu_capabilities', so there's
no need to assign it explicitly as well.
Signed-off-by: Will Deacon <will.deacon@arm.com>
arm64 requires break-before-make. Originally, before
setting up new pmd/pud entry for huge mapping, in few
cases, the modifying pmd/pud entry was still valid
and pointing to next level page table as we only
clear off leaf PTE in unmap leg.
a) This was resulting into stale entry in TLBs (as few
TLBs also cache intermediate mapping for performance
reasons)
b) Also, modifying pmd/pud was the only reference to
next level page table and it was getting lost without
freeing it. So, page leaks were happening.
Implement pud_free_pmd_page() and pmd_free_pte_page() to
enforce BBM and also free the leaking page tables.
Implementation requires,
1) Clearing off the current pud/pmd entry
2) Invalidation of TLB
3) Freeing of the un-used next level page tables
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add an interface to invalidate intermediate page tables
from TLB for kernel.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Pull in core ioremap changes from -tip, since we depend on these for
re-enabling huge I/O mappings on arm64.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Patching kernel instructions at runtime requires other CPUs to undergo
a context synchronisation event via an explicit ISB or an IPI in order
to ensure that the new instructions are visible. This is required even
for "hotpatch" instructions such as NOP and BL, so avoid optimising in
this case and always go via stop_machine() when performing general
patching.
ftrace isn't quite as strict, so it can continue to call the nosync
code directly.
Signed-off-by: Will Deacon <will.deacon@arm.com>
When invalidating the instruction cache for a kernel mapping via
flush_icache_range(), it is also necessary to flush the pipeline for
other CPUs so that instructions fetched into the pipeline before the
I-cache invalidation are discarded. For example, if module 'foo' is
unloaded and then module 'bar' is loaded into the same area of memory,
a CPU could end up executing instructions from 'foo' when branching into
'bar' if these instructions were fetched into the pipeline before 'foo'
was unloaded.
Whilst this is highly unlikely to occur in practice, particularly as
any exception acts as a context-synchronizing operation, following the
letter of the architecture requires us to execute an ISB on each CPU
in order for the new instruction stream to be visible.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that users have been migrated to PSR_AA32, kill the unused
COMPAT_PSR definitions.
The only difference we need a definition for is COMPAT_PSR_DIT_BIT,
which differs from PSR_AA32_DIT_BIT.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Some code cares about the SPSR_ELx format for exceptions taken from
AArch32 to inspect or manipulate the SPSR_ELx value, which is already in
the SPSR_ELx format, and not in the AArch32 PSR format.
To separate these from cases where we care about the AArch32 PSR format,
migrate these cases to use the PSR_AA32_* definitions rather than
COMPAT_PSR_*.
There should be no functional change as a result of this patch.
Note that arm64 KVM does not support a compat KVM API, and always uses
the SPSR_ELx format, even for AArch32 guests.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Some code cares about the SPSR_ELx format for exceptions taken from
AArch32 to inspect or manipulate the SPSR_ELx value, which is already in
the SPSR_ELx format, and not in the AArch32 PSR format.
To separate these from cases where we care about the AArch32 PSR format,
migrate these cases to use the PSR_AA32_* definitions rather than
COMPAT_PSR_*.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The SPSR_ELx format for exceptions taken from AArch32 is slightly
different to the AArch32 PSR format.
Map between the two in the compat ptrace code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 7206dc93a5 ("arm64: Expose Arm v8.4 features")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The SPSR_ELx format for exceptions taken from AArch32 differs from the
AArch32 PSR format. Thus, we must translate between the two when setting
up a compat sigframe, or restoring context from a compat sigframe.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 7206dc93a5 ("arm64: Expose Arm v8.4 features")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>