Commit Graph

678149 Commits

Author SHA1 Message Date
Dave Martin 936eb65ca2 arm64: ptrace: Flush user-RW TLS reg to thread_struct before reading
When reading current's user-writable TLS register (which occurs
when dumping core for native tasks), it is possible that userspace
has modified it since the time the task was last scheduled out.
The new TLS register value is not guaranteed to have been written
immediately back to thread_struct in this case.

As a result, a coredump can capture stale data for this register.
Reading the register for a stopped task via ptrace is unaffected.

For native tasks, this patch explicitly flushes the TPIDR_EL0
register back to thread_struct before dumping when operating on
current, thus ensuring that coredump contents are up to date.  For
compat tasks, the TLS register is not user-writable and so cannot
be out of sync, so no flush is required in compat_tls_get().

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-22 15:58:20 +01:00
Dave Martin e1d5a8fb73 arm64: ptrace: Flush FPSIMD regs back to thread_struct before reading
When reading the FPSIMD state of current (which occurs when dumping
core), it is possible that userspace has modified the FPSIMD
registers since the time the task was last scheduled out.  Such
changes are not guaranteed to be reflected immedately in
thread_struct.

As a result, a coredump can contain stale values for these
registers.  Reading the registers of a stopped task via ptrace is
unaffected.

This patch explicitly flushes the CPU state back to thread_struct
before dumping when operating on current, thus ensuring that
coredump contents are up to date.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-22 15:58:19 +01:00
Dave Martin af66b2d88a arm64: ptrace: Fix VFP register dumping in compat coredumps
Currently, VFP registers are omitted from coredumps for compat
processes, due to a bug in the REGSET_COMPAT_VFP regset
implementation.

compat_vfp_get() needs to transfer non-contiguous data from
thread_struct.fpsimd_state, and uses put_user() to handle the
offending trailing word (FPSCR).  This fails when copying to a
kernel address (i.e., kbuf && !ubuf), which is what happens when
dumping core.  As a result, the ELF coredump core code silently
omits the NT_ARM_VFP note from the dump.

It would be possible to work around this with additional special
case code for the put_user(), but since user_regset_copyout() is
explicitly designed to handle this scenario it is cleaner to port
the put_user() to a user_regset_copyout() call, which this patch
does.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-22 15:58:19 +01:00
Luc Van Oostenryck f5d284900c arm64: pass machine size to sparse
When using sparse on the arm64 tree we get many thousands of
warnings like 'constant ... is so big it is unsigned long long'
or 'shift too big (32) for type unsigned long'. This happens
because by default sparse considers the machine as 32bit and
defines the size of the types accordingly.

Fix this by passing the '-m64' flag to sparse so that
sparse can correctly define longs as being 64bit.

CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: linux-arm-kernel@lists.infradead.org
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-20 16:47:16 +01:00
Dave Martin bb4322f743 arm64: signal: factor out signal frame record allocation
This patch factors out the allocator for signal frame optional
records into a separate function, to ensure consistency and
facilitate later expansion.

No overrun checking is currently done, because the allocation is in
user memory and anyway the kernel never tries to allocate enough
space in the signal frame yet for an overrun to occur.  This
behaviour will be refined in future patches.

The approach taken in this patch to allocation of the terminator
record is not very clean: this will also be replaced in subsequent
patches.

For future extension, a comment is added in sigcontext.h
documenting the current static allocations in __reserved[].  This
will be important for determining under what circumstances
userspace may or may not see an expanded signal frame.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-20 12:42:59 +01:00
Dave Martin bb4891a6c3 arm64: signal: factor frame layout and population into separate passes
In preparation for expanding the signal frame, this patch refactors
the signal frame setup code in setup_sigframe() into two separate
passes.

The first pass, setup_sigframe_layout(), determines the size of the
signal frame and its internal layout, including the presence and
location of optional records.  The resulting knowledge is used to
allocate and locate the user stack space required for the signal
frame and to determine which optional records to include.

The second pass, setup_sigframe(), is called once the stack frame
is allocated in order to populate it with the necessary context
information.

As a result of these changes, it becomes more natural to represent
locations in the signal frame by a base pointer and an offset,
since the absolute address of each location is not known during the
layout pass.  To be more consistent with this logic,
parse_user_sigframe() is refactored to describe signal frame
locations in a similar way.

This change has no effect on the signal ABI, but will make it
easier to expand the signal frame in future patches.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-20 12:42:59 +01:00
Dave Martin 47ccb02868 arm64: signal: Refactor sigcontext parsing in rt_sigreturn
Currently, rt_sigreturn does very limited checking on the
sigcontext coming from userspace.

Future additions to the sigcontext data will increase the potential
for surprises.  Also, it is not clear whether the sigcontext
extension records are supposed to occur in a particular order.

To allow the parsing code to be extended more easily, this patch
factors out the sigcontext parsing into a separate function, and
adds extra checks to validate the well-formedness of the sigcontext
structure.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-20 12:42:58 +01:00
Dave Martin 20987de3c2 arm64: signal: split frame link record from sigcontext structure
In order to be able to increase the amount of the data currently
written to the __reserved[] array in the signal frame, it is
necessary to overwrite the locations currently occupied by the
{fp,lr} frame link record pushed at the top of the signal stack.

In order for this to work, this patch detaches the frame link
record from struct rt_sigframe and places it separately at the top
of the signal stack.  This will allow subsequent patches to insert
data between it and __reserved[].

This change relies on the non-ABI status of the placement of the
frame record with respect to struct sigframe: this status is
undocumented, but the placement is not declared or described in the
user headers, and known unwinder implementations (libgcc,
libunwind, gdb) appear not to rely on it.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-20 12:42:58 +01:00
Ard Biesheuvel 8f36094802 arm64: mm: select CONFIG_ARCH_PROC_KCORE_TEXT
To avoid issues with the /proc/kcore code getting confused about the
kernels block mappings in the VMALLOC region, enable the existing
facility that describes the [_text, _end) interval as a separate
KCORE_TEXT region, which supersedes the KCORE_VMALLOC region that
it intersects with on arm64.

Reported-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-20 12:42:58 +01:00
Ard Biesheuvel 737326aa51 fs/proc: kcore: use kcore_list type to check for vmalloc/module address
Instead of passing each start address into is_vmalloc_or_module_addr()
to decide whether it falls into either the VMALLOC or the MODULES region,
we can simply check the type field of the current kcore_list entry, since
it will be set to KCORE_VMALLOC based on exactly the same conditions.

As a bonus, when reading the KCORE_TEXT region on architectures that have
one, this will avoid using vread() on the region if it happens to intersect
with a KCORE_VMALLOC region. This is due the fact that the KCORE_TEXT
region is the first one to be added to the kcore region list.

Reported-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Tan Xiaojun <tanxiaojun@huawei.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-20 12:42:57 +01:00
Ard Biesheuvel 06c35ef1fd drivers/char: kmem: disable on arm64
As it turns out, arm64 deviates from other architectures in the way it
maps the VMALLOC region: on most (all?) other architectures, it resides
strictly above the kernel's direct mapping of DRAM, but on arm64, this
is the other way around. For instance, for a 48-bit VA configuration,
we have

  modules : 0xffff000000000000 - 0xffff000008000000   (   128 MB)
  vmalloc : 0xffff000008000000 - 0xffff7dffbfff0000   (129022 GB)
  ...
  vmemmap : 0xffff7e0000000000 - 0xffff800000000000   (  2048 GB maximum)
            0xffff7e0000000000 - 0xffff7e0003ff0000   (    63 MB actual)
  memory  : 0xffff800000000000 - 0xffff8000ffc00000   (  4092 MB)

This has mostly gone unnoticed until now, but it does appear that it
breaks an assumption in the kmem read/write code, which does something
like

  if (p < (unsigned long) high_memory) {
    ... use straight copy_[to|from]_user() using p as virtual address ...
  }
  ...
  if (count > 0) {
    ... use vread/vwrite for accesses past high_memory ...
  }

The first condition will inadvertently hold for the VMALLOC region if
VMALLOC_START < PAGE_OFFSET [which is the case on arm64], but the read
or write will subsequently fail the virt_addr_valid() check, resulting
in a -ENXIO return value.

Given how kmem seems to be living in borrowed time anyway, and given
the fact that nobody noticed that the read/write interface is broken
on arm64 in the first place, let's not bother trying to fix it, but
simply disable the /dev/kmem interface entirely for arm64.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-20 12:42:42 +01:00
Dustin Brown e27c7fa015 arm64: Export save_stack_trace_tsk()
The kernel watchdog is a great debugging tool for finding tasks that
consume a disproportionate amount of CPU time in contiguous chunks. One
can imagine building a similar watchdog for arbitrary driver threads
using save_stack_trace_tsk() and print_stack_trace(). However, this is
not viable for dynamically loaded driver modules on ARM platforms
because save_stack_trace_tsk() is not exported for those architectures.
Export save_stack_trace_tsk() for the ARM64 architecture to align with
x86 and support various debugging use cases such as arbitrary driver
thread watchdog timers.

Signed-off-by: Dustin Brown <dustinb@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-15 11:52:35 +01:00
Lorenzo Pieralisi 91e0bf8125 ACPI/IORT: Remove iort_node_match()
Commit 316ca8804e ("ACPI/IORT: Remove linker section for IORT entries
probing") removed the linker section for IORT entries probing.

Since those IORT entries were the only iort_node_match() interface
users, the iort_node_match() became obsolete and can then be removed.

Remove the ACPI IORT iort_node_match() interface from the kernel.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-15 11:41:37 +01:00
Lorenzo Pieralisi c6bb8f89fa ARM64/irqchip: Update ACPI_IORT symbol selection logic
ACPI IORT is an ACPI addendum to describe the connection topology of
devices with IOMMUs and interrupt controllers on ARM64 ACPI systems.

Currently the ACPI IORT Kbuild symbol is selected whenever the Kbuild
symbol ARM_GIC_V3_ITS is enabled, which in turn is selected by ARM64
Kbuild defaults. This makes the logic behind ACPI_IORT selection a bit
twisted and not easy to follow. On ARM64 systems enabling ACPI the
kbuild symbol ACPI_IORT should always be selected in that it is a kernel
layer provided to the ARM64 arch code to parse and enable ACPI firmware
bindings.

Make the ACPI_IORT selection explicit in ARM64 Kbuild and remove the
selection from ARM_GIC_V3_ITS entry, making the ACPI_IORT selection
logic clearer to follow.

Acked-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-15 11:41:21 +01:00
Olav Haugan 577dfe16b8 arm64/dma-mapping: Remove extraneous null-pointer checks
The current null-pointer check in __dma_alloc_coherent and
__dma_free_coherent is not needed anymore since the
__dma_alloc/__dma_free functions won't be called if !dev (dummy ops will
be called instead).

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-15 11:40:22 +01:00
Jonathan (Zhixiong) Zhang c484f2564d arm64: kconfig: allow support for memory failure handling
Declare ARCH_SUPPORTS_MEMORY_FAILURE, as arm64 does support
memory failure recovery attempt.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
(Dropped changes to ACPI APEI Kconfig and updated commit log)
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-12 16:04:29 +01:00
Punit Agrawal 0e3a902639 arm64: mm: Update perf accounting to handle poison faults
Re-organise the perf accounting for fault handling in preparation for
enabling handling of hardware poison faults in subsequent commits. The
change updates perf accounting to be inline with the behaviour on
x86.

With this update, the perf fault accounting -

  * Always report PERF_COUNT_SW_PAGE_FAULTS

  * Doesn't report anything else for VM_FAULT_ERROR (which includes
    hwpoison faults)

  * Reports PERF_COUNT_SW_PAGE_FAULTS_MAJ if it's a major
    fault (indicated by VM_FAULT_MAJOR)

  * Otherwise, reports PERF_COUNT_SW_PAGE_FAULTS_MIN

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-12 16:04:29 +01:00
Jonathan (Zhixiong) Zhang e7c600f149 arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
Add VM_FAULT_HWPOISON[_LARGE] handling to the arm64 page fault
handler. Handling of VM_FAULT_HWPOISON[_LARGE] is very similar
to VM_FAULT_OOM, the only difference is that a different si_code
(BUS_MCEERR_AR) is passed to user space and si_addr_lsb field is
initialized.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
(fix new __do_user_fault call-site)
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-12 16:04:29 +01:00
Punit Agrawal f02ab08afb arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries
When memory failure is enabled, a poisoned hugepage pte is marked as a
swap entry. huge_pte_offset() does not return the poisoned page table
entries when it encounters PUD/PMD hugepages.

This behaviour of huge_pte_offset() leads to error such as below when
munmap is called on poisoned hugepages.

[  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.

Fix huge_pte_offset() to return the poisoned pte which is then
appropriately handled by the generic layer code.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Woods <dwoods@mellanox.com>
Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-12 16:04:28 +01:00
Will Deacon 687644209a arm64: ftrace: fix building without CONFIG_MODULES
When CONFIG_MODULES is disabled, we cannot dereference a module pointer:

arch/arm64/kernel/ftrace.c: In function 'ftrace_make_call':
arch/arm64/kernel/ftrace.c:107:36: error: dereferencing pointer to incomplete type 'struct module'
   trampoline = (unsigned long *)mod->arch.ftrace_trampoline;

Also, the within_module() function is not defined:

arch/arm64/kernel/ftrace.c: In function 'ftrace_make_nop':
arch/arm64/kernel/ftrace.c:171:8: error: implicit declaration of function 'within_module'; did you mean 'init_module'? [-Werror=implicit-function-declaration]

This addresses both by adding replacing the IS_ENABLED(CONFIG_ARM64_MODULE_PLTS)
checks with #ifdef versions.

Fixes: e71a4e1beb ("arm64: ftrace: add support for far branches to dynamic ftrace")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-12 14:43:25 +01:00
Will Deacon 1eb34b6e51 arm64: fault: Print info about page table structure when dumping pte
Whilst debugging a remote crash, I noticed that show_pte is unhelpful
when it comes to describing the structure of the page table being walked.
This is easily fixed by printing out the page table (swapper vs user),
page size and virtual address size when displaying the PGD address.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-12 12:33:54 +01:00
Kristina Martsenko 83016b2042 arm64: mm: print file name of faulting vma
Print out the name of the file associated with the vma that faulted.
This is usually the executable or shared library name. We already print
out the task name, but also printing the library name is useful for
pinpointing bugs to libraries.

Also print the base address and size of the vma, which together with the
PC (printed by __show_regs) gives the offset into the library.

Fault prints now look like:
test[2361]: unhandled level 2 translation fault (11) at 0x00000012, esr 0x92000006, in libfoo.so[ffffa0145000+1000]

This is already done on x86, for more details see commit 03252919b7
("x86: print which shared library/executable faulted in segfault etc.
messages v3").

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-12 12:33:37 +01:00
Kristina Martsenko bf396c09c2 arm64: mm: don't print out page table entries on EL0 faults
When we take a fault from EL0 that can't be handled, we print out the
page table entries associated with the faulting address. This allows
userspace to print out any current page table entries, including kernel
(TTBR1) entries. Exposing kernel mappings like this could pose a
security risk, so don't print out page table information on EL0 faults.
(But still print it out for EL1 faults.) This also follows the same
behaviour as x86, printing out page table entries on kernel mode faults
but not user mode faults.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-12 12:33:37 +01:00
Kristina Martsenko 67ce16ec15 arm64: mm: print out correct page table entries
When we take a fault that can't be handled, we print out the page table
entries associated with the faulting address. In some cases we currently
print out the wrong entries. For a faulting TTBR1 address, we sometimes
print out TTBR0 table entries instead, and for a faulting TTBR0 address
we sometimes print out TTBR1 table entries. Fix this by choosing the
tables based on the faulting address.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[will: zero-extend addrs to 64-bit, don't walk swapper w/ TTBR0 addr]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-12 12:33:37 +01:00
Ard Biesheuvel e71a4e1beb arm64: ftrace: add support for far branches to dynamic ftrace
Currently, dynamic ftrace support in the arm64 kernel assumes that all
core kernel code is within range of ordinary branch instructions that
occur in module code, which is usually the case, but is no longer
guaranteed now that we have support for module PLTs and address space
randomization.

Since on arm64, all patching of branch instructions involves function
calls to the same entry point [ftrace_caller()], we can emit the modules
with a trampoline that has unlimited range, and patch both the trampoline
itself and the branch instruction to redirect the call via the trampoline.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: minor clarification to smp_wmb() comment]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-07 11:52:02 +01:00
Ard Biesheuvel f8af0b364e arm64: ftrace: don't validate branch via PLT in ftrace_make_nop()
When turning branch instructions into NOPs, we attempt to validate the
action by comparing the old value at the call site with the opcode of
a direct relative branch instruction pointing at the old target.

However, these call sites are statically initialized to call _mcount(),
and may be redirected via a PLT entry if the module is loaded far away
from the kernel text, leading to false negatives and spurious errors.

So skip the validation if CONFIG_ARM64_MODULE_PLTS is configured.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-07 11:50:34 +01:00
Kees Cook dbbb08f500 arm64, vdso: Define vdso_{start,end} as array
Adjust vdso_{start|end} to be char arrays to avoid compile-time analysis
that flags "too large" memcmp() calls with CONFIG_FORTIFY_SOURCE.

Cc: Jisheng Zhang <jszhang@marvell.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-06 17:49:55 +01:00
Will Deacon 8dd0ee651d arm64: cpufeature: Fix CPU_OUT_OF_SPEC taint for uniform systems
Commit 3fde2999fa ("arm64: cpufeature: Don't dump useless backtrace on
CPU_OUT_OF_SPEC") changed the cpufeature detection code to use add_taint
instead of WARN_TAINT_ONCE when detecting a heterogeneous system with
mismatched feature support. Unfortunately, this resulted in all systems
getting the taint, regardless of any feature mismatch.

This patch fixes the problem by conditionalising the taint on detecting
a feature mismatch.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-05 11:40:23 +01:00
Ard Biesheuvel 1151f838cb arm64: kernel: restrict /dev/mem read() calls to linear region
When running lscpu on an AArch64 system that has SMBIOS version 2.0
tables, it will segfault in the following way:

  Unable to handle kernel paging request at virtual address ffff8000bfff0000
  pgd = ffff8000f9615000
  [ffff8000bfff0000] *pgd=0000000000000000
  Internal error: Oops: 96000007 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 1284 Comm: lscpu Not tainted 4.11.0-rc3+ #103
  Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
  task: ffff8000fa78e800 task.stack: ffff8000f9780000
  PC is at __arch_copy_to_user+0x90/0x220
  LR is at read_mem+0xcc/0x140

This is caused by the fact that lspci issues a read() on /dev/mem at the
offset where it expects to find the SMBIOS structure array. However, this
region is classified as EFI_RUNTIME_SERVICE_DATA (as per the UEFI spec),
and so it is omitted from the linear mapping.

So let's restrict /dev/mem read/write access to those areas that are
covered by the linear region.

Reported-by: Alexander Graf <agraf@suse.de>
Fixes: 4dffbfc48d ("arm64/efi: mark UEFI reserved regions as MEMBLOCK_NOMAP")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-01 18:26:26 +01:00
Lorenzo Pieralisi db46a72b97 ARM64/PCI: Set root bus NUMA node on ACPI systems
PCI core requires the NUMA node for the struct pci_host_bridge.dev to
be set by using the pcibus_to_node(struct pci_bus*) API, that on ARM64
systems relies on the struct pci_host_bridge->bus.dev NUMA node.

The struct pci_host_bridge.dev NUMA node is then propagated through
the PCI device hierarchy as PCI devices (and bridges) are enumerated
under it.

Therefore, in order to set-up the PCI NUMA hierarchy appropriately, the
struct pci_host_bridge->bus.dev NUMA node must be set before core
code calls pcibus_to_node(struct pci_bus*) on it so that PCI core can
retrieve the NUMA node for the struct pci_host_bridge.dev device and can
propagate it through the PCI bus tree.

On ARM64 ACPI based systems the struct pci_host_bridge->bus.dev NUMA
node can be set-up in pcibios_root_bridge_prepare() by parsing the root
bridge ACPI device firmware binding.

Add code to the pcibios_root_bridge_prepare() that, when booting with
ACPI, parse the root bridge ACPI device companion NUMA binding and set
the corresponding struct pci_host_bridge->bus.dev NUMA node
appropriately.

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Robert Richter <rrichter@cavium.com>
Tested-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-05-30 11:45:21 +01:00
Will Deacon 5f16a046f8 arm64: futex: Fix undefined behaviour with FUTEX_OP_OPARG_SHIFT usage
FUTEX_OP_OPARG_SHIFT instructs the futex code to treat the 12-bit oparg
field as a shift value, potentially leading to a left shift value that
is negative or with an absolute value that is significantly larger then
the size of the type. UBSAN chokes with:

================================================================================
UBSAN: Undefined behaviour in ./arch/arm64/include/asm/futex.h:60:13
shift exponent -1 is negative
CPU: 1 PID: 1449 Comm: syz-executor0 Not tainted 4.11.0-rc4-00005-g977eb52-dirty #11
Hardware name: linux,dummy-virt (DT)
Call trace:
[<ffff200008094778>] dump_backtrace+0x0/0x538 arch/arm64/kernel/traps.c:73
[<ffff200008094cd0>] show_stack+0x20/0x30 arch/arm64/kernel/traps.c:228
[<ffff200008c194a8>] __dump_stack lib/dump_stack.c:16 [inline]
[<ffff200008c194a8>] dump_stack+0x120/0x188 lib/dump_stack.c:52
[<ffff200008cc24b8>] ubsan_epilogue+0x18/0x98 lib/ubsan.c:164
[<ffff200008cc3098>] __ubsan_handle_shift_out_of_bounds+0x250/0x294 lib/ubsan.c:421
[<ffff20000832002c>] futex_atomic_op_inuser arch/arm64/include/asm/futex.h:60 [inline]
[<ffff20000832002c>] futex_wake_op kernel/futex.c:1489 [inline]
[<ffff20000832002c>] do_futex+0x137c/0x1740 kernel/futex.c:3231
[<ffff200008320504>] SYSC_futex kernel/futex.c:3281 [inline]
[<ffff200008320504>] SyS_futex+0x114/0x268 kernel/futex.c:3249
[<ffff200008084770>] el0_svc_naked+0x24/0x28
================================================================================
syz-executor1 uses obsolete (PF_INET,SOCK_PACKET)
sock: process `syz-executor0' is using obsolete setsockopt SO_BSDCOMPAT

This patch attempts to fix some of this by:

  * Making encoded_op an unsigned type, so we can shift it left even if
    the top bit is set.

  * Casting to signed prior to shifting right when extracting oparg
    and cmparg

  * Consider only the bottom 5 bits of oparg when using it as a left-shift
    value.

Whilst I think this catches all of the issues, I'd much prefer to remove
this stuff, as I think it's unused and the bugs are copy-pasted between
a bunch of architectures.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-05-30 11:07:42 +01:00
Kefeng Wang 690e95dd4d arm64: check return value of of_flat_dt_get_machine_name
It's useless to print machine name and setup arch-specific system
identifiers if of_flat_dt_get_machine_name() return NULL, especially
when ACPI-based boot.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-05-30 11:07:42 +01:00
Will Deacon 3fde2999fa arm64: cpufeature: Don't dump useless backtrace on CPU_OUT_OF_SPEC
Unfortunately, it turns out that mismatched CPU features in big.LITTLE
systems are starting to appear in the wild. Whilst we should continue to
taint the kernel with CPU_OUT_OF_SPEC for features that differ in ways
that we can't fix up, dumping a useless backtrace out of the cpufeature
code is pointless and irritating.

This patch removes the backtrace from the taint.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-05-30 11:07:42 +01:00
Tobias Klauser 6efd8499d9 arm64: mm: explicity include linux/vmalloc.h
arm64's mm/mmu.c uses vm_area_add_early, struct vm_area and other
definitions  but relies on implict inclusion of linux/vmalloc.h which
means that changes in other headers could break the build. Thus, add an
explicit include.

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-05-30 11:07:42 +01:00
Kefeng Wang 1149aad10b arm64: Add dump_backtrace() in show_regs
Generic code expects show_regs() to dump the stack, but arm64's
show_regs() does not. This makes it hard to debug softlockups and
other issues that result in show_regs() being called.

This patch updates arm64's show_regs() to dump the stack, as common
code expects.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
[will: folded in bug_handler fix from mrutland]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-05-30 11:07:42 +01:00
Kefeng Wang c07ab957d9 arm64: Call __show_regs directly
Generic code expects show_regs() to also dump the stack, but arm64's
show_reg() does not do this. Some arm64 callers of show_regs() *only*
want the registers dumped, without the stack.

To enable generic code to work as expected, we need to make
show_regs() dump the stack. Where we only want the registers dumped,
we must use __show_regs().

This patch updates code to use __show_regs() where only registers are
desired. A subsequent patch will modify show_regs().

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-05-30 11:07:41 +01:00
Dong Bo 48f99c8ec0 arm64: Preventing READ_IMPLIES_EXEC propagation
Like arch/arm/, we inherit the READ_IMPLIES_EXEC personality flag across
fork(). This is undesirable for a number of reasons:

  * ELF files that don't require executable stack can end up with it
    anyway

  * We end up performing un-necessary I-cache maintenance when mapping
    what should be non-executable pages

  * Restricting what is executable is generally desirable when defending
    against overflow attacks

This patch clears the personality flag when setting up the personality for
newly spwaned native tasks. Given that semi-recent AArch64 toolchains emit
a non-executable PT_GNU_STACK header, userspace applications can already
not rely on READ_IMPLIES_EXEC so shouldn't be adversely affected by this
change.

Cc: <stable@vger.kernel.org>
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Dong Bo <dongbo4@huawei.com>
[will: added comment to compat code, rewrote commit message]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-05-30 11:07:41 +01:00
Linus Torvalds 5ed02dbb49 Linux 4.12-rc3 2017-05-28 17:20:53 -07:00
Linus Torvalds d09bc680ca Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal SoC management fixes from Eduardo Valentin:

 - fixes to TI SoC driver, Broadcom, qoriq

 - small sparse warning fix on thermal core

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
  thermal: broadcom: ns-thermal: default on iProc SoCs
  ti-soc-thermal: Fix a typo in a comment line
  ti-soc-thermal: Delete error messages for failed memory allocations in ti_bandgap_build()
  ti-soc-thermal: Use devm_kcalloc() in ti_bandgap_build()
  thermal: core: make thermal_emergency_poweroff static
  thermal: qoriq: remove useless call for of_thermal_get_trip_points()
2017-05-28 16:18:27 -07:00
Linus Torvalds 249f1efd8e TTY/Serial fixes for 4.12-rc3
Here are some serial and tty fixes for 4.12-rc3.  They are a bit
 "bigger" than normal, which is why I had them "bake" in linux-next for a
 few weeks and didn't send them to you for -rc2.
 
 They revert a few of the serdev patches from 4.12-rc1, and bring things
 back to how they were in 4.11, to try to make things a bit more stable
 there.  Rob and Johan both agree that this is the way forward, so this
 isn't people squabbling over semantics.  Other than that, just a few
 minor serial driver fixes that people have had problems with.
 
 All of these have been in linux-next for a few weeks with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWSlOHA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylDCACgn7RHT16JUASggJmRUBeadxQcFQAAnjtxX2kc
 0AQLqXxqGyFxVZClAYMy
 =Y6+X
 -----END PGP SIGNATURE-----

Merge tag 'tty-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are some serial and tty fixes for 4.12-rc3. They are a bit bigger
  than normal, which is why I had them bake in linux-next for a few
  weeks and didn't send them to you for -rc2.

  They revert a few of the serdev patches from 4.12-rc1, and bring
  things back to how they were in 4.11, to try to make things a bit more
  stable there. Rob and Johan both agree that this is the way forward,
  so this isn't people squabbling over semantics. Other than that, just
  a few minor serial driver fixes that people have had problems with.

  All of these have been in linux-next for a few weeks with no reported
  issues"

* tag 'tty-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: altera_uart: call iounmap() at driver remove
  serial: imx: ensure UCR3 and UFCR are setup correctly
  MAINTAINERS/serial: Change maintainer of jsm driver
  serial: enable serdev support
  tty/serdev: add serdev registration interface
  serdev: Restore serdev_device_write_buf for atomic context
  serial: core: fix crash in uart_suspend_port
  tty: fix port buffer locking
  tty: ehv_bytechan: clean up init error handling
  serial: ifx6x60: fix use-after-free on module unload
  serial: altera_jtaguart: adding iounmap()
  serial: exar: Fix stuck MSIs
  serial: efm32: Fix parity management in 'efm32_uart_console_get_options()'
  serdev: fix tty-port client deregistration
  Revert "tty_port: register tty ports with serdev bus"
  drivers/tty: 8250: only call fintek_8250_probe when doing port I/O
2017-05-27 09:39:09 -07:00
Linus Torvalds 6f68a6ae1f powerpc fixes for 4.12 #4
Fix running SPU programs on Cell, and a few other minor fixes.
 
 Thanks to:
   Alistair Popple, Jeremy Kerr, Michael Neuling, Nicholas Piggin.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZKURPAAoJEFHr6jzI4aWAhckQAJxZZHt2OMbdNu0PxHdhZgxo
 +eSODIF0jvzBnYs/Pe9aqqrxuONxW+ioclyUIVYLUlHwLjCGf7x2Y5tJe0OmEff6
 ZOaUUcwECKw4cy2UJY6NCGv0nw/8INTDN5xPcQq9M8gExmX6plTmbnQg8Y10ONdQ
 LYu36GWyXF4ygblvLo7kXs8tuZYKozO6kPRqxiQ3zML2dOAyqWqPwpnoWSc6c7oR
 W+z/Vuxe3lTR+QHbfvnSpQhmdVi+WEnwFvgNmIise5R9Jd30Q1f1vES5E089ifmN
 b0Qi5/gkb6YWBkROvxTARFRdmU0/YPNDFWUsuyHJB/Nz1MnqqXx5X+5KpqqinPya
 azVoYW010x2zawm0aX+BF/WeH5ymsl++R84/aO8UR0fA2AIwEOQeLGWZvaZb8CMl
 9vd3NqCJ+diBhgCHiHp80pjD978bqt7Ls1nfbHhYTJ31HRT8Yz/ympWOhLE6rp+t
 kGR+UOHduaZWK3KHoE2WIoUFJuQMvRgFmjoy2G+YaK/PcUc8OA+90v1665fnbk+N
 wmZyAirP39gveHkHXDywqbEjN4CSMgsqrRW/KwPo0b4mf2m3rsIAshO9pBbZRv+P
 evhrAkCYRv5zGbGIYJ/TiEyball+8NQzxzoYmMzq62pjE27gyIe94Sqy80U4zyOC
 RqqUxflOBgMDC8Ufc30u
 =EM32
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fix running SPU programs on Cell, and a few other minor fixes.

  Thanks to Alistair Popple, Jeremy Kerr, Michael Neuling, Nicholas
  Piggin"

* tag 'powerpc-4.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc: Add PPC_FEATURE userspace bits for SCV and DARN instructions
  powerpc/spufs: Fix hash faults for kernel regions
  powerpc: Fix booting P9 hash with CONFIG_PPC_RADIX_MMU=N
  powerpc/powernv/npu-dma.c: Fix opal_npu_destroy_context() call
  selftests/powerpc: Fix TM resched DSCR test with some compilers
2017-05-27 09:28:34 -07:00
Linus Torvalds 38e6bf238d Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A series of fixes for X86:

   - The final fix for the end-of-stack issue in the unwinder
   - Handle non PAT systems gracefully
   - Prevent access to uninitiliazed memory
   - Move early delay calaibration after basic init
   - Fix Kconfig help text
   - Fix a cross compile issue
   - Unbreak older make versions"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/timers: Move simple_udelay_calibration past init_hypervisor_platform
  x86/alternatives: Prevent uninitialized stack byte read in apply_alternatives()
  x86/PAT: Fix Xorg regression on CPUs that don't support PAT
  x86/watchdog: Fix Kconfig help text file path reference to lockup watchdog documentation
  x86/build: Permit building with old make versions
  x86/unwind: Add end-of-stack check for ftrace handlers
  Revert "x86/entry: Fix the end of the stack for newly forked tasks"
  x86/boot: Use CROSS_COMPILE prefix for readelf
2017-05-27 09:17:58 -07:00
Linus Torvalds 39b8ab31bc Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixlet from Thomas Gleixner:
 "Silence dmesg spam by making the posix cpu timer printks depend on
  print_fatal_signals"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-timers: Make signal printks conditional
2017-05-27 09:14:24 -07:00
Linus Torvalds de0b9d751b Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS fixes from Thomas Gleixner:
 "Two fixlets for RAS:

   - Export memory_error() so the NFIT module can utilize it

   - Handle memory errors in NFIT correctly"

* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  acpi, nfit: Fix the memory error check in nfit_handle_mce()
  x86/MCE: Export memory_error()
2017-05-27 09:06:43 -07:00
Linus Torvalds fac3fcae32 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling fixes from Thomas Gleixner:

 - Synchronization of tools and kernel headers

 - A series of fixes for perf report addressing various failures:
    * Handle invalid maps proper
    * Plug a memory leak
    * Handle frames and callchain order correctly

 - Fixes for handling inlines and children mode

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tools/include: Sync kernel ABI headers with tooling headers
  perf tools: Put caller above callee in --children mode
  perf report: Do not drop last inlined frame
  perf report: Always honor callchain order for inlined nodes
  perf script: Add --inline option for debugging
  perf report: Fix off-by-one for non-activation frames
  perf report: Fix memory leak in addr2line when called by addr2inlines
  perf report: Don't crash on invalid maps in `-g srcline` mode
2017-05-27 09:02:41 -07:00
Linus Torvalds 805f286907 Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Thomas Gleixner:
 "A fix for a state leak which was introduced in the recent rework of
  futex/rtmutex interaction"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex,rt_mutex: Fix rt_mutex_cleanup_proxy_lock()
2017-05-27 08:59:37 -07:00
Linus Torvalds d024baa58a Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull kthread fix from Thomas Gleixner:
 "A single fix which prevents a use after free when kthread fork fails"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kthread: Fix use-after-free if kthread fork fails
2017-05-27 08:52:27 -07:00
Linus Torvalds 77d6465695 There's been a few memory issues found with ftrace.
One was simply a memory leak where not all was being freed that should
 have been in releasing a file pointer on set_graph_function.
 
 Then Thomas found that the ftrace trampolines were marked for read/write
 as well as execute. To shrink the possible attack surface, he added
 calls to set them to ro. Which also uncovered some other issues with
 freeing module allocated memory that had its permissions changed.
 
 Kprobes had a similar issue which is fixed and a selftest was added
 to trigger that issue again.
 -----BEGIN PGP SIGNATURE-----
 
 iQExBAABCAAbBQJZKOiVFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
 vBoH/jxVozuAEVCv+Nbj6fhRxe4emjo0lZZb32EbEaSV/nUQGqHIZFdDQtbt+ld+
 sn06/BSMBI+L4BqLj1BCAW0e/zIn/4birIg53SX5jQwc3AlhUG7HS2d+RJZZCrp9
 Zofq9L6xZ4Hl2XjkPXqwEgtrwxQtkIPLlJqeYDJ6BVrlPfOPEwB7bfR7B684wiYT
 6h2Qo7f/ZQzgJ1sK8N2IjHEnAgE08KCYcj4IB4WHJk6SqQz3bv1Y00WBg2UQihVT
 TPPSVhYLnrSw53fxyALqZbHo2DvnQf1TnNadWxvSIpbvgm/T5GG60FDtvHgNfbwz
 yKuKAog+P9xBLkoAcfvODLY9O5s=
 =75TZ
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull ftrace fixes from Steven Rostedt:
 "There's been a few memory issues found with ftrace.

  One was simply a memory leak where not all was being freed that should
  have been in releasing a file pointer on set_graph_function.

  Then Thomas found that the ftrace trampolines were marked for
  read/write as well as execute. To shrink the possible attack surface,
  he added calls to set them to ro. Which also uncovered some other
  issues with freeing module allocated memory that had its permissions
  changed.

  Kprobes had a similar issue which is fixed and a selftest was added to
  trigger that issue again"

* tag 'trace-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  x86/ftrace: Make sure that ftrace trampolines are not RWX
  x86/mm/ftrace: Do not bug in early boot on irqs_disabled in cpu_flush_range()
  selftests/ftrace: Add a testcase for many kprobe events
  kprobes/x86: Fix to set RWX bits correctly before releasing trampoline
  ftrace: Fix memory leak in ftrace_graph_release()
2017-05-27 08:30:30 -07:00
Thomas Gleixner 6ee98ffeea x86/ftrace: Make sure that ftrace trampolines are not RWX
ftrace use module_alloc() to allocate trampoline pages. The mapping of
module_alloc() is RWX, which makes sense as the memory is written to right
after allocation. But nothing makes these pages RO after writing to them.

Add proper set_memory_rw/ro() calls to protect the trampolines after
modification.

Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705251056410.1862@nanos

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-26 22:37:02 -04:00
Steven Rostedt (VMware) a53276e282 x86/mm/ftrace: Do not bug in early boot on irqs_disabled in cpu_flush_range()
With function tracing starting in early bootup and having its trampoline
pages being read only, a bug triggered with the following:

kernel BUG at arch/x86/mm/pageattr.c:189!
invalid opcode: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc2-test+ #3
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
task: ffffffffb4222500 task.stack: ffffffffb4200000
RIP: 0010:change_page_attr_set_clr+0x269/0x302
RSP: 0000:ffffffffb4203c88 EFLAGS: 00010046
RAX: 0000000000000046 RBX: 0000000000000000 RCX: 00000001b6000000
RDX: ffffffffb4203d40 RSI: 0000000000000000 RDI: ffffffffb4240d60
RBP: ffffffffb4203d18 R08: 00000001b6000000 R09: 0000000000000001
R10: ffffffffb4203aa8 R11: 0000000000000003 R12: ffffffffc029b000
R13: ffffffffb4203d40 R14: 0000000000000001 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff9a639ea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff9a636b384000 CR3: 00000001ea21d000 CR4: 00000000000406b0
Call Trace:
 change_page_attr_clear+0x1f/0x21
 set_memory_ro+0x1e/0x20
 arch_ftrace_update_trampoline+0x207/0x21c
 ? ftrace_caller+0x64/0x64
 ? 0xffffffffc029b000
 ftrace_startup+0xf4/0x198
 register_ftrace_function+0x26/0x3c
 function_trace_init+0x5e/0x73
 tracer_init+0x1e/0x23
 tracing_set_tracer+0x127/0x15a
 register_tracer+0x19b/0x1bc
 init_function_trace+0x90/0x92
 early_trace_init+0x236/0x2b3
 start_kernel+0x200/0x3f5
 x86_64_start_reservations+0x29/0x2b
 x86_64_start_kernel+0x17c/0x18f
 secondary_startup_64+0x9f/0x9f
 ? secondary_startup_64+0x9f/0x9f

Interrupts should not be enabled at this early in the boot process. It is
also fine to leave interrupts enabled during this time as there's only one
CPU running, and on_each_cpu() means to only run on the current CPU.

If early_boot_irqs_disabled is set, it is safe to run cpu_flush_range() with
interrupts disabled. Don't trigger a BUG_ON() in that case.

Link: http://lkml.kernel.org/r/20170526093717.0be3b849@gandalf.local.home
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-26 22:37:01 -04:00