Commit Graph

210 Commits

Author SHA1 Message Date
Linus Torvalds d42f323a7d Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "A few misc subsystems and some of MM.

  175 patches.

  Subsystems affected by this patch series: ia64, kbuild, scripts, sh,
  ocfs2, kfifo, vfs, kernel/watchdog, and mm (slab-generic, slub,
  kmemleak, debug, pagecache, msync, gup, memremap, memcg, pagemap,
  mremap, dma, sparsemem, vmalloc, documentation, kasan, initialization,
  pagealloc, and memory-failure)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (175 commits)
  mm/memory-failure: unnecessary amount of unmapping
  mm/mmzone.h: fix existing kernel-doc comments and link them to core-api
  mm: page_alloc: ignore init_on_free=1 for debug_pagealloc=1
  net: page_pool: use alloc_pages_bulk in refill code path
  net: page_pool: refactor dma_map into own function page_pool_dma_map
  SUNRPC: refresh rq_pages using a bulk page allocator
  SUNRPC: set rq_page_end differently
  mm/page_alloc: inline __rmqueue_pcplist
  mm/page_alloc: optimize code layout for __alloc_pages_bulk
  mm/page_alloc: add an array-based interface to the bulk page allocator
  mm/page_alloc: add a bulk page allocator
  mm/page_alloc: rename alloced to allocated
  mm/page_alloc: duplicate include linux/vmalloc.h
  mm, page_alloc: avoid page_to_pfn() in move_freepages()
  mm/Kconfig: remove default DISCONTIGMEM_MANUAL
  mm: page_alloc: dump migrate-failed pages
  mm/mempolicy: fix mpol_misplaced kernel-doc
  mm/mempolicy: rewrite alloc_pages_vma documentation
  mm/mempolicy: rewrite alloc_pages documentation
  mm/mempolicy: rename alloc_pages_current to alloc_pages
  ...
2021-04-30 14:38:01 -07:00
Nicholas Piggin 8309c9d717 powerpc: inline huge vmap supported functions
This allows unsupported levels to be constant folded away, and so
p4d_free_pud_page can be removed because it's no longer linked to.

Link: https://lkml.kernel.org/r/20210317062402.533919-8-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:40 -07:00
Nicholas Piggin bbc180a5ad mm: HUGE_VMAP arch support cleanup
This changes the awkward approach where architectures provide init
functions to determine which levels they can provide large mappings for,
to one where the arch is queried for each call.

This removes code and indirection, and allows constant-folding of dead
code for unsupported levels.

This also adds a prot argument to the arch query.  This is unused
currently but could help with some architectures (e.g., some powerpc
processors can't map uncacheable memory with large pages).

Link: https://lkml.kernel.org/r/20210317062402.533919-7-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:40 -07:00
Xiongwei Song 7153d4bf0b powerpc/traps: Enhance readability for trap types
Define macros to list ppc interrupt types in interttupt.h, replace the
reference of the trap hex values with these macros.

Referred the hex numbers in arch/powerpc/kernel/exceptions-64e.S,
arch/powerpc/kernel/exceptions-64s.S, arch/powerpc/kernel/head_*.S,
arch/powerpc/kernel/head_booke.h and arch/powerpc/include/asm/kvm_asm.h.

Signed-off-by: Xiongwei Song <sxwjean@gmail.com>
[mpe: Resolve conflicts in nmi_disables_ftrace(), fix 40x build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1618398033-13025-1-git-send-email-sxwjean@me.com
2021-04-17 22:20:19 +10:00
Michael Ellerman 7098f8f0cf powerpc/mm/radix: Make radix__change_memory_range() static
The lkp bot pointed out that with W=1 we get:

  arch/powerpc/mm/book3s64/radix_pgtable.c:183:6: error: no previous
  prototype for 'radix__change_memory_range'

Which is really saying that it could be static, make it so.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-04-14 23:04:45 +10:00
Nicholas Piggin c45ba4f44f powerpc: clean up do_page_fault
search_exception_tables + __bad_page_fault can be substituted with
bad_page_fault, do_page_fault no longer needs to return a value
to asm for any sub-architecture, and __bad_page_fault can be static.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-10-npiggin@gmail.com
2021-04-14 23:04:44 +10:00
Vaibhav Jain a5d6a3e73a powerpc/mm: Add cond_resched() while removing hpte mappings
While removing large number of mappings from hash page tables for
large memory systems as soft-lockup is reported because of the time
spent inside htap_remove_mapping() like one below:

 watchdog: BUG: soft lockup - CPU#8 stuck for 23s!
 <snip>
 NIP plpar_hcall+0x38/0x58
 LR  pSeries_lpar_hpte_invalidate+0x68/0xb0
 Call Trace:
  0x1fffffffffff000 (unreliable)
  pSeries_lpar_hpte_removebolted+0x9c/0x230
  hash__remove_section_mapping+0xec/0x1c0
  remove_section_mapping+0x28/0x3c
  arch_remove_memory+0xfc/0x150
  devm_memremap_pages_release+0x180/0x2f0
  devm_action_release+0x30/0x50
  release_nodes+0x28c/0x300
  device_release_driver_internal+0x16c/0x280
  unbind_store+0x124/0x170
  drv_attr_store+0x44/0x60
  sysfs_kf_write+0x64/0x90
  kernfs_fop_write+0x1b0/0x290
  __vfs_write+0x3c/0x70
  vfs_write+0xd4/0x270
  ksys_write+0xdc/0x130
  system_call+0x5c/0x70

Fix this by adding a cond_resched() to the loop in
htap_remove_mapping() that issues hcall to remove hpte mapping. The
call to cond_resched() is issued every HZ jiffies which should prevent
the soft-lockup from being reported.

Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210404163148.321346-1-vaibhav@linux.ibm.com
2021-04-14 23:04:12 +10:00
Michael Ellerman 87e65ad7bd powerpc/mm/64s/hash: Add real-mode change_memory_range() for hash LPAR
When we enabled STRICT_KERNEL_RWX we received some reports of boot
failures when using the Hash MMU and running under phyp. The crashes
are intermittent, and often exhibit as a completely unresponsive
system, or possibly an oops.

One example, which was caught in xmon:

  [   14.068327][    T1] devtmpfs: mounted
  [   14.069302][    T1] Freeing unused kernel memory: 5568K
  [   14.142060][  T347] BUG: Unable to handle kernel instruction fetch
  [   14.142063][    T1] Run /sbin/init as init process
  [   14.142074][  T347] Faulting instruction address: 0xc000000000004400
  cpu 0x2: Vector: 400 (Instruction Access) at [c00000000c7475e0]
      pc: c000000000004400: exc_virt_0x4400_instruction_access+0x0/0x80
      lr: c0000000001862d4: update_rq_clock+0x44/0x110
      sp: c00000000c747880
     msr: 8000000040001031
    current = 0xc00000000c60d380
    paca    = 0xc00000001ec9de80   irqmask: 0x03   irq_happened: 0x01
      pid   = 347, comm = kworker/2:1
  ...
  enter ? for help
  [c00000000c747880] c0000000001862d4 update_rq_clock+0x44/0x110 (unreliable)
  [c00000000c7478f0] c000000000198794 update_blocked_averages+0xb4/0x6d0
  [c00000000c7479f0] c000000000198e40 update_nohz_stats+0x90/0xd0
  [c00000000c747a20] c0000000001a13b4 _nohz_idle_balance+0x164/0x390
  [c00000000c747b10] c0000000001a1af8 newidle_balance+0x478/0x610
  [c00000000c747be0] c0000000001a1d48 pick_next_task_fair+0x58/0x480
  [c00000000c747c40] c000000000eaab5c __schedule+0x12c/0x950
  [c00000000c747cd0] c000000000eab3e8 schedule+0x68/0x120
  [c00000000c747d00] c00000000016b730 worker_thread+0x130/0x640
  [c00000000c747da0] c000000000174d50 kthread+0x1a0/0x1b0
  [c00000000c747e10] c00000000000e0f0 ret_from_kernel_thread+0x5c/0x6c

This shows that CPU 2, which was idle, woke up and then appears to
randomly take an instruction fault on a completely valid area of
kernel text.

The cause turns out to be the call to hash__mark_rodata_ro(), late in
boot. Due to the way we layout text and rodata, that function actually
changes the permissions for all of text and rodata to read-only plus
execute.

To do the permission change we use a hypervisor call, H_PROTECT. On
phyp that appears to be implemented by briefly removing the mapping of
the kernel text, before putting it back with the updated permissions.
If any other CPU is executing during that window, it will see spurious
faults on the kernel text and/or data, leading to crashes.

To fix it we use stop machine to collect all other CPUs, and then have
them drop into real mode (MMU off), while we change the mapping. That
way they are unaffected by the mapping temporarily disappearing.

We don't see this bug on KVM because KVM always use VPM=1, where
faults are directed to the hypervisor, and the fault will be
serialised vs the h_protect() by HPTE_V_HVLOCK.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210331003845.216246-5-mpe@ellerman.id.au
2021-04-08 21:17:43 +10:00
Michael Ellerman 6f223ebe9c powerpc/mm/64s/hash: Factor out change_memory_range()
Pull the loop calling hpte_updateboltedpp() out of
hash__change_memory_range() into a helper function. We need it to be a
separate function for the next patch.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210331003845.216246-4-mpe@ellerman.id.au
2021-04-08 21:17:43 +10:00
Michael Ellerman 2c02e656a2 powerpc/64s: Use htab_convert_pte_flags() in hash__mark_rodata_ro()
In hash__mark_rodata_ro() we pass the raw PP_RXXX value to
hash__change_memory_range(). That has the effect of setting the key to
zero, because PP_RXXX contains no key value.

Fix it by using htab_convert_pte_flags(), which knows how to convert a
pgprot into a pp value, including the key.

Fixes: d94b827e89 ("powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Link: https://lore.kernel.org/r/20210331003845.216246-3-mpe@ellerman.id.au
2021-04-08 21:17:43 +10:00
Jordan Niethe b8b2f37cf6 powerpc/64s: Fix pte update for kernel memory on radix
When adding a PTE a ptesync is needed to order the update of the PTE
with subsequent accesses otherwise a spurious fault may be raised.

radix__set_pte_at() does not do this for performance gains. For
non-kernel memory this is not an issue as any faults of this kind are
corrected by the page fault handler. For kernel memory these faults
are not handled. The current solution is that there is a ptesync in
flush_cache_vmap() which should be called when mapping from the
vmalloc region.

However, map_kernel_page() does not call flush_cache_vmap(). This is
troublesome in particular for code patching with Strict RWX on radix.
In do_patch_instruction() the page frame that contains the instruction
to be patched is mapped and then immediately patched. With no ordering
or synchronization between setting up the PTE and writing to the page
it is possible for faults.

As the code patching is done using __put_user_asm_goto() the resulting
fault is obscured - but using a normal store instead it can be seen:

  BUG: Unable to handle kernel data access on write at 0xc008000008f24a3c
  Faulting instruction address: 0xc00000000008bd74
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in: nop_module(PO+) [last unloaded: nop_module]
  CPU: 4 PID: 757 Comm: sh Tainted: P           O      5.10.0-rc5-01361-ge3c1b78c8440-dirty #43
  NIP:  c00000000008bd74 LR: c00000000008bd50 CTR: c000000000025810
  REGS: c000000016f634a0 TRAP: 0300   Tainted: P           O       (5.10.0-rc5-01361-ge3c1b78c8440-dirty)
  MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 44002884  XER: 00000000
  CFAR: c00000000007c68c DAR: c008000008f24a3c DSISR: 42000000 IRQMASK: 1

This results in the kind of issue reported here:
  https://lore.kernel.org/linuxppc-dev/15AC5B0E-A221-4B8C-9039-FA96B8EF7C88@lca.pw/

Chris Riedl suggested a reliable way to reproduce the issue:
  $ mount -t debugfs none /sys/kernel/debug
  $ (while true; do echo function > /sys/kernel/debug/tracing/current_tracer ; echo nop > /sys/kernel/debug/tracing/current_tracer ; done) &

Turning ftrace on and off does a large amount of code patching which
in usually less then 5min will crash giving a trace like:

   ftrace-powerpc: (____ptrval____): replaced (4b473b11) != old (60000000)
   ------------[ ftrace bug ]------------
   ftrace failed to modify
   [<c000000000bf8e5c>] napi_busy_loop+0xc/0x390
    actual:   11:3b:47:4b
   Setting ftrace call site to call ftrace function
   ftrace record flags: 80000001
    (1)
    expected tramp: c00000000006c96c
   ------------[ cut here ]------------
   WARNING: CPU: 4 PID: 809 at kernel/trace/ftrace.c:2065 ftrace_bug+0x28c/0x2e8
   Modules linked in: nop_module(PO-) [last unloaded: nop_module]
   CPU: 4 PID: 809 Comm: sh Tainted: P           O      5.10.0-rc5-01360-gf878ccaf250a #1
   NIP:  c00000000024f334 LR: c00000000024f330 CTR: c0000000001a5af0
   REGS: c000000004c8b760 TRAP: 0700   Tainted: P           O       (5.10.0-rc5-01360-gf878ccaf250a)
   MSR:  900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28008848  XER: 20040000
   CFAR: c0000000001a9c98 IRQMASK: 0
   GPR00: c00000000024f330 c000000004c8b9f0 c000000002770600 0000000000000022
   GPR04: 00000000ffff7fff c000000004c8b6d0 0000000000000027 c0000007fe9bcdd8
   GPR08: 0000000000000023 ffffffffffffffd8 0000000000000027 c000000002613118
   GPR12: 0000000000008000 c0000007fffdca00 0000000000000000 0000000000000000
   GPR16: 0000000023ec37c5 0000000000000000 0000000000000000 0000000000000008
   GPR20: c000000004c8bc90 c0000000027a2d20 c000000004c8bcd0 c000000002612fe8
   GPR24: 0000000000000038 0000000000000030 0000000000000028 0000000000000020
   GPR28: c000000000ff1b68 c000000000bf8e5c c00000000312f700 c000000000fbb9b0
   NIP ftrace_bug+0x28c/0x2e8
   LR  ftrace_bug+0x288/0x2e8
   Call Trace:
     ftrace_bug+0x288/0x2e8 (unreliable)
     ftrace_modify_all_code+0x168/0x210
     arch_ftrace_update_code+0x18/0x30
     ftrace_run_update_code+0x44/0xc0
     ftrace_startup+0xf8/0x1c0
     register_ftrace_function+0x4c/0xc0
     function_trace_init+0x80/0xb0
     tracing_set_tracer+0x2a4/0x4f0
     tracing_set_trace_write+0xd4/0x130
     vfs_write+0xf0/0x330
     ksys_write+0x84/0x140
     system_call_exception+0x14c/0x230
     system_call_common+0xf0/0x27c

To fix this when updating kernel memory PTEs using ptesync.

Fixes: f1cb8f9beb ("powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags")
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Tidy up change log slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210208032957.1232102-1-jniethe5@gmail.com
2021-04-08 21:17:42 +10:00
Bhaskar Chowdhury 4763d37827 powerpc: Spelling/typo fixes
Various spelling/typo fixes.

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-04-08 21:17:42 +10:00
Nicholas Piggin 1479e3d3b7 powerpc/64s: Fix hash fault to use TRAP accessor
Hash faults use the trap vector to decide whether this is an
instruction or data fault. This should use the TRAP accessor
rather than open access regs->trap.

This won't cause a problem at the moment because 64s only uses
trap flags for system call interrupts (the norestart flag), but
that could change if any other trap flags get used in future.

Fixes: a4922f5442 ("powerpc/64s: move the hash fault handling logic to C")
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316105205.407767-1-npiggin@gmail.com
2021-03-29 13:22:15 +11:00
Michael Ellerman c2a2a5d027 powerpc/64s: Fold update_current_thread_[i]amr() into their only callers
lkp reported warnings in some configuration due to
update_current_thread_amr() being unused:

  arch/powerpc/mm/book3s64/pkeys.c:284:20: error: unused function 'update_current_thread_amr'
  static inline void update_current_thread_amr(u64 value)

Which is because it's only use is inside an ifdef. We could move it
inside the ifdef, but it's a single line function and only has one
caller, so just fold it in.

Similarly update_current_thread_iamr() is small and only called once,
so fold it in also.

Fixes: 48a8ab4eeb ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210314093320.132331-1-mpe@ellerman.id.au
2021-03-29 13:22:14 +11:00
Bhaskar Chowdhury 7a7d744ffe powerpc/mm/book3s64: Fix a typo in mmu_context.c
s/detalis/details/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210312112537.4585-1-unixbhaskar@gmail.com
2021-03-29 13:22:12 +11:00
Linus Torvalds b12b472496 powerpc updates for 5.12
A large series adding wrappers for our interrupt handlers, so that irq/nmi/user
 tracking can be isolated in the wrappers rather than spread in each handler.
 
 Conversion of the 32-bit syscall handling into C.
 
 A series from Nick to streamline our TLB flushing when using the Radix MMU.
 
 Switch to using queued spinlocks by default for 64-bit server CPUs.
 
 A rework of our PCI probing so that it happens later in boot, when more generic
 infrastructure is available.
 
 Two small fixes to allow 32-bit little-endian processes to run on 64-bit
 kernels.
 
 Other smaller features, fixes & cleanups.
 
 Thanks to:
   Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh Kumar K.V, Athira
   Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang Fan, Christophe Leroy,
   Christopher M. Riedl, Fabiano Rosas, Florian Fainelli, Frederic Barrat, Ganesh
   Goudar, Hari Bathini, Jiapeng Chong, Joseph J Allen, Kajol Jain, Markus
   Elfring, Michal Suchanek, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver
   O'Halloran, Pingfan Liu, Po-Hsu Lin, Qian Cai, Ram Pai, Randy Dunlap, Sandipan
   Das, Stephen Rothwell, Tyrel Datwyler, Will Springer, Yury Norov, Zheng
   Yongjun.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmAzMagTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgAbBD/wMS2g1Q9oAGZPsx2NGd2RoeAauGxUs
 Yj6cZVmR+oa6sJyFYgEG7dT7tcwJITQxLBD3HpsHSnJ/rLrMloE33+cZNA9c4STz
 0mlzm3R7M5pOgcEqZglsgLP0RQeUuHSSF01g0kf1N3r+HYtmbmPjuUIl8CnAjlbT
 iMD2ZN2p8/r3kDDht0iBO534HUpsqhc00duSZgQhsV/PR7ZWVxoPk7PEJeo4vXlJ
 77986F7J5NLUTjMiLv5lTx49FcPbRd7a1jubsBtahJrwXj2GVvuy2i86G7HY+a+B
 eSxN7zJQgaFeLo0YPo7fZLBI0MAsIQt3nnZhKX0TMglbv/K8Aq64xiJqsVQdJ883
 CeEt0HvSJhsSC0C4O595NEINfDhDd+5IeSF9MvsujYXiUKRXtRkm1EPuAzTcZIzW
 NwkCLRo33NMXa+khMKaiqF/g7INayPUXoWESx75NXFsuNfcORvstkeUuEoi5GwJo
 TSlmosFqwRjghQ8eTLZuWBzmh3EpPGdtC4gm6D+lbzhzjah5c/1whyuLqra275kK
 E3Qt0/V0ixKyvlG7MI5yYh3L7+R/hrsflH7xIJJxZp2DW6mwBJzQYmkxDbSS8PzK
 nWien2XgpIQhSFat3QqreEFSfNkzdN2MClVi2Y1hpAgi+2Zm9rPdPNGcQI+DSOsB
 kpJkjOjWNJU/PQ==
 =dB2S
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - A large series adding wrappers for our interrupt handlers, so that
   irq/nmi/user tracking can be isolated in the wrappers rather than
   spread in each handler.

 - Conversion of the 32-bit syscall handling into C.

 - A series from Nick to streamline our TLB flushing when using the
   Radix MMU.

 - Switch to using queued spinlocks by default for 64-bit server CPUs.

 - A rework of our PCI probing so that it happens later in boot, when
   more generic infrastructure is available.

 - Two small fixes to allow 32-bit little-endian processes to run on
   64-bit kernels.

 - Other smaller features, fixes & cleanups.

Thanks to: Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh
Kumar K.V, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang
Fan, Christophe Leroy, Christopher M. Riedl, Fabiano Rosas, Florian
Fainelli, Frederic Barrat, Ganesh Goudar, Hari Bathini, Jiapeng Chong,
Joseph J Allen, Kajol Jain, Markus Elfring, Michal Suchanek, Nathan
Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pingfan Liu,
Po-Hsu Lin, Qian Cai, Ram Pai, Randy Dunlap, Sandipan Das, Stephen
Rothwell, Tyrel Datwyler, Will Springer, Yury Norov, and Zheng Yongjun.

* tag 'powerpc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (188 commits)
  powerpc/perf: Adds support for programming of Thresholding in P10
  powerpc/pci: Remove unimplemented prototypes
  powerpc/uaccess: Merge raw_copy_to_user_allowed() into raw_copy_to_user()
  powerpc/uaccess: Merge __put_user_size_allowed() into __put_user_size()
  powerpc/uaccess: get rid of small constant size cases in raw_copy_{to,from}_user()
  powerpc/64: Fix stack trace not displaying final frame
  powerpc/time: Remove get_tbl()
  powerpc/time: Avoid using get_tbl()
  spi: mpc52xx: Avoid using get_tbl()
  powerpc/syscall: Avoid storing 'current' in another pointer
  powerpc/32: Handle bookE debugging in C in syscall entry/exit
  powerpc/syscall: Do not check unsupported scv vector on PPC32
  powerpc/32: Remove the counter in global_dbcr0
  powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry
  powerpc/syscall: implement system call entry/exit logic in C for PPC32
  powerpc/32: Always save non volatile GPRs at syscall entry
  powerpc/syscall: Change condition to check MSR_RI
  powerpc/syscall: Save r3 in regs->orig_r3
  powerpc/syscall: Use is_compat_task()
  powerpc/syscall: Make interrupt.c buildable on PPC32
  ...
2021-02-22 14:34:00 -08:00
Aneesh Kumar K.V ec94b9b23d powerpc/mm: Add PG_dcache_clean to indicate dcache clean state
This just add a better name for PG_arch_1. No functional change in this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210203045812.234439-2-aneesh.kumar@linux.ibm.com
2021-02-11 23:35:06 +11:00
Michael Ellerman 2bb421a3d9 powerpc/mm/64s: Fix no previous prototype warning
As reported by lkp:

  arch/powerpc/mm/book3s64/radix_tlb.c:646:6: warning: no previous
  prototype for function 'exit_lazy_flush_tlb'

Fix it by moving the prototype into the existing header.

Fixes: 032b7f0893 ("powerpc/64s/radix: serialize_against_pte_lookup IPIs trim mm_cpumask")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210210130804.3190952-2-mpe@ellerman.id.au
2021-02-11 23:28:51 +11:00
Nicholas Piggin 032b7f0893 powerpc/64s/radix: serialize_against_pte_lookup IPIs trim mm_cpumask
serialize_against_pte_lookup() performs IPIs to all CPUs in mm_cpumask.
Take this opportunity to try trim the CPU out of mm_cpumask. This can
reduce the cost of future serialize_against_pte_lookup() and/or the
cost of future TLB flushes.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201217134731.488135-7-npiggin@gmail.com
2021-02-09 01:09:45 +11:00
Nicholas Piggin 9393544842 powerpc/64s/radix: occasionally attempt to trim mm_cpumask
A single-threaded process that is flushing its own address space is
so far the only case where the mm_cpumask is attempted to be trimmed.
This patch expands that to flush in other situations, multi-threaded
processes and external sources. For now it's a relatively simple
occasional trim attempt. The main aim is to add the mechanism,
tweaking and tuning can come with more data.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201217134731.488135-6-npiggin@gmail.com
2021-02-09 01:09:45 +11:00
Nicholas Piggin 780de40601 powerpc/64s/radix: Allow mm_cpumask trimming from external sources
mm_cpumask trimming is currently restricted to be issued by the current
thread of a single-threaded mm. This patch relaxes that and allows the
mask to be trimmed from any context.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201217134731.488135-5-npiggin@gmail.com
2021-02-09 01:09:44 +11:00
Nicholas Piggin 54bb503345 powerpc/64s/radix: Check for no TLB flush required
If there are no CPUs in mm_cpumask, no TLB flush is required at all.
This patch adds a check for this case.

Currently it's not tested for, in fact mm_is_thread_local() returns
false if the current CPU is not in mm_cpumask, so it's treated as a
global flush.

This can come up in some cases like exec failure before the new mm has
ever been switched to. This patch reduces TLBIE instructions required
to build a kernel from about 120,000 to 45,000. Another situation it
could help is page reclaim, KSM, THP, etc., (i.e., asynch operations
external to the process) where the process is sleeping and has all TLBs
flushed out of all CPUs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201217134731.488135-4-npiggin@gmail.com
2021-02-09 01:09:44 +11:00
Nicholas Piggin 26418b36a1 powerpc/64s/radix: refactor TLB flush type selection
The logic to decide what kind of TLB flush is required (local, global,
or IPI) is spread multiple times over the several kinds of TLB flushes.

Move it all into a single function which may issue IPIs if necessary,
and also returns a flush type that is to be used.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201217134731.488135-3-npiggin@gmail.com
2021-02-09 01:09:44 +11:00
Nicholas Piggin a2496049f1 powerpc/64s/radix: add warning and comments in mm_cpumask trim
Add a comment explaining part of the logic for mm_cpumask trimming, and
add a (hopefully graceful) check and warning in case something gets it
wrong.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201217134731.488135-2-npiggin@gmail.com
2021-02-09 01:09:44 +11:00
Nicholas Piggin 540d4d34be powerpc/64: context tracking move to interrupt wrappers
This moves exception_enter/exit calls to wrapper functions for
synchronous interrupts. More interrupt handlers are covered by
this than previously.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-33-npiggin@gmail.com
2021-02-09 00:10:46 +11:00
Nicholas Piggin a008f8f9fd powerpc/64s/hash: improve context tracking of hash faults
This moves the 64s/hash context tracking from hash_page_mm() to
__do_hash_fault(), so it's no longer called by OCXL / SPU
accelerators, which was certainly the wrong thing to be doing,
because those callers are not low level interrupt handlers, so
should have entered a kernel context tracking already.

Then remain in kernel context for the duration of the fault,
rather than enter/exit for the hash fault then enter/exit for
the page fault, which is pointless.

Even still, calling exception_enter/exit in __do_hash_fault seems
questionable because that's touching per-cpu variables, tracing,
etc., which might have been interrupted by this hash fault or
themselves cause hash faults. But maybe I miss something because
hash_page_mm very deliberately calls trace_hash_fault too, for
example. So for now go with it, it's no worse than before, in this
regard.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-32-npiggin@gmail.com
2021-02-09 00:02:12 +11:00
Nicholas Piggin 3a96570ffc powerpc: convert interrupt handlers to use wrappers
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-29-npiggin@gmail.com
2021-02-09 00:02:12 +11:00
Nicholas Piggin e44370abb2 powerpc/64s: slb comment update
This makes a small improvement to the description of the SLB interrupt
environment. Move the memory access restrictions into one paragraph,
and the interrupt restrictions into the next rather than mix them.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-18-npiggin@gmail.com
2021-02-09 00:02:10 +11:00
Nicholas Piggin bf0e2374aa powerpc/64s: split do_hash_fault
This is required for subsequent interrupt wrapper implementation.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-16-npiggin@gmail.com
2021-02-09 00:02:10 +11:00
Nicholas Piggin 8458c628a5 powerpc: bad_page_fault get registers from regs
Similar to the previous patch this makes interrupt handler function
types more regular so they can be wrapped with the next patch.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-12-npiggin@gmail.com
2021-02-09 00:02:09 +11:00
Nicholas Piggin a01a3f2ddb powerpc: remove arguments from fault handler functions
Make mm fault handlers all just take the pt_regs * argument and load
DAR/DSISR from that. Make those that return a value return long.

This is done to make the function signatures match other handlers, which
will help with a future patch to add wrappers. Explicit arguments could
be added for performance but that would require more wrapper macro
variants.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-7-npiggin@gmail.com
2021-02-09 00:02:08 +11:00
Nicholas Piggin a4922f5442 powerpc/64s: move the hash fault handling logic to C
The fault handling still has some complex logic particularly around
hash table handling, in asm. Implement most of this in C.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-6-npiggin@gmail.com
2021-02-09 00:02:08 +11:00
Aneesh Kumar K.V 8c511eff18 powerpc/kuap: Allow kernel thread to access userspace after kthread_use_mm
This fix the bad fault reported by KUAP when io_wqe_worker access userspace.

 Bug: Read fault blocked by KUAP!
 WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0
 NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
 LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
..........
 Call Trace:
 [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable)
 [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
 [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
 --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
..........
 NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
 LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
 interrupt: 300
 [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
 [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
 [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120
 [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
 [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
 [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
 [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
 [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800
 [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
 [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
 [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c

The kernel consider thread AMR value for kernel thread to be
AMR_KUAP_BLOCKED. Hence access to userspace is denied. This
of course not correct and we should allow userspace access after
kthread_use_mm(). To be precise, kthread_use_mm() should inherit the
AMR value of the operating address space. But, the AMR value is
thread-specific and we inherit the address space and not thread
access restrictions. Because of this ignore AMR value when accessing
userspace via kernel thread.

current_thread_amr/iamr() are updated, because we use them in the
below stack.
....
[  530.710838] CPU: 13 PID: 5587 Comm: io_wqe_worker-0 Tainted: G      D           5.11.0-rc6+ #3
....

 NIP [c0000000000aa0c8] pkey_access_permitted+0x28/0x90
 LR [c0000000004b9278] gup_pte_range+0x188/0x420
 --- interrupt: 700
 [c00000001c4ef3f0] [0000000000000000] 0x0 (unreliable)
 [c00000001c4ef490] [c0000000004bd39c] gup_pgd_range+0x3ac/0xa20
 [c00000001c4ef5a0] [c0000000004bdd44] internal_get_user_pages_fast+0x334/0x410
 [c00000001c4ef620] [c000000000852028] iov_iter_get_pages+0xf8/0x5c0
 [c00000001c4ef6a0] [c0000000007da44c] bio_iov_iter_get_pages+0xec/0x700
 [c00000001c4ef770] [c0000000006a325c] iomap_dio_bio_actor+0x2ac/0x4f0
 [c00000001c4ef810] [c00000000069cd94] iomap_apply+0x2b4/0x740
 [c00000001c4ef920] [c0000000006a38b8] __iomap_dio_rw+0x238/0x5c0
 [c00000001c4ef9d0] [c0000000006a3c60] iomap_dio_rw+0x20/0x80
 [c00000001c4ef9f0] [c008000001927a30] xfs_file_dio_aio_write+0x1f8/0x650 [xfs]
 [c00000001c4efa60] [c0080000019284dc] xfs_file_write_iter+0xc4/0x130 [xfs]
 [c00000001c4efa90] [c000000000669984] io_write+0x104/0x4b0
 [c00000001c4efbb0] [c00000000066cea4] io_issue_sqe+0x3d4/0xf50
 [c00000001c4efc60] [c000000000670200] io_wq_submit_work+0xb0/0x2f0
 [c00000001c4efcb0] [c000000000674268] io_worker_handle_work+0x248/0x4a0
 [c00000001c4efd30] [c0000000006746e8] io_wqe_worker+0x228/0x2a0
 [c00000001c4efda0] [c00000000019d994] kthread+0x1b4/0x1c0

Fixes: 48a8ab4eeb ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.")
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210206025634.521979-1-aneesh.kumar@linux.ibm.com
2021-02-06 23:13:04 +11:00
Qian Cai b5952f8125 powerpc/mm/book3s64/iommu: fix some RCU-list locks
It is safe to traverse mm->context.iommu_group_mem_list with either
mem_list_mutex or the RCU read lock held. Silence a few RCU-list false
positive warnings and fix a few missing RCU read locks.

 arch/powerpc/mm/book3s64/iommu_api.c:330 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 2 locks held by qemu-kvm/4305:
  #0: c000000bc3fe4d68 (&container->lock){+.+.}-{3:3}, at: tce_iommu_ioctl.part.9+0xc7c/0x1870 [vfio_iommu_spapr_tce]
  #1: c000000001501910 (mem_list_mutex){+.+.}-{3:3}, at: mm_iommu_get+0x50/0x190

 ====
 arch/powerpc/mm/book3s64/iommu_api.c:132 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 2 locks held by qemu-kvm/4305:
  #0: c000000bc3fe4d68 (&container->lock){+.+.}-{3:3}, at: tce_iommu_ioctl.part.9+0xc7c/0x1870 [vfio_iommu_spapr_tce]
  #1: c000000001501910 (mem_list_mutex){+.+.}-{3:3}, at: mm_iommu_do_alloc+0x120/0x5f0

 ====
 arch/powerpc/mm/book3s64/iommu_api.c:292 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 2 locks held by qemu-kvm/4312:
  #0: c000000ecafe23c8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0xdc/0x950 [kvm]
  #1: c000000045e6c468 (&kvm->srcu){....}-{0:0}, at: kvmppc_h_put_tce+0x88/0x340 [kvm]

 ====
 arch/powerpc/mm/book3s64/iommu_api.c:424 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 2 locks held by qemu-kvm/4312:
  #0: c000000ecafe23c8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0xdc/0x950 [kvm]
  #1: c000000045e6c468 (&kvm->srcu){....}-{0:0}, at: kvmppc_h_put_tce+0x88/0x340 [kvm]

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200510051559.1959-1-cai@lca.pw
2021-01-31 22:35:49 +11:00
Cédric Le Goater 11f9c1d2fb powerpc/mm: Move hpte_insert_repeating() prototype
It fixes this W=1 compile error :

../arch/powerpc/mm/book3s64/hash_utils.c:1867:6: error: no previous prototype for ‘hpte_insert_repeating’ [-Werror=missing-prototypes]
 1867 | long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
      |      ^~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210104143206.695198-14-clg@kaod.org
2021-01-30 11:39:29 +11:00
Linus Torvalds 8a5be36b93 powerpc updates for 5.11
- Switch to the generic C VDSO, as well as some cleanups of our VDSO
    setup/handling code.
 
  - Support for KUAP (Kernel User Access Prevention) on systems using the hashed
    page table MMU, using memory protection keys.
 
  - Better handling of PowerVM SMT8 systems where all threads of a core do not
    share an L2, allowing the scheduler to make better scheduling decisions.
 
  - Further improvements to our machine check handling.
 
  - Show registers when unwinding interrupt frames during stack traces.
 
  - Improvements to our pseries (PowerVM) partition migration code.
 
  - Several series from Christophe refactoring and cleaning up various parts of
    the 32-bit code.
 
  - Other smaller features, fixes & cleanups.
 
 Thanks to:
   Alan Modra, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Ard
   Biesheuvel, Athira Rajeev, Balamuruhan S, Bill Wendling, Cédric Le Goater,
   Christophe Leroy, Christophe Lombard, Colin Ian King, Daniel Axtens, David
   Hildenbrand, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geert
   Uytterhoeven, Giuseppe Sacco, Greg Kurz, Harish, Jan Kratochvil, Jordan
   Niethe, Kaixu Xia, Laurent Dufour, Leonardo Bras, Madhavan Srinivasan, Mahesh
   Salgaonkar, Mathieu Desnoyers, Nathan Lynch, Nicholas Piggin, Oleg Nesterov,
   Oliver O'Halloran, Oscar Salvador, Po-Hsu Lin, Qian Cai, Qinglang Miao, Randy
   Dunlap, Ravi Bangoria, Sachin Sant, Sandipan Das, Sebastian Andrzej Siewior ,
   Segher Boessenkool, Srikar Dronamraju, Tyrel Datwyler, Uwe Kleine-König,
   Vincent Stehlé, Youling Tang, Zhang Xiaoxu.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl/bURITHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgEzBEAC1Vwibcog2P9rkJPb0q3UGWSYSx25V
 h/LwwxtM9Tm14j/LZsSgkOgIsfMaWEBIw/8D4efQ7AX9aFo+R0c2DdQMx1MG5MXz
 gZk58+l3LwId6h9+OrwurpEW+ZmURLAtGMSyFdkeiZ3/XTnkbf1XnewC0QWQe56a
 EGLmjx1MFl45jspoy7UIUXsXoNJIfflEKhrgUzSUh8X2eLmvB9ws6A4BXxbVzyZl
 lZv3+uWimU2pFgdkB9jOCxoG4zFEr2o5ovLHG7zCCVo5JoXmTPQ5cMVBraH206ms
 +5vCmu4qI8uP5UlZW/mZfhrtDiMdHdQqqFOaQwVlOmoUbU6L6E6rxm3iVnov2Bbi
 iUgxoeJDxAb2cM2EWFK6oWVgr7+NkwvXM1IG8xtprhHrCdnC9r+psQr/dswb3LSg
 MJ7u/RCq3uixy2kWP8E0NEHw7ngQZ/ZKPqzfnmIWOC7tYUxgaL02I8Ff9/ZXAI2J
 CnmqFYOjrimHkcwXGOtKkXNvfU0DiL97qpK2AQNWElE8+bUUmpw+ltUrsdSycYmv
 Afc4WIcVrTA+a9laSLgjdZbolbNSa3p+cMIYdPrVx9g+xqygbxIKv+EDGNv1WHfD
 GU1gmohMY+ZkMOvFRMi8LAsEm0DH/etWE0py/8uyxDYKnGyD1Ur6452DStkmgGb2
 azmcaOyLdb+HXA==
 =Ga3K
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Switch to the generic C VDSO, as well as some cleanups of our VDSO
   setup/handling code.

 - Support for KUAP (Kernel User Access Prevention) on systems using the
   hashed page table MMU, using memory protection keys.

 - Better handling of PowerVM SMT8 systems where all threads of a core
   do not share an L2, allowing the scheduler to make better scheduling
   decisions.

 - Further improvements to our machine check handling.

 - Show registers when unwinding interrupt frames during stack traces.

 - Improvements to our pseries (PowerVM) partition migration code.

 - Several series from Christophe refactoring and cleaning up various
   parts of the 32-bit code.

 - Other smaller features, fixes & cleanups.

Thanks to: Alan Modra, Alexey Kardashevskiy, Andrew Donnellan, Aneesh
Kumar K.V, Ard Biesheuvel, Athira Rajeev, Balamuruhan S, Bill Wendling,
Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King,
Daniel Axtens, David Hildenbrand, Frederic Barrat, Ganesh Goudar,
Gautham R. Shenoy, Geert Uytterhoeven, Giuseppe Sacco, Greg Kurz,
Harish, Jan Kratochvil, Jordan Niethe, Kaixu Xia, Laurent Dufour,
Leonardo Bras, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu
Desnoyers, Nathan Lynch, Nicholas Piggin, Oleg Nesterov, Oliver
O'Halloran, Oscar Salvador, Po-Hsu Lin, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sandipan Das, Sebastian Andrzej
Siewior , Segher Boessenkool, Srikar Dronamraju, Tyrel Datwyler, Uwe
Kleine-König, Vincent Stehlé, Youling Tang, and Zhang Xiaoxu.

* tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (304 commits)
  powerpc/32s: Fix cleanup_cpu_mmu_context() compile bug
  powerpc: Add config fragment for disabling -Werror
  powerpc/configs: Add ppc64le_allnoconfig target
  powerpc/powernv: Rate limit opal-elog read failure message
  powerpc/pseries/memhotplug: Quieten some DLPAR operations
  powerpc/ps3: use dma_mapping_error()
  powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10
  powerpc/perf: Fix Threshold Event Counter Multiplier width for P10
  powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range()
  KVM: PPC: Book3S HV: Fix mask size for emulated msgsndp
  KVM: PPC: fix comparison to bool warning
  KVM: PPC: Book3S: Assign boolean values to a bool variable
  powerpc: Inline setup_kup()
  powerpc/64s: Mark the kuap/kuep functions non __init
  KVM: PPC: Book3S HV: XIVE: Add a comment regarding VP numbering
  powerpc/xive: Improve error reporting of OPAL calls
  powerpc/xive: Simplify xive_do_source_eoi()
  powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_EOI_FW
  powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_MASK_FW
  powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_SHIFT_BUG
  ...
2020-12-17 13:34:25 -08:00
Aneesh Kumar K.V 44b4c4450f powerpc/64s: Mark the kuap/kuep functions non __init
The kernel calls these functions on CPU online and hence they must not
be marked __init.

Otherwise if the memory they occupied has been reused the system can
crash in various ways. Sachin reported it caused his LPAR to
spontaneously restart with no other output. With xmon enabled it may
drop into xmon with a dump like:

  cpu 0x1: Vector: 700 (Program Check) at [c000000003c5fcb0]
      pc: 00000000011e0a78
      lr: 00000000011c51d4
      sp: c000000003c5ff50
     msr: 8000000000081001
    current = 0xc000000002c12b00
    paca    = 0xc000000003cff280	 irqmask: 0x03	 irq_happened: 0x01
      pid   = 0, comm = swapper/1
  ...
  [c000000003c5ff50] 0000000000087c38 (unreliable)
  [c000000003c5ff70] 000000000003870c
  [c000000003c5ff90] 000000000000d108

Fixes: 3b47b7549e ("powerpc/book3s64/kuap: Move KUAP related function outside radix")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Expand change log with details and xmon output]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201214080121.358567-1-aneesh.kumar@linux.ibm.com
2020-12-14 23:12:27 +11:00
Nicholas Piggin c33cd1ed60 powerpc/64s/iommu: Don't use atomic_ function on atomic64_t type
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111110723.3148665-3-npiggin@gmail.com
2020-12-09 23:48:14 +11:00
Ganesh Goudar 3ba150fb21 lkdtm/powerpc: Add SLB multihit test
To check machine check handling, add support to inject slb
multihit errors.

Co-developed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
[mpe: Use CONFIG_PPC_BOOK3S_64 to fix compile errors reported by lkp@intel.com]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201130083057.135610-1-ganeshgr@linux.ibm.com
2020-12-04 01:01:34 +11:00
Aneesh Kumar K.V 61130e203d powerpc/book3s64/kup: Check max key supported before enabling kup
Don't enable KUEP/KUAP if we support less than or equal to 3 keys.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201202043854.76406-1-aneesh.kumar@linux.ibm.com
2020-12-04 01:01:27 +11:00
Aneesh Kumar K.V c91435d95c powerpc/book3s64/hash/kuep: Enable KUEP on hash
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201127044424.40686-21-aneesh.kumar@linux.ibm.com
2020-12-04 01:01:27 +11:00
Aneesh Kumar K.V b2ff33a10c powerpc/book3s64/hash/kuap: Enable kuap on hash
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201127044424.40686-20-aneesh.kumar@linux.ibm.com
2020-12-04 01:01:27 +11:00
Aneesh Kumar K.V 48a8ab4eeb powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.
Now that kernel correctly store/restore userspace AMR/IAMR values, avoid
manipulating AMR and IAMR from the kernel on behalf of userspace.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201127044424.40686-15-aneesh.kumar@linux.ibm.com
2020-12-04 01:01:26 +11:00
Aneesh Kumar K.V d5fa30e699 powerpc/book3s64/pkeys: Reset userspace AMR correctly on exec
On fork, we inherit from the parent and on exec, we should switch to default_amr values.

Also, avoid changing the AMR register value within the kernel. The kernel now runs with
different AMR values.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201127044424.40686-13-aneesh.kumar@linux.ibm.com
2020-12-04 01:01:26 +11:00
Aneesh Kumar K.V d94b827e89 powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation
This patch updates kernel hash page table entries to use storage key 3
for its mapping. This implies all kernel access will now use key 3 to
control READ/WRITE. The patch also prevents the allocation of key 3 from
userspace and UAMOR value is updated such that userspace cannot modify key 3.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201127044424.40686-9-aneesh.kumar@linux.ibm.com
2020-12-04 01:01:25 +11:00
Aneesh Kumar K.V d5b810b5c9 powerpc/book3s64/kuap: Rename MMU_FTR_RADIX_KUAP and MMU_FTR_KUEP
This is in preparation to adding support for kuap with hash translation.
In preparation for that rename/move kuap related functions to
non radix names. Also move the feature bit closer to MMU_FTR_KUEP.

MMU_FTR_KUEP is renamed to MMU_FTR_BOOK3S_KUEP to indicate the feature
is only relevant to BOOK3S_64

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201127044424.40686-8-aneesh.kumar@linux.ibm.com
2020-12-04 01:01:25 +11:00
Aneesh Kumar K.V 57b7505aa8 powerpc/book3s64/kuep: Move KUEP related function outside radix
The next set of patches adds support for kuep with hash translation.
In preparation for that rename/move kuap related functions to
non radix names.

Also set MMU_FTR_KUEP and add the missing isync().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201127044424.40686-7-aneesh.kumar@linux.ibm.com
2020-12-04 01:01:24 +11:00
Aneesh Kumar K.V 3b47b7549e powerpc/book3s64/kuap: Move KUAP related function outside radix
The next set of patches adds support for kuap with hash translation.
In preparation for that rename/move kuap related functions to
non radix names.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201127044424.40686-6-aneesh.kumar@linux.ibm.com
2020-12-04 01:01:24 +11:00
Aneesh Kumar K.V 39df17bc20 powerpc/book3s64/kuap/kuep: Move uamor setup to pkey init
This patch consolidates UAMOR update across pkey, kuap and kuep features.
The boot cpu initialize UAMOR via pkey init and both radix/hash do the
secondary cpu UAMOR init in early_init_mmu_secondary.

We don't check for mmu_feature in radix secondary init because UAMOR
is a supported SPRN with all CPUs supporting radix translation.
The old code was not updating UAMOR if we had smap disabled and smep enabled.
This change handles that case.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201127044424.40686-5-aneesh.kumar@linux.ibm.com
2020-12-04 01:01:24 +11:00
Aneesh Kumar K.V 227ae62552 powerpc/book3s64/kuap/kuep: Add PPC_PKEY config on book3s64
The config CONFIG_PPC_PKEY is used to select the base support that is
required for PPC_MEM_KEYS, KUAP, and KUEP. Adding this dependency
reduces the code complexity(in terms of #ifdefs) and enables us to
move some of the initialization code to pkeys.c

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201127044424.40686-4-aneesh.kumar@linux.ibm.com
2020-12-04 01:01:24 +11:00